
Bug #17092 Problems with utf8_encode and htmlspecialchars with non-ascii chars
Submitted: 2010-02-12 19:58 UTC
From: kukulich Assigned: squiz
Status: Closed Package: PHP_CodeSniffer (version 1.2.2)
PHP Version: Irrelevant OS:
Roadmaps: (Not assigned)    


 [2010-02-12 19:58 UTC] kukulich (Jaroslav Hanslík)
Description:
------------
The Czech language has many letters with diacritics (see the test script). Our scripts are strictly in utf-8, but because of utf8_encode and htmlspecialchars the reports are in invalid utf8.

It's easy to solve the problem with htmlspecialchars. There is a third parameter:

htmlspecialchars($error['message'], ENT_COMPAT, 'utf-8');

For the second function I would advise using iconv instead of utf8_encode and adding a new parameter, charset (default iso-8859-1).

Test script:
---------------
phpcs --standard=Generic --sniffs=Generic.Commenting.Todo --report=xml test.php > checkstyle.xml

<?php
// TODO: Příliš žluťoučký kůň úpěl ďábelské ódy.

Expected result:
----------------
<?xml version="1.0" encoding="UTF-8"?>
<checkstyle version="1.2.2">
<file name="P:\test.php">
 <error line="3" column="1" severity="warning" message="Comment refers to a TODO task "Příliš žluťoučký kůň úpěl ďábelské ódy"" source="Generic.Commenting.Todo"/>
</file>
</checkstyle>

Actual result:
--------------
<?xml version="1.0" encoding="UTF-8"?>
<checkstyle version="1.2.2">
<file name="P:\test.php">
 <error line="3" column="1" severity="warning" message="Comment refers to a TODO task "PÅ?íliÅ¡ žluÅ¥oučký kůÅ? úpÄ?l ďábelské ódy"" source="Generic.Commenting.Todo"/>
</file>
</checkstyle>
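
To illustrate the htmlspecialchars suggestion, here is a minimal sketch. It assumes the script itself is saved as UTF-8; the $message value is a made-up sample, not output captured from PHPCS.

```php
<?php
// Hypothetical sample message containing Czech letters and double quotes.
$message = 'Comment refers to a TODO task "Příliš žluťoučký kůň"';

// Passing the charset as the third argument tells htmlspecialchars()
// to treat the input as UTF-8 rather than relying on the default.
$escaped = htmlspecialchars($message, ENT_COMPAT, 'UTF-8');

// ENT_COMPAT escapes the double quotes; the multibyte letters pass through.
echo $escaped, PHP_EOL;
```

With ENT_COMPAT only the double quotes become &quot; and the Czech letters are left as they are, so the result is still valid UTF-8.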

Comments

 [2010-08-23 08:07 UTC] squiz (Greg Sherwood)
-Assigned To: +Assigned To: squiz
Just a quick note to say that htmlspecialchars works fine for me. The real issue appears to be utf8_encode doing a double encoding because the string is already utf8 encoded. The manual for htmlspecialchars says: "For the purposes of this function, the charsets ISO-8859-1, ISO-8859-15, UTF-8, cp866, cp1251, cp1252, and KOI8-R are effectively equivalent, as the characters affected by htmlspecialchars() occupy the same positions in all of these charsets." So I don't know why the XML report would not work for you, as it does not utf8 encode and produces the correct output for me.
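
The double encoding described above can be reproduced on a single character. This is only a sketch: it uses iconv() from ISO-8859-1 to UTF-8, which is the same conversion utf8_encode() performs, so that it also runs on newer PHP versions where utf8_encode() is deprecated.

```php
<?php
// "ř" (U+0159) is the two bytes C5 99 when stored as UTF-8.
$utf8 = "\xC5\x99";

// Treat those two bytes as ISO-8859-1 characters and convert them to
// UTF-8 again. Each byte becomes its own two-byte sequence.
$double = iconv('ISO-8859-1', 'UTF-8', $utf8);

echo bin2hex($utf8), PHP_EOL;   // c599
echo bin2hex($double), PHP_EOL; // c385c299
```

The bytes c3 85 decode to "Å" and c2 99 to an unprintable control character, which matches the "PÅ?" at the start of the garbled message in the report.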
 [2010-08-23 09:44 UTC] squiz (Greg Sherwood)
-Status: Assigned +Status: Closed
I've added a new --encoding command line argument in SVN. If your input files are already utf-8 encoded, do this: phpcs --encoding=utf-8 .... This will stop PHPCS doing the double encoding. I'm also using iconv() now, so it should support many more encodings than before. If you can, please test this latest code and let me know if it works ok for you. I've tried it with your sample and it is working fine for me.
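
As a rough sketch of how an encoding setting can prevent the double encoding: the toUtf8() helper below is hypothetical and is not PHP_CodeSniffer's actual code. Its iso-8859-1 default is an assumption taken from the default proposed in the original report.

```php
<?php
/**
 * Hypothetical helper (not PHPCS code): convert report text to UTF-8
 * based on the declared encoding of the input files.
 */
function toUtf8($text, $encoding = 'iso-8859-1')
{
    // Input is already UTF-8: return it unchanged so it is not encoded twice.
    if (strtolower($encoding) === 'utf-8') {
        return $text;
    }

    // Any other encoding: let iconv() convert it to UTF-8.
    $converted = iconv($encoding, 'UTF-8', $text);

    // iconv() returns false on failure; fall back to the original text.
    return ($converted === false) ? $text : $converted;
}

$utf8Input   = "k\xC5\xAF\xC5\x88"; // "kůň" stored as UTF-8
$latin2Input = "k\xF9\xF2";         // "kůň" stored as ISO-8859-2

// Both calls yield the same UTF-8 bytes.
echo bin2hex(toUtf8($utf8Input, 'utf-8')), PHP_EOL;
echo bin2hex(toUtf8($latin2Input, 'iso-8859-2')), PHP_EOL;
```

Calling the helper on UTF-8 input without declaring the encoding falls back to the iso-8859-1 default and reproduces the double encoding, which is why the option has to be passed for UTF-8 sources.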