
Bug #20943 Comment tokenizer does not understand national symbols in UTF-8.
Submitted: 2015-08-25 19:23 UTC
From: lexx918 Assigned: squiz
Status: Closed Package: PHP_CodeSniffer (version 2.3.3)
PHP Version: 5.5.9 OS: Ubuntu
Roadmaps: (Not assigned)    


 [2015-08-25 19:23 UTC] lexx918 (Alexey Shein)
Description:
------------
Compare this ..

----------
source:
<?php
/**
 * ?
 */
----------
tokens:
*** START PHP TOKENIZING ***
Process token [0]: T_OPEN_TAG => <?php\r\n
Process token [1]: T_DOC_COMMENT => /**\r\n·*·?\r\n·*/
*** START COMMENT TOKENIZING ***
Create comment token: T_DOC_COMMENT_OPEN_TAG => /**
Create comment token: T_DOC_COMMENT_WHITESPACE => \r\n
Create comment token: T_DOC_COMMENT_WHITESPACE => ·
Create comment token: T_DOC_COMMENT_STAR => *
Create comment token: T_DOC_COMMENT_WHITESPACE => ·
Create comment token: T_DOC_COMMENT_STRING => ?
Create comment token: T_DOC_COMMENT_WHITESPACE => \r\n
Create comment token: T_DOC_COMMENT_STRING => ·*
Create comment token: T_DOC_COMMENT_CLOSE_TAG => /
*** END COMMENT TOKENIZING ***
Process token [2]: T_WHITESPACE => \r\n
----------

.. and this.

----------
source:
<?php
/**
 * A
 */
----------
tokens:
*** START PHP TOKENIZING ***
Process token [0]: T_OPEN_TAG => <?php\r\n
Process token [1]: T_DOC_COMMENT => /**\r\n·*·A\r\n·*/
*** START COMMENT TOKENIZING ***
Create comment token: T_DOC_COMMENT_OPEN_TAG => /**
Create comment token: T_DOC_COMMENT_WHITESPACE => \r\n
Create comment token: T_DOC_COMMENT_WHITESPACE => ·
Create comment token: T_DOC_COMMENT_STAR => *
Create comment token: T_DOC_COMMENT_WHITESPACE => ·
Create comment token: T_DOC_COMMENT_STRING => A
Create comment token: T_DOC_COMMENT_WHITESPACE => \r\n
Create comment token: T_DOC_COMMENT_WHITESPACE => ·
Create comment token: T_DOC_COMMENT_CLOSE_TAG => */
*** END COMMENT TOKENIZING ***
Process token [2]: T_WHITESPACE => \r\n
----------

Pay attention to "T_DOC_COMMENT_CLOSE_TAG => /" (in the case of the Russian letter "?") versus "T_DOC_COMMENT_CLOSE_TAG => */" (in the case of the English letter "A"). This happens because https://github.com/squizlabs/PHP_CodeSniffer/blob/master/CodeSniffer/Tokenizers/Comment.php does not know about the selected encoding (--encoding=utf-8).
As a consequence, the sniff http://pear.php.net/package/PHP_CodeSniffer/docs/latest/PHP_CodeSniffer/Generic_Sniffs_Commenting_DocCommentSniff.html always reports an error: "The close comment tag must be the only content on the line".
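
The sketch below illustrates the general class of problem described above: a length measured in bytes being used as an offset into a string indexed by characters (or the other way round). It is written in Python for brevity and is not the code of Comment.php. The function names are hypothetical, and the Cyrillic letter U+042F is only a stand-in, since the letter from the original report is not preserved here. Under those assumptions, mixing the two units yields the same split as in the token dump: a trailing " *" and a close tag of "/".

```python
from typing import Tuple


def split_close_tag_mixed_units(comment: str) -> Tuple[str, str]:
    """Split a doc comment into (body, close_tag) while mixing units.

    The offset is derived from the UTF-8 byte length but is then applied
    to a character-indexed string. This is the kind of mismatch being
    illustrated, not the actual PHP_CodeSniffer logic.
    """
    byte_len = len(comment.encode("utf-8"))  # length in bytes
    close_start = byte_len - len("*/")       # offset expressed in bytes
    # Slicing a Python str works in characters, so the offset is one
    # position too far for every 2-byte character in the comment.
    return comment[:close_start], comment[close_start:]


def split_close_tag_consistent(comment: str) -> Tuple[str, str]:
    """Same split, but length and slicing both use characters."""
    close_start = len(comment) - len("*/")
    return comment[:close_start], comment[close_start:]


ascii_comment = "/**\r\n * A\r\n */"
# U+042F is a hypothetical stand-in for the lost Russian letter.
cyrillic_comment = "/**\r\n * \u042f\r\n */"

print(repr(split_close_tag_mixed_units(ascii_comment)[1]))     # '*/'
print(repr(split_close_tag_mixed_units(cyrillic_comment)[1]))  # '/'
print(repr(split_close_tag_consistent(cyrillic_comment)[1]))   # '*/'
```

With ASCII-only content the byte and character lengths coincide, so the mismatch stays hidden. Each 2-byte character shifts the offset by one, which is why only the comment containing the non-ASCII letter is affected.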

Comments

 [2015-08-25 21:59 UTC] lexx918 (Alexey Shein)
 [2015-11-25 09:56 UTC] squiz (Greg Sherwood)
-Status: Open
+Status: Closed
-Assigned To:
+Assigned To: squiz
Moved to Github and resolved there.