
Request #13221 Unescaped hyphen can't be used to express itself in character classes
Submitted: 2008-02-26 22:07 UTC
From: clicky Assigned:
Status: Open Package: PHP_LexerGenerator (version 0.3.4 and CVS)
PHP Version: Irrelevant OS: Irrelevant
Roadmaps: (Not assigned)    
 [2008-02-26 22:07 UTC] clicky (François Poirotte)
Description:
------------
If you want to use the hyphen character (-) in a character class, you need to escape it, e.g. with octal (\055) or hexadecimal (\x2D) notation.

I think the regexp lexer/parser should be made a bit more permissive, since [-a-zA-Z0-9], for example, is a valid character class.

I think this is more of a feature request than a real bug report, though...

Test script:
---------------
<?php
class TestLexer
{
    // ...

/*!lex2php
%input $this->data
%counter $this->N
%token $this->token
%value $this->value
%line $this->line
%matchlongest 1
space = /[ \t\n]+/
name = /[-a-zA-Z0-9]+/
*/

    // Please note that using name = /[\x2Da-zA-Z0-9]+/
    // works as expected.

/*!lex2php
%statename START
name {
    echo "Name\n";
    var_dump($this->value);
    echo " name subpatterns: \n";
    var_dump($yy_subpatterns);
}
space {
    return FALSE;
}
*/

    // ...
}
?>

Expected result:
----------------
Plex should accept the file without any exception being raised.

Actual result:
--------------
Reduce (29) [subpattern ::= SUBPATTERN].
Syntax Error on line 27: token '-' while parsing rule:End of Input OPENCHARCLASS
Popping SUBPATTERN
Popping PATTERN
Popping processing_instructions
Popping COMMENTSTART
Popping PHPCODE
Popping $
Exception: Unexpected HYPHEN(-), expected one of: NEGATE,TEXT,ESCAPEDBACKSLASH,BACKREFERENCE,COULDBEBACKREF in /usr/share/php/PHP/LexerGenerator/Regex/Parser.php on line 1779

Call Stack:
    0.0002      54040   1. {main}() /usr/share/php/PHP/LexerGenerator/cli.php:0
    0.0260    1586716   2. PHP_LexerGenerator->__construct() /usr/share/php/PHP/LexerGenerator/cli.php:3
    0.0332    1597664   3. PHP_LexerGenerator_Parser->doParse() /usr/share/php/PHP/LexerGenerator.php:283
    0.0334    1598224   4. PHP_LexerGenerator_Parser->yy_reduce() /usr/share/php/PHP/LexerGenerator/Parser.php:1855
    0.0334    1598224   5. PHP_LexerGenerator_Parser->yy_r29() /usr/share/php/PHP/LexerGenerator/Parser.php:1716
    0.0334    1598224   6. PHP_LexerGenerator_Parser->_validatePattern() /usr/share/php/PHP/LexerGenerator/Parser.php:1643
    0.0340    1602532   7. PHP_LexerGenerator_Regex_Parser->doParse() /usr/share/php/PHP/LexerGenerator/Parser.php:505
    0.0341    1602636   8. PHP_LexerGenerator_Regex_Parser->yy_syntax_error() /usr/share/php/PHP/LexerGenerator/Regex/Parser.php:1878

Comments

 [2008-02-28 17:46 UTC] clicky (François Poirotte)
OK, it actually works if you put the hyphen at the end of the character class (just before the closing ']'), which is what PCRE seems to recommend. That is, use [a-zA-Z0-9-] instead.

Nonetheless, a hyphen at the start of the character class is also valid, because in that position it cannot be mistaken for part of a range. PHP's preg_match() accepts that syntax too.

So you may consider closing this bug report as bogus or wontfix, even though the issue still stands.
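
For reference, here is a minimal sketch that checks the four spellings mentioned in this report against PHP's own PCRE engine through preg_match(). It only illustrates what preg_match() accepts and does not exercise PHP_LexerGenerator itself. The names $classes and classResults() are made up for this example.

```php
<?php
// Four spellings of "ASCII letters, digits, or a literal hyphen".
// Single-quoted strings keep the backslashes intact, so PCRE (not PHP)
// interprets \055 (octal) and \x2D (hexadecimal) as the hyphen character.
$classes = [
    'leading hyphen'  => '/^[-a-zA-Z0-9]+$/',
    'trailing hyphen' => '/^[a-zA-Z0-9-]+$/',
    'octal escape'    => '/^[\055a-zA-Z0-9]+$/',
    'hex escape'      => '/^[\x2Da-zA-Z0-9]+$/',
];

// Hypothetical helper: returns, for each labelled pattern, whether it
// matches the whole subject (preg_match() returns 1 on a match).
function classResults(array $patterns, string $subject): array
{
    $results = [];
    foreach ($patterns as $label => $pattern) {
        $results[$label] = preg_match($pattern, $subject) === 1;
    }
    return $results;
}

foreach (['foo-bar-42', 'foo_bar'] as $subject) {
    echo $subject, "\n";
    foreach (classResults($classes, $subject) as $label => $matched) {
        echo '  ', $label, ': ', $matched ? 'match' : 'no match', "\n";
    }
}
```

With preg_match(), all four classes should behave identically: each accepts 'foo-bar-42' and rejects 'foo_bar', since the underscore belongs to none of them. The request above is only that the generator's regex parser accept the first spelling as well.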