
Request #13252 Possibility to reuse a rule.
Submitted: 2008-02-28 16:10 UTC
From: clicky Assigned:
Status: Open Package: PHP_LexerGenerator (version 0.3.4)
PHP Version: Irrelevant OS: Any
Roadmaps: (Not assigned)    


 [2008-02-28 16:10 UTC] clicky (François Poirotte)
Description:
------------
Currently, if you have something like this in your regexp declaration block:

    nmstart = /[_a-zA-Z]/
    nmchar = /[_a-zA-Z0-9]/

and you want to define ident using something like this:

    ident = @{nmstart}{nmchar}*@

you're forced to copy/paste the whole declaration of both nmstart and nmchar into ident's declaration.

I suggest allowing {name} in a regexp pattern to reuse the rule called "name" in the rule being declared. The syntax could be used anywhere in the pattern except in character classes (that is, /[0-9{name}]/ would still match any digit, '{', 'n', 'a', 'm', 'e' or '}').

This is not part of any PCRE specification, though, so it would require adding some special handling. Also, the PCRE documentation doesn't mention any use of the {name} syntax anywhere. The closest one is r{N,M}, which indicates that r must be repeated between N and M times, but you probably already know this.

Test script:
---------------
    /*!lex2php
    %input $this->data
    %counter $this->N
    %token $this->token
    %value $this->value
    %line $this->line
    %matchlongest 1
    nmstart = /[_a-zA-Z]/
    nmchar = /[_a-zA-Z0-9]/
    ident = @{nmstart}{nmchar}*@
    */

Expected result:
----------------
No error. "ident" should be able to match something like "Some_ID123". It should not match something like "5ome_invalid_ID", because '5' does not match the pattern expressed by nmstart.

Actual result:
--------------
    Popping SUBPATTERN
    Popping PATTERN
    Popping pattern_declarations
    Popping processing_instructions
    Popping COMMENTSTART
    Popping PHPCODE
    Popping $
    Exception: Unexpected input at line29: { in /usr/share/php/PHP/LexerGenerator/Regex/Lexer.php on line 202

    Call Stack:
        0.0002      53992   1. {main}() /usr/share/php/PHP/LexerGenerator/cli.php:0
        0.0287    1587664   2. PHP_LexerGenerator->__construct() /usr/share/php/PHP/LexerGenerator/cli.php:3
        0.2520    1605640   3. PHP_LexerGenerator_Parser->doParse() /usr/share/php/PHP/LexerGenerator.php:283
        0.2522    1605640   4. PHP_LexerGenerator_Parser->yy_reduce() /usr/share/php/PHP/LexerGenerator/Parser.php:1855
        0.2522    1605640   5. PHP_LexerGenerator_Parser->yy_r29() /usr/share/php/PHP/LexerGenerator/Parser.php:1716
        0.2522    1605640   6. PHP_LexerGenerator_Parser->_validatePattern() /usr/share/php/PHP/LexerGenerator/Parser.php:1643
        0.2523    1605640   7. PHP_LexerGenerator_Regex_Lexer->yylex() /usr/share/php/PHP/LexerGenerator/Parser.php:503
        0.2523    1605640   8. PHP_LexerGenerator_Regex_Lexer->yylex1() /usr/share/php/PHP/LexerGenerator/Regex/Lexer.php:59
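
For illustration only, the requested {name} substitution could be sketched as a small pre-processing pass. This is a Python sketch, not PHP_LexerGenerator code: the function name expand_references, the rules dictionary, and the choice to store rule bodies without their /.../ or @...@ delimiters are all assumptions made for the example. It leaves character classes and {N,M} quantifiers untouched, as the request describes.

```python
import re

# Hypothetical helper, not part of PHP_LexerGenerator.
# A reference looks like {name}; a quantifier such as {2,3} starts with a
# digit, so it never matches this pattern.
NAME_REF = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_references(pattern, rules, _active=frozenset()):
    """Replace {name} with the body of rule `name`, outside character classes.

    `rules` maps rule names to regex bodies without delimiters (an assumption
    of this sketch). `_active` tracks the rules currently being expanded so
    that a rule referring to itself raises instead of recursing forever.
    """
    out = []
    i = 0
    in_class = False
    while i < len(pattern):
        ch = pattern[i]
        # Copy escaped characters verbatim, inside or outside a class.
        if ch == "\\" and i + 1 < len(pattern):
            out.append(pattern[i:i + 2])
            i += 2
            continue
        if in_class:
            if ch == "]":
                in_class = False
            out.append(ch)
            i += 1
            continue
        if ch == "[":
            in_class = True
            out.append(ch)
            i += 1
            # A ']' directly after '[' or '[^' is a literal member.
            if pattern[i:i + 1] == "^":
                out.append("^")
                i += 1
            if pattern[i:i + 1] == "]":
                out.append("]")
                i += 1
            continue
        m = NAME_REF.match(pattern, i)
        if m and m.group(1) in rules:
            name = m.group(1)
            if name in _active:
                raise ValueError("circular rule reference: " + name)
            body = expand_references(rules[name], rules, _active | {name})
            # Wrap in a non-capturing group so a following '*' applies
            # to the whole referenced rule.
            out.append("(?:" + body + ")")
            i = m.end()
            continue
        out.append(ch)
        i += 1
    return "".join(out)


rules = {"nmstart": "[_a-zA-Z]", "nmchar": "[_a-zA-Z0-9]"}
ident = expand_references("{nmstart}{nmchar}*", rules)
```

With the two rules from the test script, ident expands to a pattern that accepts "Some_ID123" and rejects "5ome_invalid_ID". Unknown names are left as-is rather than rejected, which is a design choice of the sketch, not something the request specifies.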

Comments

 [2008-11-14 16:12 UTC] tkli (Tom Klingenberg)
I would love to see that as well. I was working on a CSS lexer and came across a similar problem, since the flex notation of the CSS grammar uses these placeholders a lot. Couldn't the placeholders be replaced in a pre-processing step?
 [2009-03-17 18:15 UTC] r0vert (Trevor Sluis)
The following patch has been added/updated: Patch Name: add-references URL: patch add-references
 [2009-03-17 18:25 UTC] r0vert (Trevor Sluis)
I added the patch above. It can be used as follows, since this type of concatenation was already supported:

    alpha = /[a-zA-Z]/
    alphanum = /[a-zA-Z0-9]/
    name = alpha alphanum "*"

resulting in name = /[a-zA-Z][a-zA-Z0-9]*/
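
To make the concatenation form concrete, here is a rough Python sketch of how such a right-hand side could be resolved. It is not the code from the add-references patch. The function name concat_rule, the tokenizing regex, and the treatment of quoted strings as raw pattern fragments are assumptions inferred from the example above, and rule bodies are again stored without their delimiters.

```python
import re

# Hypothetical tokenizer: either a double-quoted fragment or a bare rule name.
TOKEN = re.compile(r'"([^"]*)"|([A-Za-z_][A-Za-z0-9_]*)')


def concat_rule(rhs, rules):
    """Join referenced rule bodies and quoted fragments, in order."""
    parts = []
    for quoted, name in TOKEN.findall(rhs):
        if name:
            # Bare word: look up a previously declared rule.
            # Raises KeyError if the rule does not exist.
            parts.append(rules[name])
        else:
            # Quoted text: copied into the pattern unchanged (assumption).
            parts.append(quoted)
    return "".join(parts)


rules = {"alpha": "[a-zA-Z]", "alphanum": "[a-zA-Z0-9]"}
name_pattern = concat_rule('alpha alphanum "*"', rules)
```

Under these assumptions, name_pattern is the same pattern body as the result quoted in the comment. Unlike the {name} sketch, nothing is wrapped in a group here, so a quoted quantifier only binds to the last atom of the preceding rule.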