Request #13252 :: Possibility to reuse a rule.



Submitted:	2008-02-28 16:10 UTC
From:	clicky	Assigned:
Status:	Open	Package:	PHP_LexerGenerator (version 0.3.4)
PHP Version:	Irrelevant	OS:	Any
Roadmaps:	(Not assigned)


[2008-02-28 16:10 UTC] clicky (François Poirotte)

Description:
------------
Currently, if you have something like this in your regexp declaration block:
nmstart = /[_a-zA-Z]/
nmchar = /[_a-zA-Z0-9]/

And you want to define ident using something like this:
ident = @{nmstart}{nmchar}*@

You're forced to copy and paste the full definitions of both nmstart and nmchar into ident's declaration.

I suggest allowing {name} in a regexp pattern to reuse the rule called "name" in the rule being declared. The syntax would be valid anywhere in the pattern except inside character classes (that is, /[0-9{name}]/ would still match any digit, '{', 'n', 'a', 'm', 'e' or '}').

This is not part of any PCRE specification, though, so it would require some special handling. The PCRE documentation doesn't mention a bare {name} syntax anywhere; the closest construct is r{N,M}, which indicates that r must be repeated between N and M times, but you probably already know this.
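
To make the suggestion concrete, here is a rough sketch of how such an expansion could work as a pre-processing pass over the pattern body. It is not code from PHP_LexerGenerator: the function name expandRuleReferences() and the $rules array (rule name => pattern body without delimiters) are made up for illustration, and the character-class tracking is deliberately simplistic (it does not handle a leading ']' in a class or POSIX classes such as [[:alpha:]]).

```php
<?php
// Hypothetical helper, NOT part of PHP_LexerGenerator.
// Expands {name} references using previously declared rules, skipping
// character classes and leaving {N,M} quantifiers untouched.
function expandRuleReferences(string $pattern, array $rules): string
{
    $out = '';
    $inClass = false; // true while scanning inside [...]
    $len = strlen($pattern);
    for ($i = 0; $i < $len; $i++) {
        $ch = $pattern[$i];
        if ($ch === '\\' && $i + 1 < $len) {
            // Copy escaped characters verbatim, e.g. \{ or \[
            $out .= $ch . $pattern[++$i];
            continue;
        }
        if ($inClass) {
            // Inside a class, {name} is just a list of literal characters
            if ($ch === ']') {
                $inClass = false;
            }
            $out .= $ch;
            continue;
        }
        if ($ch === '[') {
            $inClass = true;
            $out .= $ch;
            continue;
        }
        // A reference must start with a letter or '_', so {2,3} never matches
        if ($ch === '{'
            && preg_match('/\G\{([A-Za-z_][A-Za-z0-9_]*)\}/', $pattern, $m, 0, $i)
        ) {
            if (!isset($rules[$m[1]])) {
                throw new InvalidArgumentException("Unknown rule: {$m[1]}");
            }
            // Non-capturing group so a following quantifier applies to the whole rule
            $out .= '(?:' . $rules[$m[1]] . ')';
            $i += strlen($m[0]) - 1;
            continue;
        }
        $out .= $ch;
    }
    return $out;
}

$rules = ['nmstart' => '[_a-zA-Z]', 'nmchar' => '[_a-zA-Z0-9]'];
echo expandRuleReferences('{nmstart}{nmchar}*', $rules), "\n";
// prints: (?:[_a-zA-Z])(?:[_a-zA-Z0-9])*
```

Wrapping each expanded rule in (?:...) keeps a trailing quantifier such as * bound to the whole referenced rule rather than to its last character.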

Test script:
---------------
/*!lex2php
%input $this->data
%counter $this->N
%token $this->token
%value $this->value
%line $this->line
%matchlongest 1
nmstart = /[_a-zA-Z]/
nmchar = /[_a-zA-Z0-9]/
ident = @{nmstart}{nmchar}*@
*/

Expected result:
----------------
No error. "ident" should be able to match something like: "Some_ID123". It should not match something like "5ome_invalid_ID" because '5' does not match the pattern expressed by nmstart.
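
The expected behaviour can be checked with plain preg_match() against the pattern that ident would expand to. This is only a sketch: the \A anchor is my assumption, standing in for the fact that a lexer matches at the current input position, and the variable name $ident is illustrative.

```php
<?php
// ident with nmstart and nmchar substituted by hand.
// \A anchors the match at the start of the subject (assumption: this
// mimics a lexer matching at its current position).
$ident = '/\A[_a-zA-Z][_a-zA-Z0-9]*/';

// Matches the whole identifier.
var_dump(preg_match($ident, 'Some_ID123', $m) === 1 && $m[0] === 'Some_ID123'); // bool(true)

// '5' does not satisfy nmstart, so nothing matches at the start.
var_dump(preg_match($ident, '5ome_invalid_ID') === 0); // bool(true)
```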



Actual result:
--------------
Popping SUBPATTERN
Popping PATTERN
Popping pattern_declarations
Popping processing_instructions
Popping COMMENTSTART
Popping PHPCODE
Popping $

Exception: Unexpected input at line29: { in /usr/share/php/PHP/LexerGenerator/Regex/Lexer.php on line 202

Call Stack:
    0.0002      53992   1. {main}() /usr/share/php/PHP/LexerGenerator/cli.php:0
    0.0287    1587664   2. PHP_LexerGenerator->__construct() /usr/share/php/PHP/LexerGenerator/cli.php:3
    0.2520    1605640   3. PHP_LexerGenerator_Parser->doParse() /usr/share/php/PHP/LexerGenerator.php:283
    0.2522    1605640   4. PHP_LexerGenerator_Parser->yy_reduce() /usr/share/php/PHP/LexerGenerator/Parser.php:1855
    0.2522    1605640   5. PHP_LexerGenerator_Parser->yy_r29() /usr/share/php/PHP/LexerGenerator/Parser.php:1716
    0.2522    1605640   6. PHP_LexerGenerator_Parser->_validatePattern() /usr/share/php/PHP/LexerGenerator/Parser.php:1643
    0.2523    1605640   7. PHP_LexerGenerator_Regex_Lexer->yylex() /usr/share/php/PHP/LexerGenerator/Parser.php:503
    0.2523    1605640   8. PHP_LexerGenerator_Regex_Lexer->yylex1() /usr/share/php/PHP/LexerGenerator/Regex/Lexer.php:59

Comments

[2008-11-14 16:12 UTC] tkli (Tom Klingenberg)

I would love to see that as well. I was working on a CSS lexer and came across a similar problem, since the flex notation of the CSS grammar uses these placeholders a lot.

Couldn't these placeholders be replaced in a pre-processing step?

[2009-03-17 18:15 UTC] r0vert (Trevor Sluis)

The following patch has been added/updated:

Patch Name:  add-references
URL:         patch add-references

[2009-03-17 18:25 UTC] r0vert (Trevor Sluis)

I added the patch above. It is used as follows, since the generator already supported this type of concatenation:
  alpha = /[a-zA-Z]/
  alphanum = /[a-zA-Z0-9]/
  name = alpha alphanum "*"
resulting in name=/[a-zA-Z][a-zA-Z0-9]*/
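
For readers who want to see the idea without applying the patch, the concatenation can be sketched roughly like this. This is not the code from the add-references patch: concatenateRule() is a made-up name, rules are assumed to be stored with single-character delimiters, and quoted fragments are inserted as-is because that is what the result above shows.

```php
<?php
// Hypothetical sketch, NOT the code from the add-references patch.
// $parts: rule names or double-quoted fragments, in declaration order.
// $rules: rule name => pattern including its delimiters, e.g. '/[a-z]/'.
function concatenateRule(array $parts, array $rules): string
{
    $body = '';
    foreach ($parts as $part) {
        if (strlen($part) >= 2 && $part[0] === '"' && substr($part, -1) === '"') {
            // Quoted fragment: strip the quotes and insert the content unchanged
            $body .= substr($part, 1, -1);
        } elseif (isset($rules[$part])) {
            // Rule reference: strip the opening and closing delimiter
            $body .= substr($rules[$part], 1, -1);
        } else {
            throw new InvalidArgumentException("Unknown rule: $part");
        }
    }
    // Assumes no unescaped '/' appears in any of the bodies
    return '/' . $body . '/';
}

$rules = ['alpha' => '/[a-zA-Z]/', 'alphanum' => '/[a-zA-Z0-9]/'];
echo concatenateRule(['alpha', 'alphanum', '"*"'], $rules), "\n";
// prints: /[a-zA-Z][a-zA-Z0-9]*/
```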