
Bug #9230 Escape sequences
Submitted: 2006-11-04 00:22 UTC
From: tom at inflatablecookie dot com Assigned: cellog
Status: Closed Package: PHP_LexerGenerator (version 0.3.0)
PHP Version: 5.1.5 OS: Win XP SP2
Roadmaps: (Not assigned)    

 [2006-11-04 00:22 UTC] tom at inflatablecookie dot com (Tom Wright)
Description:
------------
The regex parser (PHP/LexerGenerator/Regex/Parser.php) doesn't handle escape sequences very well in plex file regex definitions when they come before certain other characters. The main cause of the issue is the fact that the regexes get compiled into double quoted strings rather than single quoted strings (which I can't see a need for). Characters such as " and $ and { need to be escaped locally in the compiler (which they are), however if you need a backslash before one of those, (as in /[\\"]/ inside a single quote regex) the preceding backslash gets stripped.

Test script:
---------------
/*!lex2php
%input $this->data
%counter $this->N
%token $this->token
%value $this->value
%line $this->line
delim1 = /["]/
delim2 = /[\"]/
delim3 = /[\\"]/
*/

Expected result:
----------------
in /mycompiledlexer.php
$yy_global_pattern = '/^(["]|[\\"]|[\\\\"])/';

Actual result:
--------------
in /mycompiledlexer.php
$yy_global_pattern = "/^([\"]|[\"]|[\\\\\"])/";


 [2006-11-04 00:44 UTC] tom at inflatablecookie dot com
This problem also affects hexadecimal ranges, but strangely in the opposite manner, and only on the right-hand side of the range. /[a-z_\x7f-\xff]/ will produce [a-z_\x7f-xff]/ in the compiled lexer.
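To illustrate why the lost backslash matters, here is a minimal sketch. It is not taken from the generated lexer: it assumes the patterns end up in preg_match, the variable names are made up for the example, and a leading delimiter has been added to the broken pattern so that it can be compiled at all. With the backslash intact the class covers bytes 0x7f to 0xff; without it the range reads \x7f-x, which runs backwards.

```php
<?php
// Hypothetical variable names; the patterns are the ones quoted in this comment.
$intended = '/[a-z_\x7f-\xff]/'; // what the plex file asked for
$broken   = '/[a-z_\x7f-xff]/';  // what ended up in the compiled lexer

$byte = "\xe9"; // one high byte, inside the 0x7f-0xff range

$intendedResult = preg_match($intended, $byte);

// \x7f-x is a range from 0x7f down to 0x78 ("x"), so PCRE rejects the pattern.
// The @ only hides the resulting compile warning.
$brokenResult = @preg_match($broken, $byte);

var_dump($intendedResult); // int(1)
var_dump($brokenResult);   // bool(false)
```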
 [2006-12-16 00:16 UTC] cellog (Greg Beaver)
You need double-quoted strings for things like \t and \n to have any meaning.
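As a small illustration of the quoting difference being referred to, the sketch below compares the same escape in both kinds of PHP string literal. It only shows how PHP parses the literals; the variable names are made up for the example, and how the generated lexer goes on to use such strings is not shown.

```php
<?php
// Hypothetical variable names, for illustration only.
$double = "\t"; // double quotes: PHP turns \t into a single TAB character
$single = '\t'; // single quotes: two characters, a backslash and the letter t

var_dump(strlen($double)); // int(1)
var_dump(ord($double));    // int(9)
var_dump(strlen($single)); // int(2)
```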
 [2006-12-16 00:57 UTC] cellog (Greg Beaver)
This bug has been fixed in CVS. If this was a documentation problem, the fix will appear by the end of next Sunday (CET). If this was a problem with the website, the change should be live shortly. Otherwise, the fix will appear in the package's next release. Thank you for the report and for helping us make PEAR better.

OK, there are two separate issues here. The first is bogus, the second is a real bug.

This: ["] is the same as: [\"] in a regular expression. Try it in a single-quoted or double-quoted string, and you'll see what I mean. The backslash must always be escaped as \\. The lexer generator is correctly generating the sequence for matching the string \" by using \\\\ followed by \" (the \ is needed because we're double-quoted) to make \\\\\".

The second issue, with \x in character class ranges, is genuine and is fixed in CVS.
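One way to try this out is sketched below. It takes the two $yy_global_pattern literals from the report as they stand, prints the pattern text PHP actually hands to the regex engine, and checks a few subjects. The variable names and the choice of subjects are assumptions made for the example; this is not part of the generator.

```php
<?php
// The two literals from the report; variable names are hypothetical.
$expected = '/^(["]|[\\"]|[\\\\"])/';    // single-quoted form the reporter expected
$actual   = "/^([\"]|[\"]|[\\\\\"])/";   // double-quoted form the generator emitted

// Pattern text after PHP has processed the string escapes:
echo $expected, "\n"; // /^(["]|[\"]|[\\"])/
echo $actual, "\n";   // /^(["]|["]|[\\"])/

// Subjects: a double quote, a single backslash, and an unrelated letter.
$sameBehaviour = true;
foreach (['"', '\\', 'a'] as $subject) {
    if (preg_match($expected, $subject) !== preg_match($actual, $subject)) {
        $sameBehaviour = false;
    }
}

var_dump($sameBehaviour); // bool(true)
```

The two strings differ only in the second alternative, [\"] versus ["], and inside a character class both match just the double quote.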