
Bug #12564 Unicode support incomplete
Submitted: 2007-12-02 04:45 UTC
From: instance Assigned:
Status: Open Package: PHP_LexerGenerator (version CVS)
PHP Version: 5.2.4 OS: Any
Roadmaps: (Not assigned)    


 [2007-12-02 04:45 UTC] instance (Alan Langford)
Description:
There are still some issues with processing Unicode, so although the parser code works, unit tests (i.e. LexerGeneratorTest::testLexerGeneratorUnicode()) fail.

Comments

 [2008-01-02 17:35 UTC] whitefawn (Oleg Sverdlov)
This can be solved by adding the /u modifier to all regexps passed to preg_match() in the generated file.
 [2008-01-02 18:27 UTC] instance (Alan Langford)
Adding an already-present /u to solve a file I/O problem is a fascinating solution. Feel free to send in a patch that demonstrates this. Alternatively you could actually run the unit test and look at the actual issue.
 [2008-01-08 16:49 UTC] whitefawn (Fawn)
I am sorry. That comment is not related to your problem. I was unable to delete it. What I wanted to say is there should be a way to add /u to regexps in the generated file.
 [2008-01-09 03:05 UTC] instance (Alan Langford)
Okay, it's not well documented, but if you look at /tests/data/Unicode.plex, you'll see the new Unicode pragma:
    %unicode 1
This will add /u to the regexes. But the unit test fails thanks to a translation issue on file I/O. I also added
    %caseinsensitive 1
to make it easier to deal with case-less grammars.
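
For readers unfamiliar with what the /u modifier changes, here is a minimal standalone sketch. It is not taken from the generated lexer, whose actual patterns are not shown in this thread. The pattern and subject are made up for illustration. It only shows that preg_match() treats the subject as bytes without /u and as UTF-8 characters with it.

```php
<?php
// Illustration only: the pattern and subject are hypothetical,
// not copied from PHP_LexerGenerator output.

// "é" (U+00E9) written as its two UTF-8 bytes, so the result
// does not depend on the encoding of this source file.
$subject = "\xC3\xA9";

// Without /u, "." matches a single byte. The subject has two bytes,
// so the anchored pattern does not match.
$withoutU = preg_match('/^.$/', $subject);

// With /u, pattern and subject are treated as UTF-8, so "." matches
// the whole character and the anchored pattern matches.
$withU = preg_match('/^.$/u', $subject);

var_dump($withoutU); // int(0)
var_dump($withU);    // int(1)
```

This covers only the regex side. The unit test failure described above is attributed to file I/O, which /u does not affect.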