Bug #12564 :: Unicode support incomplete



Submitted: 2007-12-02 04:45 UTC
From: instance
Assigned: none
Status: Open
Package: PHP_LexerGenerator (version CVS)
PHP Version: 5.2.4
OS: Any
Roadmaps: (Not assigned)



[2007-12-02 04:45 UTC] instance (Alan Langford)

Description:
------------
There are still some issues with processing Unicode: although the parser code works, the unit test LexerGeneratorTest::testLexerGeneratorUnicode() fails.

Comments

[2008-01-02 17:35 UTC] whitefawn (Oleg Sverdlov)

This can be solved by adding the /u modifier to all regexps matched by the preg_match() function in the generated file.

[2008-01-02 18:27 UTC] instance (Alan Langford)

Adding an already-present /u to solve a file I/O problem is a fascinating solution. Feel free to send in a patch that demonstrates this.

Alternatively you could actually run the unit test and look at the actual issue.

[2008-01-08 16:49 UTC] whitefawn (Fawn)

I am sorry. That comment is not related to your problem, and I was unable to delete it. What I wanted to say is that there should be a way to add /u to the regexps in the generated file.

[2008-01-09 03:05 UTC] instance (Alan Langford)

Okay, it's not well documented, but if you look at /tests/data/Unicode.plex, you'll see the new Unicode pragma:

%unicode 1

This will add /u to the regexes. But the unit test still fails because of a translation issue in file I/O.
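To show what the /u modifier added by %unicode actually changes, here is a minimal standalone PHP sketch. It is not code from PHP_LexerGenerator or its generated files. Without /u, PCRE matches the subject byte by byte, so "." consumes only one byte of a multi-byte UTF-8 character. With /u, both the pattern and the subject are treated as UTF-8 code points.

```php
<?php
// Standalone illustration only; not taken from PHP_LexerGenerator.
$e = "\xC3\xA9"; // "é" encoded as UTF-8: two bytes

// Without /u, "." matches a single byte, so a one-character pattern
// cannot match this two-byte string.
$bytewise = preg_match('/^.$/', $e);

// With /u, "." matches the whole code point U+00E9.
$utf8 = preg_match('/^.$/u', $e);

printf("without /u: %d, with /u: %d\n", $bytewise, $utf8);
```

Also note that with /u, preg_match() returns false instead of 0 when the subject is not valid UTF-8. Input that was mangled during file I/O can therefore surface as a matching failure.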

I also added

%caseinsensitive 1

to make it easier to deal with case-less grammars.
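The thread does not say exactly how %caseinsensitive is implemented. The sketch below assumes it appends the PCRE /i modifier in the same way %unicode appends /u. The helper buildPattern() is hypothetical and only illustrates how a generator could turn the two pragma flags into modifiers on a token regex.

```php
<?php
// Hypothetical helper, NOT part of PHP_LexerGenerator. It assumes
// %caseinsensitive maps to /i and %unicode maps to /u.
function buildPattern($regex, $unicode, $caseInsensitive)
{
    $modifiers = '';
    if ($caseInsensitive) {
        $modifiers .= 'i'; // letters match regardless of case
    }
    if ($unicode) {
        $modifiers .= 'u'; // pattern and subject are treated as UTF-8
    }
    // Assumes the token regex contains no unescaped "/" delimiter.
    return '/' . $regex . '/' . $modifiers;
}

$subject = "CAF\xC3\xA9"; // "CAFé" in UTF-8

// /i lets "caf" match "CAF"; /u lets "." consume the two-byte "é".
$both = preg_match(buildPattern('^caf.$', true, true), $subject);

// Without /u, "." stops after the first byte of "é" and "$" fails.
$noUnicode = preg_match(buildPattern('^caf.$', false, true), $subject);

printf("i+u: %d, i only: %d\n", $both, $noUnicode);
```

Keeping the modifier string separate from the token regex lets each pragma toggle one flag independently. This mirrors how the grammar file exposes them as two separate settings.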