
Bug #7320 Wikilinks messed when using international characters
Submitted: 2006-04-05 17:39 UTC
From: martin dot ottenwaelter at ensimag dot fr
Assigned: justinpatrin
Status: Closed
Package: Text_Wiki (version 1.1.0)
PHP Version: 4.4.1
OS: Irrelevant
Roadmaps: (Not assigned)    


 [2006-04-05 17:39 UTC] martin dot ottenwaelter at ensimag dot fr (Martin Ottenwaelter)
Description:
------------
** Problem:
The regular expression used to match wikilinks mishandles international characters (é, è, ü, etc.). For example, the French word "Création" is misinterpreted as a wikilink: Text_Wiki produces the following HTML sequence wherever the word "Création" appears.

<a href="?wikiword=Cr%C3">Cr?</a>©ation

On the other hand, the word "Creation" is not interpreted as a wikilink.

** Solution:
Use the UTF-8 capabilities of regular expressions:
1. Instead of using the \xc0-\xfe range for international characters, use \pL, \p{Lu} and \p{Ll} for "any letter", "any uppercase letter" and "any lowercase letter".
2. Add a "u" at the end of the regex to enable UTF-8 mode: "/my_regex/u".

I have successfully patched "Wikilink.php" using this method and it works just fine.

Sources:
http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php
http://www.php.net/manual/en/reference.pcre.pattern.syntax.php
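The difference between the two approaches can be sketched with a pair of simplified WikiWord patterns. These patterns and the findWikiWord() helper are assumptions made for illustration; they are not the regular expressions shipped in Wikilink.php. In UTF-8, "é" is the byte pair C3 A9, and the byte-oriented pattern takes the lone C3 byte for an uppercase letter, which would explain the "Cr%C3" link target reported above.

```php
<?php
// Simplified WikiWord patterns for illustration only; these are
// assumptions, not the regular expressions used by Text_Wiki.

// Byte-oriented: treats bytes \xc0-\xde as uppercase, \xdf-\xfe as lowercase.
define('WIKIWORD_BYTES', '/[A-Z\xc0-\xde][a-z\xdf-\xfe]+[A-Z\xc0-\xde][a-z\xdf-\xfe]*/');

// UTF-8 aware: Unicode letter properties plus the "u" modifier.
define('WIKIWORD_UTF8', '/\p{Lu}\p{Ll}+\p{Lu}\p{Ll}*/u');

// Hypothetical helper: returns the first match, or null when nothing matches.
function findWikiWord($pattern, $text)
{
    return preg_match($pattern, $text, $m) === 1 ? $m[0] : null;
}

$word = "Cr\xC3\xA9ation"; // "Création" spelled out as UTF-8 bytes

// The byte pattern stops in the middle of the two-byte "é".
$byteMatch = findWikiWord(WIKIWORD_BYTES, $word);
echo rawurlencode($byteMatch), "\n"; // prints Cr%C3

// The UTF-8 pattern sees a single capital followed by lowercase letters,
// so the word is not treated as a WikiWord.
var_dump(findWikiWord(WIKIWORD_UTF8, $word)); // NULL
```

With the UTF-8 pattern, a genuine WikiWord such as "CréationPage" still matches as a whole.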

Comments

 [2006-04-05 18:05 UTC] toggg (bertrand Gugger)
Please tell us which settings you are using and provide a minimal test case. We are very interested in how Text_Wiki behaves with Unicode. Is the behavior the same with PHP 5.1.3 from CVS?
 [2006-04-05 18:36 UTC] martin dot ottenwaelter at ensimag dot fr
Here is an example. The following PHP code:

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
    </head>
    <body>
    <?php
    require_once 'PEAR.php';
    require_once 'Text/Wiki.php';
    $wiki = & Text_Wiki::singleton('Default');
    // UTF8
    $wiki->setFormatConf('Xhtml', 'charset', 'UTF-8');
    // Extended character set (é à ù, etc...)
    $wiki->setParseConf('wikilink', 'ext_chars', 'false');
    $result = $wiki->transform("Création", 'Xhtml');
    print($result);
    ?>
    </body>
    </html>

outputs the following HTML:

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
    </head>
    <body>
    <p>Cr?<a href="http://example.com/new.php?page=Cr%C3">?</a>©ation</p>
    </body>
    </html>

instead of

    ...
    <p>Création</p>
    ...

with PHP 4.4.1 and the file encoded as UTF-8. I haven't had time to test it on 5.1.3, but my patch makes it work on 4.4.1!

Martin.
 [2006-04-10 04:59 UTC] toggg (bertrand Gugger)
Thanks for the patch, but we can't apply instructions written out like this. As I said, we would really like to support UTF-8 fully. We are not machines, we use them. Please present your patch as a diff -u, and we'll test it to check that it does not affect our country settings.
 [2006-12-07 06:58 UTC] justinpatrin (Justin Patrin)
I've attempted to fix this in CVS. I would greatly appreciate tests with UTF-8 and non-UTF-8. The docs say that this is for UTF-8 mode. I'm not sure how you get into or out of UTF-8 mode so I don't know how to test this. A test case would be very much appreciated (please put it up in a file I can download). This checkin may break either way so any tests would be good.
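As a rough illustration of what UTF-8 mode means at the PCRE level, the sketch below runs the same word through a pattern with and without the "u" modifier, once encoded as UTF-8 and once as ISO-8859-1. The matchCapitalizedWord() helper and both patterns are hypothetical and are not taken from Text_Wiki. The point to note is that with "u" the subject must be valid UTF-8, otherwise preg_match() returns false.

```php
<?php
// Hypothetical helper, not part of Text_Wiki. Returns the matched text,
// null when there is no match, or false when PCRE reports an error.
function matchCapitalizedWord($subject, $utf8Mode)
{
    $pattern = $utf8Mode
        ? '/\p{Lu}\p{Ll}+/u'                 // UTF-8 mode, Unicode properties
        : '/[A-Z\xc0-\xde][a-z\xdf-\xfe]+/'; // byte mode, Latin-1 style ranges
    $result = preg_match($pattern, $subject, $m);
    if ($result === false) {
        return false; // e.g. preg_last_error() === PREG_BAD_UTF8_ERROR
    }
    return $result === 1 ? $m[0] : null;
}

$utf8   = "Cr\xC3\xA9ation"; // "Création" encoded as UTF-8
$latin1 = "Cr\xE9ation";     // "Création" encoded as ISO-8859-1

var_dump(matchCapitalizedWord($utf8, true));    // whole word matches
var_dump(matchCapitalizedWord($latin1, true));  // false: subject is not valid UTF-8
var_dump(matchCapitalizedWord($latin1, false)); // whole word matches
var_dump(matchCapitalizedWord($utf8, false));   // only "Cr" matches
```

This suggests that a test file needs at least one UTF-8 and one non-UTF-8 input for each setting, since the two modes fail in different ways.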
 [2006-12-08 08:24 UTC] justinpatrin (Justin Patrin)
This bug has been fixed in CVS. If this was a documentation problem, the fix will appear on pear.php.net by the end of next Sunday (CET). If this was a problem with the pear.php.net website, the change should be live shortly. Otherwise, the fix will appear in the package's next release. Thank you for the report and for helping us make PEAR better. You can set the utf-8 option to true for the Wikilink and Freelink rules to fix this behavior. If it still doesn't work please reopen this bug.
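A configuration sketch based on the comment above, reusing the calls from the reporter's example. It assumes a Text_Wiki build that contains the CVS fix and that the option is spelled 'utf-8' exactly as written in the comment; neither has been verified against a released version.

```php
<?php
require_once 'Text/Wiki.php';

$wiki = Text_Wiki::singleton('Default');
$wiki->setFormatConf('Xhtml', 'charset', 'UTF-8');

// Option name 'utf-8' is taken from the closing comment of this report;
// it is assumed to exist only in builds that include the CVS fix.
$wiki->setParseConf('Wikilink', 'utf-8', true);
$wiki->setParseConf('Freelink', 'utf-8', true);

// "Création" as UTF-8 bytes, so the result does not depend on the
// encoding of this source file.
print($wiki->transform("Cr\xC3\xA9ation", 'Xhtml'));
```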