
Request #6411 Various fixes to regexes
Submitted: 2006-01-04 06:49 UTC
From: alan_k Assigned:
Status: Feedback Package: File_Gettext
PHP Version: Irrelevant OS: all
Roadmaps: (Not assigned)    


 [2006-01-04 06:49 UTC] alan_k
Description:
------------
diff -pur File/Gettext/PO.php File.new/Gettext/PO.php
--- File/Gettext/PO.php	2005-12-30 16:36:45.000000000 +0800
+++ File.new/Gettext/PO.php	2005-12-30 16:35:35.000000000 +0800
@@ -64,8 +64,8 @@ class File_Gettext_PO extends File_Gette
         // match all msgid/msgstr entries
         $matched = preg_match_all(
-            '/(msgid\s+("([^"]|\\\\")*?"\s*)+)\s+' .
-            '(msgstr\s+("([^"]|\\\\")*?"\s*)+)/',
+            '/msgid\s+((?:".*(?<!\\\\)"\s*)+)\s+' .
+            'msgstr\s+((?:".*(?<!\\\\)"\s*)+)/',
             $contents, $matches
         );
         unset($contents);
@@ -76,10 +76,8 @@ class File_Gettext_PO extends File_Gette
         // get all msgids and msgtrs
         for ($i = 0; $i < $matched; $i++) {
-            $msgid = preg_replace(
-                '/\s*msgid\s*"(.*)"\s*/s', '\\1', $matches[1][$i]);
-            $msgstr= preg_replace(
-                '/\s*msgstr\s*"(.*)"\s*/s', '\\1', $matches[4][$i]);
+            $msgid = substr(trim($matches[1][$i]), 1, -1);
+            $msgstr = substr(trim($matches[2][$i]), 1, -1);
             $this->strings[parent::prepare($msgid)] =
                 parent::prepare($msgstr);
         }
diff -pur File/Gettext.php File.new/Gettext.php
--- File/Gettext.php	2005-12-30 16:36:45.000000000 +0800
+++ File.new/Gettext.php	2005-12-30 16:35:41.000000000 +0800
@@ -131,12 +131,12 @@ class File_Gettext
     function prepare($string, $reverse = false)
     {
         if ($reverse) {
-            $smap = array('"', "\n", "\t", "\r");
-            $rmap = array('\\"', '\\n"' . "\n" . '"', '\\t', '\\r');
+            $smap = array('\\', '"', "\n", "\t", "\r");
+            $rmap = array('\\\\', '\\"', '\\n"' . "\n" . '"', '\\t', '\\r');
             return (string) str_replace($smap, $rmap, $string);
         } else {
-            $smap = array('/"\s+"/', '/\\\\n/', '/\\\\r/', '/\\\\t/', '/\\\\"/');
-            $rmap = array('', "\n", "\r", "\t", '"');
+            $smap = array('/"\s+"/', '/\\\\n/', '/\\\\r/', '/\\\\t/', '/\\\\"/', '/\\\\\\\\/');
+            $rmap = array('', "\n", "\r", "\t", '"', '\\');
             return (string) preg_replace($smap, $rmap, $string);
         }
     }
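For anyone reviewing the prepare() change: the sketch below shows one way to escape and unescape a PO string body so that a backslash survives the round trip. It is only an illustration. The function names po_escape() and po_unescape() are hypothetical and not part of File_Gettext, the line-splitting that prepare() does on "\n" is left out, and PHP 7 or later is assumed. It uses strtr() with an array, which replaces in a single left-to-right pass, instead of the sequential replacements in the patch.

```php
<?php
// Hypothetical helpers, not part of File_Gettext.

function po_escape(string $s): string
{
    // strtr() with an array scans the input once, so the backslashes it
    // emits are never escaped a second time.
    return strtr($s, [
        '\\' => '\\\\',
        '"'  => '\\"',
        "\n" => '\\n',
        "\t" => '\\t',
        "\r" => '\\r',
    ]);
}

function po_unescape(string $s): string
{
    // Single pass again: an escaped backslash followed by "n" comes back
    // as backslash + "n" and is not mistaken for a newline escape, which
    // can happen when the \n pattern is applied before the \\ pattern.
    return strtr($s, [
        '\\\\' => '\\',
        '\\"'  => '"',
        '\\n'  => "\n",
        '\\t'  => "\t",
        '\\r'  => "\r",
    ]);
}

$original = 'C:\\new "dir"' . "\n";   // C:\new "dir" followed by a newline
$escaped  = po_escape($original);

var_dump(po_unescape($escaped) === $original);
```

The design choice here is to make both directions single-pass so that the order of the replacement pairs does not matter.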

Comments

 [2006-01-06 09:39 UTC] ivanwyc at gmail dot com
To give more details for this bug:
- The original regex for msgid and msgstr doesn't really work for input like this:
  msgid "AAA" "BBB\"CCC"
  The '\' of '\"' is always taken by the first branch of the alternation ([^"]|\\\\"), so the second branch has no effect.
- Swapping the alternation to (\\\\"|[^"]) should work in theory, but in practice it segfaults on long text; see [1]. The regex we propose is also the fastest one suggested in [1].
- prepare() didn't escape the \ character either.
[1] http://www.gossamer-threads.com/lists/perl/porters/199811
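The alternation-order point can be reproduced in isolation with the sketch below. The patterns are reduced versions written for illustration, not the exact expressions from PO.php. Inside the full msgid/msgstr expression, backtracking may still recover a longer match, so this only shows what each form prefers on its own. The third, "unrolled" pattern is an assumption added for comparison and is not part of the submitted patch.

```php
<?php
// Reduced patterns for illustration only; they are not the exact
// expressions from PO.php.
$line = '"BBB\\"CCC"';   // the PO text "BBB\"CCC"

// Original order: [^"] is tried first and swallows the backslash, so the
// lazy match is free to stop at the escaped quote.
preg_match('/"(?:[^"]|\\\\")*?"/', $line, $original);

// Swapped order: the escape sequence \" is tried before [^"].
preg_match('/"(?:\\\\"|[^"])*?"/', $line, $swapped);

// Unrolled form (an assumption, not taken from the patch): no alternation
// inside the loop, and a backslash always consumes the character after it.
preg_match('/"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"/', $line, $unrolled);

echo $original[0], "\n";  // "BBB\"
echo $swapped[0], "\n";   // "BBB\"CCC"
echo $unrolled[0], "\n";  // "BBB\"CCC"
```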
 [2006-01-07 09:48 UTC] mike
Please make the patch available online. Thanks a lot.
 [2006-01-08 08:48 UTC] ivanwyc at gmail dot com
 [2011-06-03 19:23 UTC] looksup (François Poirotte)
However, the pattern you suggest will now fail for an escaped backslash followed by an escaped quote, due to the negative look-behind. So something like this will not match:
msgid "AAA" "BBB\\\"CCC"
I don't think a regex-based parser is a good choice here: there is a lot more to the PO format than just msgid and msgstr, and taking all of that into account with regular expressions may prove quite difficult. See also http://download.oracle.com/docs/cd/E19683-01/817-0659/6mgeo5s1u/index.html, which lists other possible directives and describes the meaning of special comments.
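As a rough idea of what a non-regex approach could look like, the sketch below reads one double-quoted PO string character by character and treats a backslash plus the character after it as a unit, so runs of backslashes before a quote need no look-behind. The function po_read_quoted() is hypothetical and not part of File_Gettext. It only handles a single quoted string, not comments, msgctxt, plural forms or the other directives mentioned above, and it assumes PHP 7.1 or later for the nullable return type.

```php
<?php
// Hypothetical helper, not part of File_Gettext: read one double-quoted
// PO string starting at $pos and return its raw (still escaped) body,
// or null if there is no complete quoted string at that position.
function po_read_quoted(string $text, int $pos = 0): ?string
{
    $len = strlen($text);
    if ($pos >= $len || $text[$pos] !== '"') {
        return null;                      // not at an opening quote
    }
    $body = '';
    for ($i = $pos + 1; $i < $len; $i++) {
        $ch = $text[$i];
        if ($ch === '\\' && $i + 1 < $len) {
            $body .= $ch . $text[++$i];   // keep the escape pair together
        } elseif ($ch === '"') {
            return $body;                 // unescaped quote closes the string
        } else {
            $body .= $ch;
        }
    }
    return null;                          // unterminated string
}

// The PO text "BBB\\\"CCC": an escaped backslash, then an escaped quote.
var_dump(po_read_quoted('"BBB\\\\\\"CCC"'));
```

The returned body is still escaped, so it would need an unescaping step before being used as a key or value.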
 [2012-01-02 07:19 UTC] doconnor (Daniel O'Connor)
clockwerx@clockwerx-desktop:~/pear-svn-git/File_Gettext$ patch -p1 < patch-download.php\?id\=6411\&patch\=File_Gettext.patch\&revision\=1176784721
patching file Gettext/PO.php
Hunk #1 FAILED at 64.
Hunk #2 FAILED at 76.
2 out of 2 hunks FAILED -- saving rejects to file Gettext/PO.php.rej
patching file Gettext.php
Hunk #1 FAILED at 131.
1 out of 1 hunk FAILED -- saving rejects to file Gettext.php.rej
clockwerx@clockwerx-desktop:~/pear-svn-git/File_Gettext$
... so the patch needs more work anyway. This code is on github, so it's really easy to either rewrite the regexes or send in a pull request for a whole new parsing mechanism.
 [2012-01-02 07:19 UTC] doconnor (Daniel O'Connor)
-Status: Open
+Status: Feedback
Need new patch