
Bug #12916 German umlauts are displayed wrong
Submitted: 2008-01-16 20:19 UTC
From: sunfish
Assigned:
Status: Analyzed
Package: XML_Feed_Parser (version 1.0.2)
PHP Version: 5.2.3
OS:
Roadmaps: (Not assigned)


 [2008-01-16 20:19 UTC] sunfish (Ulrich Fischer)
Description:
------------
German umlauts (ÄÖÜäöüß) from an Atom XML file (encoding="iso-8859-1") are displayed incorrectly after parsing, e.g. http://www.keine-gentechnik.de/news-regionen.xml

Test script:
---------------
This solution works for me:

Type.php, line 352
// remove htmlentities, because they prevent utf8_decode conversion
[$content .= htmlentities($node->nodeValue);]
$content .= $node->nodeValue;

Atom.php, line 263
// add utf8_decode
[return $content->nodeValue;]
return utf8_decode($content->nodeValue);

Comments

 [2008-01-17 10:57 UTC] sunfish (Ulrich Fischer)
Oh, the following seems to be better:

Test script:
---------------
Type.php, line 352
// first utf8_decode, then htmlentities, because htmlentities prevents the later utf8_decode conversion
[$content .= htmlentities($node->nodeValue);]
$content .= htmlentities(utf8_decode($node->nodeValue));
 [2008-01-17 12:01 UTC] sunfish (Ulrich Fischer)
I left out a real test script.

Test script:
---------------
$xml_source = ' ÄÜÖäüöß
ÄÜÖäüöß
</content> </entry> </feed> ';

Expected result:
----------------
<title>ÄÜÖäüöß</title>
<content>ÄÜÖäüöß</content>

Actual result:
--------------
<title>ÄýÖäüöþ</title>
<content>ÄýÖäüöþ </content>
 [2008-03-08 19:47 UTC] jystewart (James Stewart)
Those changes don't seem to do the trick when I test this. I've actually refactored that part of the code slightly. Could you take another look and see if changes along these lines still work for you?
 [2008-04-02 19:32 UTC] mortencb (Morten-Christian Bernson)
I have the exact same problem with Norwegian special characters: æÆøØåÅ. I tried the fix sunfish proposed, as well as updating to the files in CVS. Neither helped. It would be very good to get this fixed, as I can't use this library now :(
 [2008-06-03 15:36 UTC] jystewart (James Stewart)
I've continued to work on this as time allows but have yet to come up with a solution that doesn't introduce regressions. The solution proposed in this thread is causing bugs in some of the other handling. My time to work on this is very limited, so if anyone has any patches to offer then that would definitely speed things up.
 [2008-06-12 11:55 UTC] mortencb (Morten-Christian Bernson)
I have temporarily fixed it for our Norwegian characters (and some others) by doing this on the output text:

function norskeTegn($gurba) {
    $gurba = str_replace("Ã¦","æ",$gurba);
    $gurba = str_replace("Ã¸","ø",$gurba);
    $gurba = str_replace("Ã¥","å",$gurba);
    $gurba = str_replace("Ã\206","Æ",$gurba);
    $gurba = str_replace("Ã\230","Ø",$gurba);
    $gurba = str_replace("Ã\205","Å",$gurba);
    $gurba = str_replace("â\200\223","-",$gurba);
    $gurba = str_replace("Ã¶","ö",$gurba);
    $gurba = str_replace("Â«","«",$gurba);
    $gurba = str_replace("Â»","»",$gurba);
    return $gurba;
}
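
The replacement pairs above look like UTF-8 byte sequences being mapped back to single ISO-8859-1 characters. If that reading is right, and assuming the surrounding page is served as ISO-8859-1, the same effect might be had with one general conversion instead of a hand-maintained list. The helper name utf8ToLatin1() is hypothetical and not part of XML_Feed_Parser; the sketch also assumes the mbstring extension is available.

```php
<?php
// Hypothetical helper (not part of XML_Feed_Parser): converts parser output
// from UTF-8 to ISO-8859-1 in one step instead of listing byte pairs.
function utf8ToLatin1($text)
{
    // The en dash (U+2013) has no ISO-8859-1 equivalent, so map it to a
    // plain hyphen first, as the hand-written workaround does.
    $text = str_replace("\xE2\x80\x93", "-", $text);

    // Convert the remaining characters; anything that cannot be represented
    // in ISO-8859-1 is replaced by mbstring's substitute character.
    return mb_convert_encoding($text, 'ISO-8859-1', 'UTF-8');
}

// "æøå", an en dash, and "ÆØÅ", written out as UTF-8 bytes.
$utf8 = "\xC3\xA6\xC3\xB8\xC3\xA5 \xE2\x80\x93 \xC3\x86\xC3\x98\xC3\x85";
$latin1 = utf8ToLatin1($utf8);
```

This only hides the symptom on an ISO-8859-1 page; serving the page as UTF-8 would make the conversion unnecessary.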
 [2008-12-15 19:03 UTC] herringm (Michael Herring)
For content elements of type 'xhtml', Type.php defines processEntitiesForNodeValue(), which takes care of entities within these 'xhtml' elements only (it is NOT used for text or any other type). The function doesn't work properly because it calls iconv or utf8_encode on the input string (provided the input is not already UTF-8) and only afterwards handles entity-encoded characters with html_entity_decode() and htmlentities(). The patch fixes this by handling entity-encoded characters with html_entity_decode() and htmlentities() before calling iconv or utf8_encode on the input string. The final rendered page must also be encoded as UTF-8 for these characters to display properly.
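
A minimal sketch of the ordering described here, not the actual patch: entities are resolved while the string is still in its source charset, and only then is the result converted to UTF-8. The function name decodeEntitiesThenConvert() and its parameters are hypothetical, and the sketch leaves out the htmlentities() re-encoding step that the package performs.

```php
<?php
// Hypothetical helper; it illustrates the order of operations only and is
// not the package's processEntitiesForNodeValue().
function decodeEntitiesThenConvert($value, $sourceEncoding)
{
    // Step 1: resolve entities such as &uuml; in the source charset, so the
    // decoded bytes match the encoding of the rest of the string.
    $decoded = html_entity_decode($value, ENT_QUOTES, $sourceEncoding);

    // Step 2: convert the whole string to UTF-8 in a single pass.
    if (strtoupper($sourceEncoding) !== 'UTF-8') {
        $converted = iconv($sourceEncoding, 'UTF-8', $decoded);
        // iconv() returns false on failure; fall back to the decoded string.
        if ($converted !== false) {
            $decoded = $converted;
        }
    }
    return $decoded;
}

// ISO-8859-1 input mixing raw bytes (ü = 0xFC, ß = 0xDF) with entities.
$latin1 = "Gr\xFC\xDFe &amp; K&uuml;sse";
$utf8 = decodeEntitiesThenConvert($latin1, 'ISO-8859-1');
```

Doing the steps in this order keeps the raw bytes and the decoded entities in the same encoding at every point, so each character is converted exactly once.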
 [2009-05-23 23:59 UTC] doconnor (Daniel O'Connor)
Thanks for the patch Michael! Given my unfamiliarity with the package, I've only had a cursory glance, and it LGTM. I don't suppose anyone in this thread would be interested in adopting this package?
 [2009-05-23 23:59 UTC] doconnor (Daniel O'Connor)
-Status: Open
+Status: Analyzed
 [2011-12-09 15:50 UTC] doconnor (Daniel O'Connor)
clockwerx@clockwerx-desktop:~/XML_Feed_Parser$ patch -p1 < patch-download.php\?id\=12916\&patch\=entities\&revision\=1229367624
can't find file to patch at input line 3
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|--- Type.php 2008-12-15 13:16:57.000000000 -0500
|+++ /usr/local/php5/lib/php/XML/Feed/Parser/Type.php 2008-12-15 13:17:22.000000000 -0500
--------------------------
File to patch: XML/Feed/Parser/Type.php
patching file XML/Feed/Parser/Type.php
Hunk #1 FAILED at 333.
1 out of 1 hunk FAILED -- saving rejects to file XML/Feed/Parser/Type.php.rej
 [2011-12-09 17:52 UTC] doconnor (Daniel O'Connor)
http://test.pear.php.net:8080/job/XML_Feed_Parser/40/testReport/junit/(root)/Regressions/test_handlesGermanUmlauts/
After applying your fixes, I think
 [2011-12-09 18:01 UTC] doconnor (Daniel O'Connor)
Scratch that: there has been no change in behaviour since at least build 16, so your patch didn't cause that test to fail
 [2011-12-09 19:30 UTC] doconnor (Daniel O'Connor)
(also may not have fixed anything!)