
Bug #5392 encoding of ISO-8859-1 is the only supported encoding
Submitted: 2005-09-13 22:31 UTC Modified: 2008-05-22 02:47 UTC
From: cellog Assigned: ashnazg
Status: Closed Package: XML_Util
PHP Version: Irrelevant OS: n/a
Roadmaps: 1.2.0a2    

 [2005-09-13 22:31 UTC] cellog
Description:
------------
Due to the use of htmlentities() with no encoding argument in replaceEntities(), there is no way to properly encode UTF-8. There are several ways of fixing this. The simplest is to add an encoding option to replaceEntities() and to use either iconv() or, if iconv() isn't available, utf8_encode(). Otherwise, it would make sense to make XML_Util non-static, as I originally suggested when it was proposed, so that you can set the encoding once and have it apply to all actions automatically. Too bad this would mean XML_Util2.
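To illustrate the problem described above, here is a minimal sketch. It is not XML_Util's actual code: the helper name encode_as() is hypothetical, and the flags and charset are passed explicitly so the result does not depend on the defaults of any particular PHP version.

```php
<?php
// Hypothetical helper for illustration only; not part of XML_Util.
// Flags and charset are explicit so the output does not depend on
// the running PHP version's defaults.
function encode_as($string, $charset)
{
    return htmlentities($string, ENT_COMPAT, $charset);
}

// "ä" encoded as UTF-8 is the two bytes 0xC3 0xA4.
$utf8 = "\xC3\xA4";

// Read as ISO-8859-1, each byte is treated as a separate character.
echo encode_as($utf8, 'ISO-8859-1'), "\n"; // &Atilde;&curren;

// Read as UTF-8, the two bytes form a single character.
echo encode_as($utf8, 'UTF-8'), "\n";      // &auml;
```

If the caller has no way to tell the function which charset the input uses, UTF-8 input ends up as the first form.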

Comments

 [2005-09-23 06:57 UTC] schst
Could you specify when the problem arises? This should work fine:

<?php
require_once 'XML/Util.php';

$data = 'This data contains special chars like <, >, & and " as well as ä, ö, ß, à and ê';

echo XML_Util::replaceEntities($data);
echo "\n\n";
echo XML_Util::replaceEntities(utf8_encode($data));
echo "\n\n";
?>

But maybe I'm totally wrong here...
 [2005-09-23 10:50 UTC] cellog
The problem occurs when generating a file via XML_Serializer, or in PEAR's case, our customized version of such. There is no built-in way to define the encoding, and replaceEntities() is buried deep within the serialization process.
 [2008-05-04 13:17 UTC] ashnazg (Chuck Burgess)
Built a test phpt file using schst's script. This test, using XML_Util 1.1.4 on PHP 5.2.4, does not produce identical results:

This data contains special chars like <, >, & and " as well as ä, ö, ß, à and ê

This data contains special chars like <, >, & and " as well as ä, ö, Ã<9f>, à and ê
 [2008-05-05 21:11 UTC] ashnazg (Chuck Burgess)
I think I was able to solve this by using the optional encoding argument of htmlentities(), so that you can now pass an optional encoding string to replaceEntities(). I also applied the same logic to reverseEntities(), which uses html_entity_decode(). Greg, is there any chance you can review this patch to see if it solves your original issue? This is my first foray into encoding, but it looks to me like it's doing what you need it to do. I'll add patches for the three PHPTs shortly, so you can also see my test rationale.
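For readers without the patch at hand, the approach described in this comment could look roughly like the sketch below. This is an assumption-laden illustration, not the committed code: the function names replace_entities_sketch() and reverse_entities_sketch() are hypothetical, and the ISO-8859-1 default is only a guess at how previous behaviour might be preserved.

```php
<?php
// Hypothetical stand-ins for replaceEntities() / reverseEntities();
// names, signatures and the default encoding are assumptions.
function replace_entities_sketch($string, $encoding = 'ISO-8859-1')
{
    // Forward the caller's encoding to htmlentities().
    return htmlentities($string, ENT_COMPAT, $encoding);
}

function reverse_entities_sketch($string, $encoding = 'ISO-8859-1')
{
    // Same idea in the other direction with html_entity_decode().
    return html_entity_decode($string, ENT_COMPAT, $encoding);
}

// UTF-8 input: "ß" is 0xC3 0x9F, "ä" is 0xC3 0xA4.
$original = "<\xC3\x9F & \xC3\xA4>";

$encoded = replace_entities_sketch($original, 'UTF-8');
echo $encoded, "\n"; // &lt;&szlig; &amp; &auml;&gt;

// Decoding with the same encoding restores the original bytes.
var_dump(reverse_entities_sketch($encoded, 'UTF-8') === $original); // bool(true)
```

Passing the same encoding to both functions is what makes the round trip lossless for UTF-8 input.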
 [2008-05-05 21:30 UTC] ashnazg (Chuck Burgess)
Updated the patch on Util.php via "diff -u".
 [2008-05-05 21:32 UTC] ashnazg (Chuck Burgess)
Attached patches for the three test files that I updated while doing this development.
 [2008-05-22 02:47 UTC] ashnazg (Chuck Burgess)
Committed to CVS.