
Bug #1884 Parser is changing ö to " ö "
Submitted: 2004-07-15 19:58 UTC
From: rm at visionthink dot de Assigned:
Status: Verified Package: XML_Beautifier
PHP Version: Irrelevant OS: Debian
Roadmaps: (Not assigned)    


 [2004-07-15 19:58 UTC] rm at visionthink dot de
Description:
------------
The Parser is changing schön to sch ö n.

Reproduce code:
---------------
<?php
$vXMLBeautifier='XML/Beautifier.php';
$vXMLInhalt = '<?xml version="1.0" encoding="ISO-8859-1"?><objekttitel>schön</objekttitel>';
require_once $vXMLBeautifier;
$vXMLBeautifier = new XML_Beautifier();
echo $vXMLBeautifier->formatString($vXMLInhalt);
?>

Expected result:
----------------
<?xml version="1.0" encoding="ISO-8859-1"?><objekttitel>schön</objekttitel>

Actual result:
--------------
<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?><objekttitel>sch ö n</objekttitel>

Comments

 [2004-12-15 09:53 UTC] tuupola
Happens with other entities too:

<?php
require_once('XML/Beautifier.php');
$xml = new XML_Beautifier();
$str = '<?xml version="1.0" encoding="ISO-8859-1"?><foo>äöå</foo>';
print_r($xml->formatString($str));
?>

Outputs:

<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?><foo>ä ö å</foo>
 [2005-03-08 05:11 UTC] doconnor
That's not a bug, that's a feature! But... I'd love an option to switch this off before processing.
 [2006-04-01 17:26 UTC] schst (Stephan Schmidt)
I no longer maintain this package => status of this bug is open.
 [2006-10-01 19:46 UTC] arnaud (Arnaud Limbourg)
Until the package gets a new maintainer.
 [2008-09-22 21:58 UTC] ashnazg (Chuck Burgess)
This seems to happen with all entities I've looked at today (see Bug #2144). I suppose the question is whether or not entities are supposed to remain entities in the output, or be converted into their literals. I'll need to study the docs to be sure.
 [2009-06-01 23:39 UTC] ashnazg (Chuck Burgess)
-Summary: Parser is changing ö to " ö "
+Summary: Parser is changing ö to " ö "
 [2010-07-14 21:17 UTC] nayru (Felix Weßendorf)
The bug goes back to the PHP function xml_parse which creates two CDATA-Tokens out of a string like "schön". The first token with data "sch" and the second with "ön".

Good news is: I've fixed the bug by adding some code to the Tokenizer.php within the function handler for character data. My fix looks up the previously parsed cdata sibling (if there is one) and simply appends the character data to it without creating a new node in the parse tree. Check it out:

diff for Tokenizer.php
-------------------------
202a203,215
>         $parent = array_pop($this->_struct);
>         if(count($parent["children"])){
>             $sibling = array_pop($parent["children"]);
>             if($sibling["type"]==XML_BEAUTIFIER_CDATA){
>                 $sibling["data"].=$cdata;
>                 array_push($parent["children"], $sibling);
>                 array_push($this->_struct, $parent);
>                 return true;
>             }
>             array_push($parent["children"], $sibling);
>         }
>         array_push($this->_struct, $parent);
>
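The idea behind the patch above can be shown in isolation. The following is only a minimal sketch, not the package's code: the function name appendCharacterData, the TOKEN_CDATA and TOKEN_ELEMENT constants, and the flat $children array are all hypothetical stand-ins for the tokenizer's internal structures. It assumes each child is an array with a "type" key and, for character data, a "data" key, as in the diff.

```php
<?php
// Hypothetical token types for illustration only; the real package
// defines its own constant, XML_BEAUTIFIER_CDATA.
const TOKEN_CDATA = 'cdata';
const TOKEN_ELEMENT = 'element';

/**
 * Append a chunk of character data to a list of child tokens.
 * If the last child is already a character data token, the chunk is
 * merged into it instead of creating a second token.
 */
function appendCharacterData(array $children, string $cdata): array
{
    $last = count($children) - 1;
    if ($last >= 0 && $children[$last]['type'] === TOKEN_CDATA) {
        // Previous sibling is character data: extend it in place.
        $children[$last]['data'] .= $cdata;
        return $children;
    }
    // No mergeable sibling: start a new character data token.
    $children[] = ['type' => TOKEN_CDATA, 'data' => $cdata];
    return $children;
}

// Simulate the parser delivering "schön" in two pieces.
$children = [];
foreach (['sch', 'ön'] as $chunk) {
    $children = appendCharacterData($children, $chunk);
}

echo count($children), "\n";     // 1
echo $children[0]['data'], "\n"; // schön
```

Because chunks are only merged when the immediately preceding sibling is character data, text on either side of a child element still ends up in separate tokens, which is what a formatter would presumably want.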
 [2010-07-14 21:30 UTC] nayru (Felix Weßendorf)
 [2011-09-28 01:24 UTC] ashnazg (Chuck Burgess)
-Roadmap Versions: 1.2.1
+Roadmap Versions:
 [2012-01-19 20:45 UTC] ptitdemon57 (Ptit Demon 57)
This bug is probably related to xml_parser. Try putting your string in <![CDATA[ ]]> to work around it: <objekttitel><![CDATA[ schön ]]></objekttitel>
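One way to check whether the splitting really comes from the underlying parser rather than from XML_Beautifier is to record what PHP's XML extension hands to a character data handler. This is a diagnostic sketch only: collectCharacterDataChunks is a hypothetical helper, and the input uses the numeric reference &#246; so that the snippet does not depend on the file's encoding. How many chunks come back may vary with the PHP version and the XML library it is built against, so no particular count is assumed here.

```php
<?php
/**
 * Parse $xml with PHP's XML extension and return every chunk that the
 * character data handler receives, in order.
 */
function collectCharacterDataChunks(string $xml): array
{
    $chunks = [];
    $parser = xml_parser_create();
    xml_set_character_data_handler(
        $parser,
        function ($parser, $data) use (&$chunks) {
            // The handler may be called several times for one text node.
            $chunks[] = $data;
        }
    );
    if (xml_parse($parser, $xml, true) !== 1) {
        throw new RuntimeException(
            xml_error_string(xml_get_error_code($parser))
        );
    }
    return $chunks;
}

$chunks = collectCharacterDataChunks('<objekttitel>sch&#246;n</objekttitel>');
var_dump($chunks);              // shows how the text was split, if at all
echo implode('', $chunks), "\n"; // the pieces joined back together
```

If the dump shows more than one chunk for a single text node, the handler has to concatenate them itself; inserting whitespace between chunks would produce output like the one reported in this bug.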