
Bug #5450 Parser strip many tags
Submitted: 2005-09-19 00:42 UTC Modified: 2011-09-27 20:24 UTC
From: jonysk at gmail dot com Assigned:
Status: Verified Package: XML_Beautifier
PHP Version: 5.0.3 OS: Debian Linux
Roadmaps: (Not assigned)    


 [2005-09-19 00:42 UTC] jonysk at gmail dot com
Description:
------------
This class is very useful for indenting code. Unfortunately, many relevant tags are stripped, such as CDATA sections and the DOCTYPE definition. This makes the class useless for complex documents that use those features.

Test script:
---------------
A little example:

    require_once 'XML/Beautifier.php';

    $string = "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?><!DOCTYPE bookmark SYSTEM \"bookmark.dtd\"><bookmark><category><![CDATA[ this cdata will be stripped ]]></category></bookmark>";
    $xml_b = new XML_Beautifier();
    // formatString() returns a PEAR error object on failure
    if (PEAR::isError($a = $xml_b->formatString($string))) {
        echo $a->getMessage();
    } else {
        echo $a;
    }

Expected result:
----------------
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE bookmark SYSTEM "bookmark.dtd">
<bookmark>
    <category>
        <![CDATA[ this cdata will be stripped ]]>
    </category>
</bookmark>

Actual result:
--------------
<bookmark>
    <category>
        this cdata will be stripped
    </category>
</bookmark>

Comments

 [2006-10-01 14:47 UTC] arnaud (Arnaud Limbourg)
Until the package gets a new maintainer.
 [2008-09-22 14:47 UTC] ashnazg (Chuck Burgess)
Confirmed this bug, using v1.2.0 on PHP 5.2.4. At first, when seeing the XML declaration and the DOCTYPE tags being completely ignored in the output, I thought that might have been intended package behavior. However, the example in the manual [1] indicates both tags should be included in the output. [1] - http://pear.php.net/manual/en/package.xml.xml-beautifier.example.php
 [2008-09-22 15:33 UTC] ashnazg (Chuck Burgess)
Interestingly, running on PHP 4.4.9 shows that the XML declaration, DOCTYPE, and CDATA sections are not lost. However, the formatted result doesn't look quite clean:

<?xml version="1.0" encoding="iso-8859-1" standalone="yes"?><!DOCTYPE bookmark SYSTEM "bookmark.dtd"><bookmark> <category> <![CDATA[ this cdata will be stripped ]]> </category> </bookmark>

Problems:
- the XML declaration, DOCTYPE, and root tag are all on the same line;
- the CDATA opening tag is not indented at all;
- there's a large gap between the CDATA opening tag and the first character of the CDATA text;
- the CDATA end tag is on a new line, with no indentation;
- the closing category tag is on the same line as the CDATA closing tag.

So, the premise of this bug report seems to apply only to PHP 5.
 [2009-08-21 04:30 UTC] marc_c (Marc Christenfeldt)
The problem seems to lie in PHP's XML parser extension (http://us2.php.net/manual/en/book.xml.php). XML_Parser::_initHandlers() registers handlers via xml_set_processing_instruction_handler and xml_set_default_handler, but they are not called as expected:

- XML_Beautifier_Tokenizer::piHandler() is not called for <?xml ... ?>, although it is called for other processing instructions such as <?abc ?>.
- XML_Beautifier_Tokenizer::defaultHandler() is not called for the DOCTYPE declaration, so the declaration is missing after beautification. It is called for comments (<!--(.+)-->), however.
- I think defaultHandler() should also be called for CDATA sections, since it contains handling for them (see _handleXMLDefault), but it is not.

I have no idea how to fix this issue. Perhaps this information helps somebody else.
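
The handler behaviour described above can be checked outside the package. The following is a minimal diagnostic sketch, assuming only the standard ext/xml functions; it does not use XML_Parser or XML_Beautifier, and the sample document and the $seen array are made up for illustration. It records which constructs reach the processing-instruction, default, and character-data handlers, so the results can be compared across PHP versions.

```php
<?php
// Diagnostic sketch: record which parts of a document reach which
// ext/xml handler. The sample document and $seen are illustrative only.
$xml = '<?xml version="1.0" encoding="iso-8859-1"?>'
     . '<!DOCTYPE bookmark SYSTEM "bookmark.dtd">'
     . '<?abc some data?>'
     . '<bookmark><!-- a comment -->'
     . '<category><![CDATA[ cdata text ]]></category>'
     . '</bookmark>';

$seen = array('pi' => array(), 'default' => array(), 'cdata' => array());

$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

// Element handlers are registered but ignored; only the other
// handlers matter for this check.
xml_set_element_handler(
    $parser,
    function ($p, $name, $attrs) {},
    function ($p, $name) {}
);

// Text content, including the contents of CDATA sections.
xml_set_character_data_handler($parser, function ($p, $data) use (&$seen) {
    $seen['cdata'][] = $data;
});

// Processing instructions: record the target name only.
xml_set_processing_instruction_handler($parser, function ($p, $target, $data) use (&$seen) {
    $seen['pi'][] = $target;
});

// Anything the parser hands to the default handler.
xml_set_default_handler($parser, function ($p, $data) use (&$seen) {
    $seen['default'][] = $data;
});

$ok = xml_parse($parser, $xml, true);
if (!$ok) {
    echo xml_error_string(xml_get_error_code($parser)), "\n";
}

print_r($seen);
```

If the DOCTYPE declaration or the CDATA markers never show up under 'default' on a given PHP version, the tokenizer has nothing to rebuild them from, which would match the output reported in this bug.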
 [2011-09-27 20:24 UTC] ashnazg (Chuck Burgess)
-Roadmap Versions: 1.2.1
+Roadmap Versions: