Bug #168 XML_DTD_Tree->getChildren() underscore problem
Submitted: 2003-10-30 15:22 UTC
From: aake at iki dot fi
Assigned: cox
Status: Closed
Package: XML_DTD
PHP Version: 4.3.3
OS: Mac OS X 10.2.8
Roadmaps: (Not assigned)


 [2003-10-30 15:22 UTC] aake at iki dot fi
Description:
------------
It seems that XML_DTD_Tree->getChildren() method fails to recognize ELEMENT names (in DTD files) with underscores properly thus interpreting them as two different elements.

Reproduce code:
---------------
...
$dtd_parser = new XML_DTD_parser();
$dtd_tree = $dtd_parser->parse('book.dtd');
print_r($dtd_tree->getChildren('book'));
...

DTD (book.dtd):

<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT book (title,author?,page_number)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT page_number (#PCDATA)>
<!ATTLIST book cover (hard | soft | unknown) "unknown">

Expected result:
----------------
Array
(
    [0] => title
    [1] => author
    [2] => page_number
)

Actual result:
--------------
Array
(
    [0] => title
    [1] => author
    [2] => page
    [3] => number
)

Comments

 [2003-12-17 16:23 UTC] i dot veith at gmx dot de
The method "_ELEMENT" splits the element names, and if an element is named like "ELEMENT_NAME", the method makes two elements out of it ("ELEMENT" and "NAME"). Now it's clear why my XML would never validate.

What to do: in DTD.php (from package version 0.4.1), the regular expressions on lines 195 and 206 are incomplete.

Line 195: replace
    $children = preg_split('/([^#a-zA-Z0-9.-]+)/', $ch, -1, PREG_SPLIT_NO_EMPTY);
with
    $children = preg_split('/([^#a-zA-Z0-9_.-]+)/', $ch, -1, PREG_SPLIT_NO_EMPTY);

Line 206: replace
    $reg = preg_replace('/([#a-zA-Z0-9.-]+)/', '(,?\\0)', $reg);
with
    $reg = preg_replace('/([#a-zA-Z0-9_.-]+)/', '(,?\\0)', $reg);

The only change is the underscore after "0-9" and before ".-".

Compare: (<)D:\www\dtd\DTD.php (12078 bytes)
   with: (>)C:\XML_DTD-0.4.1\DTD.php (12462 bytes)

195c194
<     $children = preg_split('/([^#a-zA-Z0-9_.-]+)/', $ch, -1, PREG_SPLIT_NO_EMPTY);
---
>     $children = preg_split('/([^#a-zA-Z0-9.-]+)/', $ch, -1, PREG_SPLIT_NO_EMPTY);
206c205
<     $reg = preg_replace('/([#a-zA-Z0-9_.-]+)/', '(,?\\0)', $reg);
---
>     $reg = preg_replace('/([#a-zA-Z0-9.-]+)/', '(,?\\0)', $reg);
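
For anyone who wants to check the splitting behaviour outside the package, here is a minimal standalone sketch. It only calls preg_split() with the two character classes quoted in this comment, applied to the content model from the reporter's book.dtd. The variable names $before and $after are illustrative and do not appear in DTD.php; $ch mirrors the variable name used there.

```php
<?php
// Content model of <!ELEMENT book ...> taken from the report's book.dtd.
$ch = '(title,author?,page_number)';

// Character class as in 0.4.1: "_" is not listed, so it acts as a separator.
$before = preg_split('/([^#a-zA-Z0-9.-]+)/', $ch, -1, PREG_SPLIT_NO_EMPTY);

// Character class with "_" added: underscores stay inside the name.
$after = preg_split('/([^#a-zA-Z0-9_.-]+)/', $ch, -1, PREG_SPLIT_NO_EMPTY);

print_r($before); // title, author, page, number
print_r($after);  // title, author, page_number
```

The first call gives the four-element array from the "Actual result" in the report, the second gives the three-element array from the "Expected result".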
 [2004-02-08 18:59 UTC] lsmith
Patch provided by Mika Tuupola:

--- DTD.php     2004-02-06 12:14:51.039983000 +0200
+++ DTD.php.new 2004-02-06 12:14:15.979992000 +0200
@@ -191,7 +191,7 @@ class XML_DTD_Parser
         } else {
             $content = null;
             do {
-                $children = preg_split('/([^#a-zA-Z0-9.-]+)/', $ch, -1, PREG_SPLIT_NO_EMPTY);
+                $children = preg_split('/([^#a-zA-Z0-9_.-]+)/', $ch, -1, PREG_SPLIT_NO_EMPTY);
                 if (in_array('#PCDATA', $children)) {
                     $content = '#PCDATA';
                     if (count($children) == 1) {
@@ -202,7 +202,7 @@ class XML_DTD_Parser
                 $this->dtd['elements'][$elem_name]['child_validation_dtd_regex'] = $ch;
                 // Convert the DTD regex language into PCRE regex format
                 $reg = str_replace(',', ',?', $ch);
-                $reg = preg_replace('/([#a-zA-Z0-9.-]+)/', '(,?\\0)', $reg);
+                $reg = preg_replace('/([#a-zA-Z0-9_.-]+)/', '(,?\\0)', $reg);
                 $this->dtd['elements'][$elem_name]['child_validation_pcre_regex'] = $reg;
             } while (false);
         }
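
The second hunk affects the DTD-to-PCRE conversion. As a rough illustration only, the sketch below runs the same str_replace() and preg_replace() calls from the patch on the book content model, once with each character class. The names $old and $new are made up for this example; how the package later uses the resulting string is not shown here.

```php
<?php
// Content model of <!ELEMENT book ...> from the report's book.dtd.
$ch = '(title,author?,page_number)';

// Same first step as in the patch context: make every comma optional.
$reg = str_replace(',', ',?', $ch);

// Without "_" in the class, "page_number" is wrapped as two separate names.
$old = preg_replace('/([#a-zA-Z0-9.-]+)/', '(,?\\0)', $reg);

// With "_" in the class, the whole name is wrapped as one group.
$new = preg_replace('/([#a-zA-Z0-9_.-]+)/', '(,?\\0)', $reg);

echo $old, "\n"; // ((,?title),?(,?author)?,?(,?page)_(,?number))
echo $new, "\n"; // ((,?title),?(,?author)?,?(,?page_number))
```

One consequence of the old pattern is that any quantifier written after an underscored name would apply only to the last fragment of that name.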
 [2004-02-18 14:29 UTC] tuupola at php dot net
This bug has been fixed in CVS. In case this was a documentation problem, the fix will show up at the end of next Sunday (CET) on pear.php.net. In case this was a pear.php.net website problem, the change will show up on the website in short time. Thank you for the report, and for helping us make PEAR better.