Package Information: XML_HTMLSax

This package is not maintained. If you would like to take over, please go to this page.
» Summary
A SAX parser for HTML and other badly formed XML documents
» License
PHP
» Current Release
2.1.2 (stable) was released on 2003-12-04 by hfuecks (Changelog)
Easy Install

pear install XML_HTMLSax

» Bug Summary
  • Package Maintenance Rank: 211 of 225 packages with open bugs
  • Number of open bugs: 2 (6 total bugs)
  • Average age of open bugs: 6074 days
  • Oldest open bug: 6850 days
  • Number of open feature requests: 1 (1 total feature requests)

» Description
XML_HTMLSax is a SAX-based XML parser for badly formed XML documents, such as HTML.
The original code base was developed by Alexander Zhukov and published at http://sourceforge.net/projects/phpshelve/. Alexander kindly gave permission to modify the code and license for inclusion in PEAR.

PEAR::XML_HTMLSax provides an API very similar to the native PHP XML extension (http://www.php.net/xml), allowing handlers written for one to be easily adapted to the other. The key difference is that HTMLSax will not break on badly formed XML, allowing it to be used for parsing HTML documents. Otherwise, HTMLSax supports all the handlers available from Expat except namespace and external entity handlers. It also provides methods for handling XML escapes as well as JSP/ASP opening and closing tags.
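
To illustrate the handler style the paragraph above refers to, here is a minimal sketch that uses the native PHP XML extension, not XML_HTMLSax itself. The function name collectEvents and the "open:"/"close:"/"data:" event strings are assumptions made up for this example. It registers element and character data handlers, then shows that the native parser reports failure on a mismatched tag, which is the kind of input HTMLSax is described as tolerating.

```php
<?php
// Hypothetical helper: run the native (Expat-style) parser over $markup
// and record the SAX events it fires.
function collectEvents(string $markup): array
{
    $events = [];
    $parser = xml_parser_create();
    // Keep element names as written instead of upper-casing them.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    xml_set_element_handler(
        $parser,
        // Called for each opening tag.
        function ($p, string $name, array $attrs) use (&$events) {
            $events[] = "open:$name";
        },
        // Called for each closing tag.
        function ($p, string $name) use (&$events) {
            $events[] = "close:$name";
        }
    );
    xml_set_character_data_handler(
        $parser,
        // Called for text between tags; whitespace-only chunks are skipped.
        function ($p, string $data) use (&$events) {
            if (trim($data) !== '') {
                $events[] = 'data:' . trim($data);
            }
        }
    );

    // xml_parse() returns 1 on success and 0 when the document is not well formed.
    $ok = xml_parse($parser, $markup, true) === 1;

    return ['ok' => $ok, 'events' => $events];
}

$good = collectEvents('<p>Hello <b>world</b></p>');  // well formed
$bad  = collectEvents('<p>Hello <b>world</p>');      // <b> is never closed

var_dump($good['ok'], $bad['ok']);
```

According to the description above, handlers written in this style should be easy to adapt to HTMLSax. The exact HTMLSax method names are not given on this page, so they are not shown here.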

Version 1.x introduced an API similar to the native SAX extension but used a slow character-by-character approach to parsing.

Version 2.x has had its internals completely overhauled to use a Lexer, delivering performance *approaching* that of the native XML extension, as well as a radically improved, modular design that makes adding further functionality easy.

Version 3.x is about fine-tuning the API and behaviour, and providing a mechanism to distinguish HTML "quirks" from badly formed HTML (the latter functionality is not yet implemented).

A big thanks to Jeff Moore (lead developer of WACT: http://wact.sourceforge.net), who is largely responsible for the new design, and to the other members of Sitepoint's Advanced PHP forums for their input: http://www.sitepointforums.com/showthread.php?threadid=121246.

Thanks also to Marcus Baker (lead developer of SimpleTest: http://www.lastcraft.com/simple_test.php) for sorting out the unit tests.