Writing a New Converter

Overview of how to write a new Converter

Authors:
by Joshua Eichorn
jeichorn@phpdoc.org
by Gregory Beaver
cellog@sourceforge.com

Introduction to Converters

This documentation deals only with the advanced programming topic of creating a new output converter. To learn how to use phpDocumentor, read the phpDocumentor Guide to Creating Fantastic Documentation. To learn how to extend an existing Converter, read the Converter Manual.

Basic Concepts

Abstract Parsing Data

Source Code Elements

A Converter's job is to take abstract data from parsing and create documentation. What is the abstract data? phpDocumentor can document tutorials, as well as a PHP file and its re-usable or structurally important contents. In other words, phpDocumentor can document:

An XML DocBook-based tutorial (see phpDocumentor Tutorials)
Procedural Page (PHP source file)
Include Statements
Define Statements
Global Variables
Functions
Classes
Class Variables
Class Methods

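phpDocumentor represents these PHP elements using classes:

parserTutorial
parserData, parserPage
parserInclude
parserDefine
parserGlobal
parserFunction
parserClass
parserVar
parserMethod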

This relationship between the source code and abstract data is quite clear, and very simple. The data members of each abstract representation correspond with the information needed to display the documentation. All PHP elements contain a DocBlock, and this is represented by a class as well, parserDocBlock. The elements of a DocBlock are simple:

Short Description, containing any number of inline tags
Long Description, containing any number of inline tags
Tags, some containing any number of inline tags in their general description field

phpDocumentor represents these elements using classes as well:

parserDesc for both short and long descriptions
parserInlineTag for inline tags
parserTag for regular tags
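
To make the shape of this data concrete, here is a minimal sketch of how one DocBlock might break down into descriptions, inline tags, and regular tags. Every class name below (SketchDocBlock, SketchDesc, SketchInlineTag, SketchTag) is a hypothetical stand-in written for this example; the real parserDocBlock, parserDesc, parserInlineTag, and parserTag classes have their own constructors and many more members.

```php
<?php
// Hypothetical stand-ins, NOT the phpDocumentor classes themselves.
class SketchInlineTag
{
    public $name;
    public $value;

    public function __construct($name, $value)
    {
        $this->name = $name;
        $this->value = $value;
    }
}

class SketchDesc
{
    // Plain strings and SketchInlineTag objects, kept in source order.
    public $parts = array();

    public function add($part)
    {
        $this->parts[] = $part;
        return $this;
    }
}

// A regular tag is a keyword plus a description that may hold inline tags.
class SketchTag extends SketchDesc
{
    public $keyword;

    public function __construct($keyword)
    {
        $this->keyword = $keyword;
    }
}

class SketchDocBlock
{
    public $shortDesc;
    public $longDesc;
    public $tags = array();
}

// Models this DocBlock:
//   Load a user, see {@link saveUser()}.
//
//   Returns false when the id is unknown.
//   @param int $id the user id
$doc = new SketchDocBlock();
$doc->shortDesc = (new SketchDesc())
    ->add('Load a user, see ')
    ->add(new SketchInlineTag('link', 'saveUser()'))
    ->add('.');
$doc->longDesc = (new SketchDesc())
    ->add('Returns false when the id is unknown.');
$param = new SketchTag('param');
$param->add('int $id the user id');
$doc->tags[] = $param;

echo count($doc->shortDesc->parts), " parts in the short description\n";
```

Keeping inline tags as objects inside the description, rather than as pre-rendered text, is what lets each output format decide later how a tag such as {@link} should look.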

HTML-specific issues

There are some other issues that Converters solve. In HTML, a link is represented by an <a> tag, but in the PDF Converter, it is represented by a <c:ilink> tag. How can we handle both cases? Through another abstract class, abstractLink, and its descendants:

pageLink
defineLink
globalLink
functionLink
classLink
varLink
methodLink

Note the absence of an "includeLink" class - this is intentional, as only re-usable elements need to be linked. An include statement is always attached to the file that it is in, and cannot be anywhere else.

These abstract linking classes contain the information necessary to locate the documentation of any element. They are used only to link to documentation, not to source code. A link is then converted to the appropriate text representation for the output format by Converter::returnSee() (a href=link for HTML, c:ilink:link for PDF, link linkend=link in XML:DocBook, and so on).
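
The idea of one abstract link rendered differently per output format can be sketched as follows. This is an illustration only: SketchLink and the three renderer classes are hypothetical, the real returnSee() signature is not reproduced here, and the exact tag shapes (including the closing </c:ilink>) are assumptions based on the parenthetical above.

```php
<?php
// Hypothetical stand-in for an abstract link: a target and display text.
class SketchLink
{
    public $target;
    public $text;

    public function __construct($target, $text)
    {
        $this->target = $target;
        $this->text = $text;
    }
}

// Each output format turns the same link object into its own markup.
class SketchHtmlRenderer
{
    public function returnSee(SketchLink $link)
    {
        return '<a href="' . htmlspecialchars($link->target) . '">'
            . htmlspecialchars($link->text) . '</a>';
    }
}

class SketchPdfRenderer
{
    public function returnSee(SketchLink $link)
    {
        // Assumed shape of an internal PDF link tag.
        return '<c:ilink:' . $link->target . '>' . $link->text . '</c:ilink>';
    }
}

class SketchDocBookRenderer
{
    public function returnSee(SketchLink $link)
    {
        return '<link linkend="' . htmlspecialchars($link->target) . '">'
            . htmlspecialchars($link->text) . '</link>';
    }
}

$link = new SketchLink('Foo.bar', 'Foo::bar()');
$renderers = array(
    new SketchHtmlRenderer(),
    new SketchPdfRenderer(),
    new SketchDocBookRenderer(),
);
foreach ($renderers as $renderer) {
    echo $renderer->returnSee($link), "\n";
}
```

Because the link object carries no markup of its own, adding a new output format means adding one more renderer and leaving the parsed data untouched.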

The other issue solved by a Converter involves HTML in a DocBlock. To allow better documentation, HTML is allowed in DocBlocks to format the output. Unfortunately, this complicates output in other formats. To solve this issue, phpDocumentor parses all allowed HTML (see phpDocumentor Tutorial for more detailed information) into abstract structures:

<b> -- emphasize/bold text
<br> -- hard line break, may be ignored by some converters
<code> -- Use this to surround php code, some converters will highlight it
<i> -- italicize/mark as important
<li> -- list item
<ol> -- ordered list
<p> -- paragraph; recognized only if used to enclose all paragraphs, otherwise it will be considered text
<pre> -- Preserve line breaks and spacing, and assume all tags are text (like XML's CDATA)
<ul> -- unordered list

These tags are mapped to classes:

parserB
parserBr
parserCode
parserI
parserList - both types of lists are represented by this object, and each <li> is represented by an array item
parserPre

<p> is represented by partitioning text into an array, where each array item is a new paragraph (as in parserDocBlock::$processed_desc).
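
A small sketch may help show why abstract structures make multiple output formats practical. SketchB and SketchList below are hypothetical stand-ins for parserB and parserList, and the two conversion functions are invented for this example; they are not part of phpDocumentor.

```php
<?php
// Hypothetical stand-ins for the abstract structures.
class SketchB
{
    public $text;

    public function __construct($text)
    {
        $this->text = $text;
    }
}

class SketchList
{
    public $ordered;
    public $items; // one array item per <li>

    public function __construct($ordered, array $items)
    {
        $this->ordered = $ordered;
        $this->items = $items;
    }
}

// Render a node as HTML.
function sketchToHtml($node)
{
    if (is_string($node)) {
        return htmlspecialchars($node);
    }
    if ($node instanceof SketchB) {
        return '<b>' . htmlspecialchars($node->text) . '</b>';
    }
    if ($node instanceof SketchList) {
        $tag = $node->ordered ? 'ol' : 'ul';
        $out = '<' . $tag . '>';
        foreach ($node->items as $item) {
            $out .= '<li>' . sketchToHtml($item) . '</li>';
        }
        return $out . '</' . $tag . '>';
    }
    throw new InvalidArgumentException('Unknown node type');
}

// Render the same node as plain text.
function sketchToText($node)
{
    if (is_string($node)) {
        return $node;
    }
    if ($node instanceof SketchB) {
        return '*' . $node->text . '*';
    }
    if ($node instanceof SketchList) {
        $out = '';
        foreach ($node->items as $i => $item) {
            $bullet = $node->ordered ? ($i + 1) . '. ' : '- ';
            $out .= $bullet . sketchToText($item) . "\n";
        }
        return $out;
    }
    throw new InvalidArgumentException('Unknown node type');
}

// One array item per paragraph; each paragraph is a list of nodes.
$paragraphs = array(
    array('Handle with ', new SketchB('care'), '.'),
    array('Second paragraph.'),
);
$html = '';
foreach ($paragraphs as $nodes) {
    $html .= '<p>' . implode('', array_map('sketchToHtml', $nodes)) . '</p>';
}
echo $html, "\n";
echo sketchToText(new SketchList(true, array('parse', 'convert')));
```

The HTML path re-creates the tags, while the plain-text path picks its own conventions; neither needs to re-parse the DocBlock.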

With these simple structures and a few methods to handle them, the process of writing a new converter is straightforward.

Separation of data from display formatting

phpDocumentor has been designed to keep as much formatting out of the source code as possible. For many converters, there need be no new code written to support the conversion, as all output-specific information can be placed in template files. However, the complexity of generating class trees does require the insertion of some code into the source, so at the bare minimum, the getRootTree() method must be overridden.
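
To illustrate why class trees resist a pure template approach, the sketch below renders an inheritance tree recursively. It assumes a simple nested array (class name mapped to an array of children) and a hypothetical function name; the real getRootTree() receives different arguments and is not reproduced here.

```php
<?php
// Hypothetical helper: render a nested array of class names as nested
// HTML lists. Keys are class names, values are arrays of child classes.
function sketchRenderTree(array $tree)
{
    if (count($tree) === 0) {
        return '';
    }
    $out = '<ul>';
    foreach ($tree as $class => $children) {
        $out .= '<li>' . htmlspecialchars($class)
            . sketchRenderTree($children) . '</li>';
    }
    return $out . '</ul>';
}

$tree = array(
    'Converter' => array(
        'HTMLframesConverter' => array(),
        'PDFdefaultConverter' => array(),
    ),
);
echo sketchRenderTree($tree), "\n";
```

The recursion depth depends on the data, which is the kind of logic that is awkward to express in a template file and is therefore left to converter code.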

Methods that must be overridden

Creating a new converter can be challenging, but should not be too complicated. You need to override one data structure, Converter::$leftindex, to tell the Converter which of the individual element indexes your Converter will use. You will also need to override a few methods for the Converter to work. The most important relate to linking and output.

A Converter must override these core methods:

Convert() - take any descendant of parserElement or a parserPackagePage and convert it into output
returnSee() - takes an abstract link and returns a string that links to an element's documentation
returnLink() - takes a URL and text to display and returns an internet-enabled link
Output() - generate output, or perform other cleanup activities
Convert_RIC() - Converts README/INSTALL/CHANGELOG file contents for inclusion in documentation
ConvertErrorLog() - formats errors and warnings from $phpDocumentor_errors; see HTMLframesConverter::ConvertErrorLog()
getFunctionLink() - for all of the functions below, see HTMLframesConverter::getFunctionLink() for an example
getClassLink()
getDefineLink()
getGlobalLink()
getMethodLink()
getVarLink()
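
The override pattern behind the list above can be sketched with a stand-in base class. SketchConverter is hypothetical and models only four of the required methods; the parameter lists and the array-based element are assumptions made for this example, not the real Converter signatures.

```php
<?php
// Hypothetical stand-in for the Converter base class.
abstract class SketchConverter
{
    abstract public function Convert($element);
    abstract public function returnSee($link);
    abstract public function returnLink($url, $text);
    abstract public function Output();
}

// A toy plain-text converter that overrides every abstract method.
class SketchTextConverter extends SketchConverter
{
    private $buffer = array();

    // Turn one parsed element into a line of output and remember it.
    public function Convert($element)
    {
        $this->buffer[] = $element['type'] . ' ' . $element['name'];
    }

    // Link to an element's documentation.
    public function returnSee($link)
    {
        return '[see ' . $link . ']';
    }

    // Link to an external URL.
    public function returnLink($url, $text)
    {
        return $text . ' <' . $url . '>';
    }

    // Produce the final output once every element has been converted.
    public function Output()
    {
        return implode("\n", $this->buffer) . "\n";
    }
}

$converter = new SketchTextConverter();
$converter->Convert(array('type' => 'function', 'name' => 'foo()'));
$converter->Convert(array('type' => 'class', 'name' => 'Bar'));
echo $converter->Output();
echo $converter->returnLink('http://www.phpdoc.org', 'phpDocumentor'), "\n";
```

Convert() is called once per element and Output() once at the end, so a converter is free to either write files as it goes or collect everything and emit it in Output().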

A Converter may optionally implement these abstract methods:

endPage() - do any post-processing of procedural page elements, possibly output documentation for the page
endClass() - do any post-processing of class elements, possibly output documentation for the class
formatIndex() - format the $elements array into an index, see HTMLframesConverter::generateElementIndex()
formatPkgIndex() - format the $pkg_elements array into an index, see HTMLframesConverter::generatePkgElementIndex()
formatLeftIndex() - format the $elements array into an index, see HTMLframesConverter::formatLeftIndex()
formatTutorialTOC() - format the output of a {@toc} tag, see HTMLframesConverter::formatTutorialTOC()
getRootTree() - generates an output-specific tree of class inheritance
SmartyInit() - initialize a Smarty template object
writeSource() - write out highlighted source code for a parsed file
writeExample() - write out highlighted source code for an example
unmangle() - format source code for display in the converter's output format; called by parserSourceInlineTag::stringConvert()

The following methods may need to be overridden for proper functionality, and are for advanced situations.

checkState() - used by the parserStringWithInlineTags::Convert() cache to determine whether a cache hit or miss occurs
getState() - used by the parserStringWithInlineTags::Convert() cache to save state for the next Convert() call
type_adjust() - used to enclose type names in proper tags as in XMLDocBookConverter::type_adjust()
postProcess() - called on all converted text, so that any illegal characters may be escaped. The HTML Converters, for example, pass all output through htmlentities() (http://www.php.net/htmlentities)
getTutorialId() - called by the {@id} inline tag to get the Converter's way of representing a document anchor
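
As a rough illustration of postProcess(), the sketch below contrasts an HTML-style escape with a format that needs none. Both classes are hypothetical stand-ins that model only this one step; htmlentities() is the standard PHP function mentioned above.

```php
<?php
// Hypothetical stand-ins; each models only the postProcess() step.
class SketchHtmlOutput
{
    public function postProcess($text)
    {
        // Escape characters that are not legal in HTML text.
        return htmlentities($text);
    }
}

class SketchPlainTextOutput
{
    public function postProcess($text)
    {
        // Plain text has no reserved characters, so nothing is escaped.
        return $text;
    }
}

$raw = 'if ($a < $b && $b > 0)';
$html = new SketchHtmlOutput();
$plain = new SketchPlainTextOutput();
echo $html->postProcess($raw), "\n";
echo $plain->postProcess($raw), "\n";
```

Escaping in one place, after conversion, keeps the abstract data free of format-specific entities.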

Converter methods an extended converter should use

getSortedClassTreeFromClass() -- use this to generate class trees by package
hasTutorial() -- use this to retrieve a tutorial, or determine if it exists
getTutorialTree() -- use this to retrieve a hierarchical tree of tutorials that can be used to generate a table of contents for all tutorials
vardump_tree() -- use this to assist in debugging large tree structures of tutorials
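
A hierarchical tutorial tree lends itself to a recursive walk when building a table of contents. The sketch below assumes each node is an array with a 'title' string and a 'kids' array of child nodes; that layout and the function name are assumptions made for this example, and the structure actually returned by getTutorialTree() should be checked before relying on it.

```php
<?php
// Hypothetical helper: flatten a tutorial tree into indented TOC lines.
// Each node is assumed to be array('title' => string, 'kids' => array).
function sketchTocLines(array $node, $depth = 0)
{
    $lines = array(str_repeat('  ', $depth) . $node['title']);
    foreach ($node['kids'] as $kid) {
        $lines = array_merge($lines, sketchTocLines($kid, $depth + 1));
    }
    return $lines;
}

$tree = array(
    'title' => 'phpDocumentor Guide',
    'kids' => array(
        array(
            'title' => 'Writing a New Converter',
            'kids' => array(
                array('title' => 'Basic Concepts', 'kids' => array()),
            ),
        ),
        array('title' => 'Converter Manual', 'kids' => array()),
    ),
);
echo implode("\n", sketchTocLines($tree)), "\n";
```

The same walk could emit nested list markup or PDF bookmarks instead of indented text, depending on the output format.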