Docs For Class XML_Indexing_Reader

[line 97]
Transparent XML Indexing Reader

This class makes it possible to work on big XML files without access time growing out of hand. To do so, it creates an index containing the information needed to seek rapidly through a given XML file and retrieve a specific portion of it.

The indexing process is based on XPath expressions. It does not cover the whole XPath language, only a subset appropriate for what a big XML file is expected to contain.

Currently, this class works transparently, creating specific indexes upon specific requests.

For example, when initially looking for /foo/bar[232], all instances from /foo/bar[1] to /foo/bar[n] get indexed, so the first run is slow. Subsequent calls with expressions such as /foo/bar[232], /foo/bar[100], or /foo/bar[25] then all use the created index and are fast.
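The index-then-seek idea described above can be pictured with a toy sketch. It is not the package's actual implementation: the in-memory `$xml` string, the regex scan, and the `fetchBar()` helper are all hypothetical, and a real indexer would rely on a proper XML parser and seek within the file on disk.

```php
<?php
// Toy illustration only; NOT how XML_Indexing_Reader is implemented.
// Assumes flat, non-nested <bar> elements so that a regex scan is good enough.
$xml = '<foo><bar id="a">one</bar><bar id="b">two</bar><bar id="c">three</bar></foo>';

// First request (slow): scan the whole document once and record the byte
// offset and length of every /foo/bar[n].
preg_match_all('~<bar\b[^>]*>.*?</bar>~s', $xml, $m, PREG_OFFSET_CAPTURE);
$index = [];
foreach ($m[0] as $i => [$text, $offset]) {
    $index[$i + 1] = [$offset, strlen($text)]; // XPath positions are 1-based
}

// Later requests (fast): jump straight to the recorded region.
function fetchBar(string $xml, array $index, int $n): ?string
{
    if (!isset($index[$n])) {
        return null; // no such position
    }
    [$offset, $length] = $index[$n];
    return substr($xml, $offset, $length);
}

echo fetchBar($xml, $index, 2), "\n"; // <bar id="b">two</bar>
```

The scan cost is paid once; every later positional lookup is a single array access followed by a substring read.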

In addition to numerical indexes, attribute value indexing is also supported, that is, expressions such as /foo/bar[@id='someValue']. As with numerical indexing, looking for such an expression indexes all values of the 'id' attribute for the given XPath root (/foo/bar here).

Using this class is straightforward:
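An attribute index can be pictured as a map from attribute value to location. The sketch below is hypothetical (the `$byId` array and the regex scan are illustrative assumptions, not the package's real index format) and assumes flat, non-nested elements with double-quoted attributes.

```php
<?php
// Toy illustration only; NOT the package's real index format.
$xml = '<foo><bar id="a">one</bar><bar id="b">two</bar><bar id="c">three</bar></foo>';

// One scan records where each 'id' value lives under /foo/bar.
preg_match_all('~<bar\b[^>]*\bid="([^"]*)"[^>]*>.*?</bar>~s', $xml, $m, PREG_OFFSET_CAPTURE);
$byId = [];
foreach ($m[0] as $i => [$text, $offset]) {
    $idValue = $m[1][$i][0]; // captured attribute value
    $byId[$idValue] = [$offset, strlen($text)];
}

// Equivalent of /foo/bar[@id='c'] once the index exists: one array lookup.
[$offset, $length] = $byId['c'];
$match = substr($xml, $offset, $length);
echo $match, "\n"; // <bar id="c">three</bar>
```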

 $reader = new XML_Indexing_Reader('test.xml');

 $reader->find('/foo/bar[232]'); // Or any other supported XPath expression
 $xmlStrings = $reader->fetchStrings();
 
 echo "Extracted XML data :\n";
 foreach ($xmlStrings as $n => $str) {
     echo "######## Match $n ######### \n";
     echo "$str\n\n";
 }

Namespace extraction is supported. Namespace declarations are stored in the index files. You can retrieve them with:

 $reader->find(...); // Must be called before getNamespaces()

 $nsList = $reader->getNamespaces();
 foreach ($nsList as $prefix => $uri) {
     echo "$prefix => $uri\n";
 }

The index storage strategy can be customized by modifying the default dsn value. Currently, only local file containers are supported.

 // The following stores indexes in /tmp, using file names with an .xi
 // suffix. That is the default.
 $options['dsn'] = 'file:///tmp/%s.xi';
 $indexer = new XML_Indexing_Reader('test.xml', $options);
 
 // You can specify your own path as long as you include the %s expression:
 $options['dsn'] = 'file:///var/cache/xi/%s.xi';
 $indexer = new XML_Indexing_Reader('test.xml', $options);

See the constructor documentation for more information on options.
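The documentation requires the '%s' expression in the dsn but does not say what replaces it. As a rough sketch, assuming it is filled with a per-file key, a dsn could be turned into a concrete path as follows. The `resolveIndexPath()` helper and the md5-based key are hypothetical and are not taken from the package.

```php
<?php
// Hypothetical helper; the real package may resolve the dsn differently.
function resolveIndexPath(string $dsn, string $key): string
{
    $scheme = 'file://';
    if (strncmp($dsn, $scheme, strlen($scheme)) !== 0) {
        throw new InvalidArgumentException('Only file:// containers are supported');
    }
    if (strpos($dsn, '%s') === false) {
        throw new InvalidArgumentException("The dsn must contain the '%s' placeholder");
    }
    // Strip the scheme, then substitute the placeholder with the per-file key.
    return str_replace('%s', $key, substr($dsn, strlen($scheme)));
}

// Assumption: the key is derived from the XML file name (here, an md5 hash).
$key = md5('test.xml');
echo resolveIndexPath('file:///var/cache/xi/%s.xi', $key), "\n";
```

A placeholder of this kind would let a single dsn serve several XML files without their index files colliding, which may be why the option insists on it.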

Author: Olivier Guilyardi <olivier@samalyse.com>
Version: Release: @package_version@
Copyright: 2004 Samalyse SARL corporation
Link: http://pear.php.net
Since: Class available since Release 0.1
License: PHP License

XML_Indexing_Reader (Constructor) [line 224]

XML_Indexing_Reader XML_Indexing_Reader(string $filename, [array $options = array()])

Constructor

Supported options :

"dsn" : Index storage strategy. The default is to create a file in the system default temporary directory (e.g. /tmp on *nix), with a '.xi' suffix. The only currently supported format is 'file://<path>'. Example: 'file:///var/cache/xi/%s.xi'. Using the '%s' expression is required.
"gz_level" : Zlib compression level of the index files. 0 by default (no compression). Goes up to 9 (maximum compression, slow). Use this if you expect big indexes (many attributes, etc.).
"profiling" : Boolean value that enables or disables profiling support. Default is false. Enabling this option requires the Benchmark and Console_Table packages. See profile().

Access: public

Parameters:

string	`$filename`	—	The XML file to parse
array	`$options`	—	Optional custom options

count [line 565]

int count()

Retrieves the total number of matches

Return: The number of matches
Access: public

fetchDomNodes [line 620]

array fetchDomNodes([int $offset = 0], [int $limit = null])

Fetch a set of XML matches as DOM nodes

Return: DomElements
Access: public

Parameters:

int	`$offset`	—	Index of the first match to fetch (zero-based, default: 0)
int	`$limit`	—	How many matches to fetch (default: all)

fetchStrings [line 577]

array fetchStrings([int $offset = 0], [int $limit = null])

Fetch a set of XML matches as raw strings

Return: Array of XML strings

Parameters:

int	`$offset`	—	Index of the first match to fetch (zero-based, default: 0)
int	`$limit`	—	How many matches to fetch (default: all)
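
The $offset and $limit parameters describe a window over the list of matches. A minimal sketch of those semantics, using PHP's array_slice() on stand-in data, is shown here. The `window()` helper and the sample matches are hypothetical, and the package may implement the windowing differently.

```php
<?php
// Stand-in for the list of matches a find() call produced (hypothetical data).
$matches = ['<bar>1</bar>', '<bar>2</bar>', '<bar>3</bar>', '<bar>4</bar>'];

// Hypothetical helper mirroring the documented $offset / $limit defaults.
function window(array $matches, int $offset = 0, ?int $limit = null): array
{
    // array_slice() treats a null length as "up to the end of the array".
    return array_slice($matches, $offset, $limit);
}

print_r(window($matches));       // all four matches
print_r(window($matches, 1, 2)); // the 2nd and 3rd matches
print_r(window($matches, 3));    // only the last match
```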

find [line 400]

bool find(string $xpath)

Search for an XPath expression

Return: The number of nodes matched or a PEAR_Error
Access: public

Parameters:

string	`$xpath`	—	XPath expression to look for

getNamespaces [line 675]

array getNamespaces()

Return namespaces declared in the XML file

Return: An associative array of the form ('prefix' => 'uri', ...)
Access: public

profile [line 687]

void profile()

Output profiling information

Access: public
