Text_LanguageDetect

Class: Text_LanguageDetect_Parser

Source Location: /Text_LanguageDetect-0.3.0/Text/LanguageDetect/Parser.php

Class Overview

Text_LanguageDetect
   |
   --Text_LanguageDetect_Parser

This class represents a text sample to be parsed.


Author(s):

  • Nicholas Pisarro

Version:

  • release: 0.3.0

Copyright:

  • 2006

Inherited Methods

Class: Text_LanguageDetect

Text_LanguageDetect::__construct()
Constructor
Text_LanguageDetect::clusteredSearch()
Perform an intelligent detection based on clusterLanguages()
Text_LanguageDetect::clusterLanguages()
Cluster known languages according to languageSimilarity()
Text_LanguageDetect::detect()
Detects the closeness of a sample of text to the known languages
Text_LanguageDetect::detectConfidence()
Returns an array containing the most similar language and a confidence rating
Text_LanguageDetect::detectSimple()
Returns only the most similar language to the text sample
Text_LanguageDetect::detectUnicodeBlocks()
Returns the distribution of unicode blocks in a given utf8 string
Text_LanguageDetect::getLanguageCount()
Returns the number of languages that this object can detect
Text_LanguageDetect::getLanguages()
Returns the list of detectable languages
Text_LanguageDetect::languageExists()
Checks if the language with the given name exists in the database
Text_LanguageDetect::languageSimilarity()
Calculate the similarities between the language models
Text_LanguageDetect::omitLanguages()
Omits languages
Text_LanguageDetect::setNameMode()
Sets how language names are accepted and returned.
Text_LanguageDetect::setPerlCompatible()
Make this object behave like Language::Guess
Text_LanguageDetect::unicodeBlockName()
Returns the block name for a given unicode value
Text_LanguageDetect::useUnicodeBlocks()
Whether to use unicode block ranges in detection
Text_LanguageDetect::utf8strlen()
UTF-8-safe strlen()
Text_LanguageDetect::_arr_rank()
Converts a set of trigrams from frequencies to ranks
Text_LanguageDetect::_read_unicode_block_db()
Brings up the unicode block database
Text_LanguageDetect::_unicode_block_name()
Searches the unicode block database
Text_LanguageDetect::_utf8char2unicode()
Returns the unicode value of a utf8 char

Class Details

[line 33]
This class represents a text sample to be parsed.

This separates the analysis of a text sample from the primary LanguageDetect class. After a new profile has been built, the data can be retrieved using the accessor functions.

This class is intended to be used by the Text_LanguageDetect class, not end-users.

  • Author: Nicholas Pisarro
  • Version: release: 0.3.0
  • Copyright: 2006
  • License: BSD



Method Detail

analyze   [line 213]

void analyze( )

Executes the parsing operation

Before calling this method, call the set*() functions to set options and the prepare*() functions to select which kinds of data to compute.

Afterwards the get*() functions can be used to access the compiled information.

  • Access: public
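
The call order described above can be sketched as follows. This is a minimal illustration, not code from the package: buildProfile() is a hypothetical helper, and $parser is assumed to be an already-constructed Text_LanguageDetect_Parser (its constructor is not documented on this page). Only methods listed in this section are called.

```php
<?php
// Hypothetical helper (not part of the package): runs the documented
// sequence "set options, choose data, analyze, read results".
// $parser is assumed to be an existing Text_LanguageDetect_Parser instance.
function buildProfile($parser, $padStart = true)
{
    // 1. prepare*() calls select which kinds of data analyze() computes.
    $parser->prepareTrigram(true);
    $parser->prepareUnicode(true);

    // 2. set*() calls adjust options.
    $parser->setPadStart($padStart);
    $parser->setUnicodeSkipSymbols(true);

    // 3. Run the parsing operation.
    $parser->analyze();

    // 4. get*() calls return the compiled information.
    return array(
        'ranks'  => $parser->getTrigramRanks(),
        'blocks' => $parser->getUnicodeBlocks(),
    );
}
```

Because the class is intended for internal use by Text_LanguageDetect, most callers will not need a helper like this; it is shown only to make the required ordering concrete.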

getTrigramFreqs   [line 186]

array &getTrigramFreqs( )

Returns the trigram frequency table

Only used in testing, to verify that the parser is working.

  • Return: trigram frequencies in the text sample
  • Access: public

getTrigramRanks   [line 173]

array &getTrigramRanks( )

Returns the trigram ranks for the text sample
  • Return: trigram ranks in the text sample
  • Access: public
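
To make the difference between trigram frequencies and trigram ranks concrete, here is a self-contained sketch. It is not the package's implementation: countTrigrams() and rankTrigrams() are hypothetical names, the sketch handles single-byte (ASCII) text only, and the normalization, the zero-based ranks, and the alphabetical tie-breaking are all assumptions made for illustration.

```php
<?php
// Count overlapping three-character sequences in an ASCII string.
// $padStart mirrors the idea behind setPadStart(): prepend one space.
function countTrigrams($text, $padStart = true)
{
    // Assumed normalization: collapse whitespace runs, trim, lowercase.
    $text = strtolower(trim(preg_replace('/\s+/', ' ', $text)));
    if ($text === '') {
        return array();
    }
    $text = ($padStart ? ' ' : '') . $text . ' ';

    $freqs = array();
    for ($i = 0, $n = strlen($text) - 2; $i < $n; $i++) {
        $tri = substr($text, $i, 3);
        $freqs[$tri] = isset($freqs[$tri]) ? $freqs[$tri] + 1 : 1;
    }
    return $freqs;
}

// Convert a frequency table to ranks: most frequent trigram gets rank 0.
function rankTrigrams(array $freqs)
{
    $keys = array_keys($freqs);
    usort($keys, function ($a, $b) use ($freqs) {
        if ($freqs[$a] !== $freqs[$b]) {
            return $freqs[$b] - $freqs[$a];      // higher frequency first
        }
        return strcmp((string) $a, (string) $b); // assumed tie-break
    });

    $ranks = array();
    foreach ($keys as $rank => $tri) {
        $ranks[$tri] = $rank;
    }
    return $ranks;
}
```

A rank table discards the absolute counts and keeps only the ordering, which makes samples of different lengths comparable.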

getUnicodeBlocks   [line 197]

array &getUnicodeBlocks( )

Returns the array of Unicode blocks
  • Return: Unicode blocks in the text sample
  • Access: public

prepareTrigram   [line 129]

void prepareTrigram( [bool $bool = true])

Turns trigram counting on or off
  • Access: public

Parameters:

bool   $bool     true for on, false for off

prepareUnicode   [line 140]

void prepareUnicode( [bool $bool = true])

Turns Unicode block counting on or off
  • Access: public

Parameters:

bool   $bool     true for on, false for off

setPadStart   [line 151]

void setPadStart( [bool $bool = true])

Turns padding of the beginning of the sample string on or off
  • Access: public

Parameters:

bool   $bool     true for on, false for off

setUnicodeSkipSymbols   [line 162]

void setUnicodeSkipSymbols( [bool $bool = true])

Sets whether the Unicode block counter skips non-alphabetic ASCII characters
  • Access: public

Parameters:

bool   $bool     true for on, false for off
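
The effect of skipping non-alphabetic ASCII characters during Unicode block counting can be illustrated with a small stand-alone sketch. It does not use the package's block database: the four-entry table, the 'Other' catch-all, and the function names utf8CodePoint(), blockNameFor() and countUnicodeBlocks() are all hypothetical, chosen only to show the idea.

```php
<?php
// Decode one UTF-8 encoded character (1 to 4 bytes) to its code point.
function utf8CodePoint($ch)
{
    $b = array_values(unpack('C*', $ch));
    switch (count($b)) {
        case 1:  return $b[0];
        case 2:  return (($b[0] & 0x1F) << 6) | ($b[1] & 0x3F);
        case 3:  return (($b[0] & 0x0F) << 12) | (($b[1] & 0x3F) << 6) | ($b[2] & 0x3F);
        default: return (($b[0] & 0x07) << 18) | (($b[1] & 0x3F) << 12)
                      | (($b[2] & 0x3F) << 6) | ($b[3] & 0x3F);
    }
}

// Look up a block name in a tiny excerpt of the Unicode block ranges.
function blockNameFor($cp)
{
    static $blocks = array(
        array(0x0000, 0x007F, 'Basic Latin'),
        array(0x0080, 0x00FF, 'Latin-1 Supplement'),
        array(0x0370, 0x03FF, 'Greek and Coptic'),
        array(0x0400, 0x04FF, 'Cyrillic'),
    );
    foreach ($blocks as $b) {
        if ($cp >= $b[0] && $cp <= $b[1]) {
            return $b[2];
        }
    }
    return 'Other'; // hypothetical catch-all for blocks not in the excerpt
}

// Count characters per block; optionally skip ASCII digits, punctuation, spaces.
function countUnicodeBlocks($text, $skipSymbols = true)
{
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return array(); // input was not valid UTF-8
    }
    $counts = array();
    foreach ($chars as $ch) {
        $cp = utf8CodePoint($ch);
        $isAsciiLetter = ($cp >= 0x41 && $cp <= 0x5A) || ($cp >= 0x61 && $cp <= 0x7A);
        if ($skipSymbols && $cp < 0x80 && !$isAsciiLetter) {
            continue;
        }
        $name = blockNameFor($cp);
        $counts[$name] = isset($counts[$name]) ? $counts[$name] + 1 : 1;
    }
    return $counts;
}
```

With skipping enabled, punctuation and spaces shared by many scripts no longer inflate the Basic Latin count, so a mostly Cyrillic sample is not mistaken for partly Latin text.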

validateString   [line 115]

bool validateString( string $str)

Returns true if a string is suitable for parsing
  • Return: true if acceptable, false if not
  • Access: public

Parameters:

string   $str     input string to test


Documentation generated on Mon, 16 Jan 2012 10:00:04 +0000 by phpDocumentor 1.4.3.