Text_LanguageDetect

Class: Text_LanguageDetect_Parser

Source Location: /Text_LanguageDetect-1.0.0/Text/LanguageDetect/Parser.php

Class Overview

Text_LanguageDetect
   |
   --Text_LanguageDetect_Parser

This class represents a text sample to be parsed.


Author(s):

Version:

  • Release: 1.0.0

Copyright:

  • 2006 Nicholas Pisarro


Inherited Variables

Inherited Methods

Class: Text_LanguageDetect

Text_LanguageDetect::__construct()
Constructor
Text_LanguageDetect::clusteredSearch()
Perform an intelligent detection based on clusterLanguages()
Text_LanguageDetect::clusterLanguages()
Cluster known languages according to languageSimilarity()
Text_LanguageDetect::detect()
Detects the closeness of a sample of text to the known languages
Text_LanguageDetect::detectConfidence()
Returns an array containing the most similar language and a confidence rating
Text_LanguageDetect::detectSimple()
Returns only the most similar language to the text sample
Text_LanguageDetect::detectUnicodeBlocks()
Returns the distribution of unicode blocks in a given utf8 string
Text_LanguageDetect::getLanguageCount()
Returns the number of languages that this object can detect
Text_LanguageDetect::getLanguages()
Returns the list of detectable languages
Text_LanguageDetect::languageExists()
Checks if the language with the given name exists in the database
Text_LanguageDetect::languageSimilarity()
Calculate the similarities between the language models
Text_LanguageDetect::omitLanguages()
Omits languages
Text_LanguageDetect::setNameMode()
Sets how language names are accepted and returned.
Text_LanguageDetect::setPerlCompatible()
Make this object behave like Language::Guess
Text_LanguageDetect::unicodeBlockName()
Returns the block name for a given unicode value
Text_LanguageDetect::useUnicodeBlocks()
Whether to use unicode block ranges in detection
Text_LanguageDetect::utf8strlen()
UTF8-safe strlen()
Text_LanguageDetect::_arr_rank()
Converts a set of trigrams from frequencies to ranks
Text_LanguageDetect::_bub_sort()
Sorts an array by value breaking ties alphabetically
Text_LanguageDetect::_checkTrigram()
Checks if this object is ready to detect languages
Text_LanguageDetect::_convertFromNameMode()
Converts a $language input parameter from the configured mode to the language name that is used internally.
Text_LanguageDetect::_convertToNameMode()
Converts a $language output parameter from the language name that is used internally to the configured mode.
Text_LanguageDetect::_distance()
Calculates a linear rank-order distance statistic between two sets of ranked trigrams
Text_LanguageDetect::_get_data_loc()
Returns the path to the location of the database
Text_LanguageDetect::_next_char()
UTF8-safe fast character iterator
Text_LanguageDetect::_normalize_score()
Normalizes the score returned by _distance()
Text_LanguageDetect::_readdb()
Loads the language trigram database from filename
Text_LanguageDetect::_read_unicode_block_db()
Brings up the unicode block database
Text_LanguageDetect::_sort_func()
Sort function used by bubble sort
Text_LanguageDetect::_trigram()
Converts a piece of text into trigrams
Text_LanguageDetect::_unicode_block_name()
Searches the unicode block database
Text_LanguageDetect::_utf8char2unicode()
Returns the unicode value of a utf8 char

Class Details

[line 33]
This class represents a text sample to be parsed.

This separates the analysis of a text sample from the primary LanguageDetect class. After a new profile has been built, the data can be retrieved using the accessor functions.

This class is intended to be used by the Text_LanguageDetect class, not end-users.





Class Variables

$_compile_trigram =  false

[line 75]

Whether the parser should compile trigrams
  • Access: protected

Type:   bool



$_compile_unicode =  false

[line 68]

Whether the parser should compile the unicode ranges
  • Access: protected

Type:   bool



$_string

[line 40]

The piece of text being parsed
  • Access: protected

Type:   string



$_trigrams = array()

[line 47]

Stores the trigram frequencies of the sample
  • Access: protected

Type:   array



$_trigram_pad_start =  false

[line 82]

Whether the trigram parser should pad the beginning of the string
  • Access: protected

Type:   bool



$_trigram_ranks = array()

[line 54]

Stores the trigram ranks of the sample
  • Access: protected

Type:   array



$_unicode_blocks = array()

[line 61]

Stores the unicode blocks of the sample
  • Access: protected

Type:   array



$_unicode_skip_symbols =  true

[line 89]

Whether the unicode parser should skip non-alphabetical ascii chars
  • Access: protected

Type:   bool





Method Detail

Text_LanguageDetect_Parser (Constructor)   [line 108]

void Text_LanguageDetect_Parser( string $string)

PHP 4 constructor for backwards compatibility.
  • Access: public

Parameters:

string   $string   —  string to be parsed


__construct (Constructor)   [line 96]

Text_LanguageDetect_Parser __construct( string $string)

Constructor
  • Access: public

Overrides Text_LanguageDetect::__construct() (Constructor)

Parameters:

string   $string   —  string to be parsed


analyze   [line 220]

void analyze( )

Executes the parsing operation

Before calling this method, call the set*() functions to set options and the prepare*() functions to select which kinds of data to compute.

Afterwards the get*() functions can be used to access the compiled information.

  • Access: public
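
The call order described above (set*() and prepare*() first, then analyze(), then the get*() accessors) can be sketched as a small helper. This is a minimal illustration, not code from the package: profileSample() and its $padStart argument are hypothetical names, and the helper only relies on the public methods documented on this page.

```php
<?php
/**
 * Hypothetical helper, not part of Text_LanguageDetect: applies the
 * documented call order to an already constructed parser object.
 */
function profileSample($parser, $padStart = false)
{
    // 1. Options and data selection come first.
    $parser->prepareTrigram(true);
    $parser->prepareUnicode(true);
    $parser->setPadStart($padStart);

    // 2. Run the parsing operation.
    $parser->analyze();

    // 3. Read the compiled data back through the accessors.
    return array(
        'ranks'  => $parser->getTrigramRanks(),
        'blocks' => $parser->getUnicodeBlocks(),
    );
}
```

Assuming the package is installed and on the include path, a call might look like profileSample(new Text_LanguageDetect_Parser('some sample text')). Keep in mind that the class is documented as internal to Text_LanguageDetect rather than as an end-user API.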


getTrigramFreqs   [line 194]

array getTrigramFreqs( )

Returns the trigram frequency table

Only used in testing to make sure the parser is working

  • Return: Trigram frequencies in the text sample
  • Access: public


getTrigramRanks   [line 182]

array getTrigramRanks( )

Returns the trigram ranks for the text sample
  • Return: Trigram ranks in the text sample
  • Access: public
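
To illustrate how a trigram frequency table relates to trigram ranks, here is a simplified, self-contained sketch. It is not the parser's implementation: trigramFreqs() and freqsToRanks() are hypothetical names, the counting is byte-based (ASCII only) with no padding or normalization, and the zero-based ranking with alphabetical tie-breaking is an assumption modeled on the descriptions of _arr_rank() and _bub_sort() listed above.

```php
<?php
// Count overlapping three-byte windows in an ASCII string.
function trigramFreqs($text)
{
    $freqs = array();
    $len = strlen($text);
    for ($i = 0; $i + 3 <= $len; $i++) {
        $tri = substr($text, $i, 3);
        $freqs[$tri] = isset($freqs[$tri]) ? $freqs[$tri] + 1 : 1;
    }
    return $freqs;
}

// Convert frequencies to ranks: most frequent trigram gets rank 0,
// ties are broken alphabetically (assumed ordering for this sketch).
function freqsToRanks(array $freqs)
{
    $keys = array_keys($freqs);
    usort($keys, function ($a, $b) use ($freqs) {
        if ($freqs[$a] !== $freqs[$b]) {
            return $freqs[$b] - $freqs[$a]; // higher frequency first
        }
        return strcmp((string) $a, (string) $b); // alphabetical tie-break
    });
    return array_flip($keys); // trigram => rank
}
```

For the input 'abcabc' the sketch counts 'abc' twice and 'bca' and 'cab' once each, so 'abc' receives rank 0 and the two tied trigrams follow in alphabetical order.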


getUnicodeBlocks   [line 204]

array getUnicodeBlocks( )

Returns the array of unicode blocks
  • Return: Unicode blocks in the text sample
  • Access: public


prepareTrigram   [line 136]

void prepareTrigram( [bool $bool = true])

Turn on/off trigram counting
  • Access: public

Parameters:

bool   $bool   —  true for on, false for off


prepareUnicode   [line 148]

void prepareUnicode( [bool $bool = true])

Turn on/off unicode block counting
  • Access: public

Parameters:

bool   $bool   —  true for on, false for off


setPadStart   [line 160]

void setPadStart( [bool $bool = true])

Turn on/off padding the beginning of the sample string
  • Access: public

Parameters:

bool   $bool   —  true for on, false for off


setUnicodeSkipSymbols   [line 172]

void setUnicodeSkipSymbols( [bool $bool = true])

Turn on/off skipping of non-alphabetical ASCII characters in the unicode block counter
  • Access: public

Parameters:

bool   $bool   —  true for on, false for off
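
The effect of this option on unicode block counting can be sketched as follows. This is an illustration under stated assumptions, not the library's code: countBlocks(), demoBlockName() and demoCodePoint() are hypothetical names, the block table is a tiny excerpt rather than the real unicode block database, and "non-alphabetical" is taken here to mean any single-byte character outside A-Z and a-z.

```php
<?php
// Decode one valid UTF-8 character (1 to 4 bytes) into its code point.
function demoCodePoint($ch)
{
    $b = array_values(unpack('C*', $ch));
    switch (count($b)) {
        case 1:
            return $b[0];
        case 2:
            return (($b[0] & 0x1F) << 6) | ($b[1] & 0x3F);
        case 3:
            return (($b[0] & 0x0F) << 12) | (($b[1] & 0x3F) << 6) | ($b[2] & 0x3F);
        default:
            return (($b[0] & 0x07) << 18) | (($b[1] & 0x3F) << 12)
                | (($b[2] & 0x3F) << 6) | ($b[3] & 0x3F);
    }
}

// Look up a block name in a small demo table (excerpt only).
function demoBlockName($cp)
{
    $table = array(
        array(0x0000, 0x007F, 'Basic Latin'),
        array(0x0080, 0x00FF, 'Latin-1 Supplement'),
        array(0x0370, 0x03FF, 'Greek and Coptic'),
        array(0x0400, 0x04FF, 'Cyrillic'),
    );
    foreach ($table as $row) {
        if ($cp >= $row[0] && $cp <= $row[1]) {
            return $row[2];
        }
    }
    return 'Other';
}

// Count characters per block, optionally skipping non-alphabetical ASCII.
function countBlocks($utf8, $skipSymbols = true)
{
    $counts = array();
    // Split into UTF-8 characters; needs PCRE built with UTF-8 support.
    $chars = preg_split('//u', $utf8, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return $counts; // invalid UTF-8 input
    }
    foreach ($chars as $ch) {
        if ($skipSymbols && strlen($ch) === 1 && !preg_match('/^[A-Za-z]$/', $ch)) {
            continue; // space, digit, punctuation, and so on
        }
        $name = demoBlockName(demoCodePoint($ch));
        $counts[$name] = isset($counts[$name]) ? $counts[$name] + 1 : 1;
    }
    return $counts;
}
```

With skipping enabled, spaces and punctuation do not inflate the Basic Latin count, which matters when a sample in another script contains ordinary ASCII punctuation.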


validateString   [line 120]

bool validateString( string $str)

Returns true if a string is suitable for parsing
  • Return: true if acceptable, false if not
  • Access: public

Parameters:

string   $str   —  input string to test



Documentation generated on Thu, 02 Mar 2017 16:30:02 +0000 by phpDocumentor 1.4.4.