
Class: I18N_UnicodeNormalizer_Compiler

Source Location: /I18N_UnicodeNormalizer-1.0.0/UnicodeNormalizer/Compiler.php

Class Overview

Unicode data compiler



  • Release: @package_version@
  • 2007 Michel Corne


Class Details

[line 74]
Unicode data compiler

Compiles the Unicode data files into PHP files in one of two formats: with the code points converted to UTF-8 characters, e.g. utf8/CanonicalCombining.php, or with the code points converted to their UCN format, e.g. ucn/CanonicalCombining.php. The UCN format is useful for testing purposes.

The original Unicode.org data files are split into quick check files, a character combining file, decomposition files, a composition exclusion file, character composition files, and test files. Each compiled file contains a PHP array keyed by character.

The Hangul data is generated algorithmically.
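
As an illustration of how Hangul data can be generated, here is a minimal sketch of the arithmetic Hangul syllable decomposition defined by the Unicode Standard. The function name sketchHangulDecompose is hypothetical and not part of this class; whether the compiler uses exactly this formulation is an assumption. The sketch assumes PHP 7 or later.

```php
<?php
// Hypothetical helper, not part of I18N_UnicodeNormalizer_Compiler.
// Decomposes a precomposed Hangul syllable (U+AC00..U+D7A3) into its
// leading consonant, vowel and optional trailing consonant code points,
// using the constants from the Unicode Standard.
function sketchHangulDecompose(int $syllable): array
{
    $sBase = 0xAC00;  // first precomposed syllable
    $lBase = 0x1100;  // first leading consonant
    $vBase = 0x1161;  // first vowel
    $tBase = 0x11A7;  // one before the first trailing consonant
    $vCount = 21;
    $tCount = 28;
    $nCount = $vCount * $tCount;  // 588
    $sCount = 19 * $nCount;       // 11172

    $sIndex = $syllable - $sBase;
    if ($sIndex < 0 || $sIndex >= $sCount) {
        return [$syllable];  // not a precomposed Hangul syllable
    }

    $l = $lBase + intdiv($sIndex, $nCount);
    $v = $vBase + intdiv($sIndex % $nCount, $tCount);
    $t = $tBase + $sIndex % $tCount;

    // A trailing index of 0 means the syllable has no trailing consonant.
    return $t === $tBase ? [$l, $v] : [$l, $v, $t];
}

// U+D4DB decomposes to U+1111 U+1171 U+11B6.
echo implode(' ', array_map('dechex', sketchHangulDecompose(0xD4DB))), "\n";
```

Because the mapping is pure arithmetic, the 11172 syllable entries can be produced in a loop instead of being read from a data file.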

The compilation only needs to be run once.

Method Detail

__construct (Constructor)   [line 162]

void __construct( [string $dir = ''], [string $codeFormat = ''], [boolean $forceCompile = false], [integer $limitedDataSet = false])

The class constructor

Sets the Unicode code point format, the force-compilation option, and the code point range limit. Also sets the paths of the data files and of the compiled files.

  • Access: public


string   $dir     the base directory of the data and compiled files; the default is set by the normalizer
string   $codeFormat     'utf8' for production, or 'ucn' for testing purposes; the default is 'utf8'
boolean   $forceCompile     compilation option: files are always (re)compiled if true, or recompiled only as needed if false
integer   $limitedDataSet     compiles a small subset of the data for testing/coverage purposes if true, or uses all the data otherwise

compileAll   [line 187]

void compileAll( )

Compiles the Unicode composition, decomposition and combining text data files into PHP files.
  • Access: public

convertCode   [line 472]

string convertCode( string $code)

Converts a Unicode code point to a UTF-8 character or to its UCN format, depending on the code format set in the constructor
  • Return: the UTF-8 character or the Unicode code point in its UCN format
  • Access: public


string   $code     the Unicode code point
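
The conversion described above might look roughly like the following sketch. It is not the class's actual code: the function name sketchConvertCode, the explicit $codeFormat parameter, and the UCN notation (\uXXXX for code points up to U+FFFF, \UXXXXXXXX above) are assumptions made for illustration. The input is taken to be a hexadecimal code point string such as 0308, and PHP 7 or later is assumed.

```php
<?php
// Hypothetical stand-in for convertCode(); names and UCN notation are assumed.
function sketchConvertCode(string $code, string $codeFormat = 'utf8'): string
{
    $cp = (int) hexdec($code);

    if ($codeFormat === 'ucn') {
        // Assumed notation: 4 hex digits in the BMP, 8 hex digits beyond it.
        return $cp <= 0xFFFF
            ? sprintf('\\u%04X', $cp)
            : sprintf('\\U%08X', $cp);
    }

    // Encode the code point as UTF-8 by hand (no mbstring dependency).
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6))
            . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12))
            . chr(0x80 | (($cp >> 6) & 0x3F))
            . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18))
        . chr(0x80 | (($cp >> 12) & 0x3F))
        . chr(0x80 | (($cp >> 6) & 0x3F))
        . chr(0x80 | ($cp & 0x3F));
}

echo bin2hex(sketchConvertCode('0308')), "\n";  // prints "cc88"
echo sketchConvertCode('0308', 'ucn'), "\n";    // prints "\u0308"
```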

getFileNames   [line 526]

array getFileNames( )

Gets the list of compiled file names
  • Return: the list of compiled file names
  • Access: public

implode   [line 538]

array implode( array $array)

Implodes each sub-array of an array into a string
  • Return: an array of strings
  • Access: public


array   $array     the array of arrays
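
A minimal sketch of what imploding an array's arrays could mean in practice is shown below. The function name sketchImplodeRows is hypothetical, and joining with an empty string is an assumption; the delimiter the class actually uses is not stated here.

```php
<?php
// Hypothetical illustration, not the class's implode() method.
// Joins each inner array into one string; the empty-string glue is assumed.
function sketchImplodeRows(array $array): array
{
    // array_map() with a single input array preserves the original keys.
    return array_map(
        function (array $row) {
            return implode('', $row);
        },
        $array
    );
}

$rows = ['a' => ['x', 'y'], 'b' => ['z']];
print_r(sketchImplodeRows($rows));  // 'a' => 'xy', 'b' => 'z'
```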

splitCodes   [line 653]

array splitCodes( string $codes)

Splits a string of Unicode code points and converts each one to a UTF-8 character or to its UCN format
  • Return: the UTF-8 characters or their UCN format
  • Access: public


string   $codes     the Unicode code points string, e.g. a decomposition mapping: 0020 0308
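
To make the split-and-convert step concrete, here is a hedged sketch using the decomposition mapping 0020 0308 from the parameter description. The names sketchUtf8FromCodePoint and sketchSplitCodes, the $codeFormat parameter, and the \uXXXX / \UXXXXXXXX notation are assumptions for illustration, not the class's actual code. PHP 7 or later is assumed.

```php
<?php
// Hypothetical helper: encodes one code point as UTF-8 without mbstring.
function sketchUtf8FromCodePoint(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12))
            . chr(0x80 | (($cp >> 6) & 0x3F))
            . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18))
        . chr(0x80 | (($cp >> 12) & 0x3F))
        . chr(0x80 | (($cp >> 6) & 0x3F))
        . chr(0x80 | ($cp & 0x3F));
}

// Hypothetical stand-in for splitCodes(): splits on whitespace and
// converts each hexadecimal code point to the requested format.
function sketchSplitCodes(string $codes, string $codeFormat = 'utf8'): array
{
    $result = [];
    foreach (preg_split('/\s+/', $codes, -1, PREG_SPLIT_NO_EMPTY) as $hex) {
        $cp = (int) hexdec($hex);
        if ($codeFormat === 'ucn') {
            // Assumed UCN notation, as in testing-oriented output.
            $result[] = $cp <= 0xFFFF
                ? sprintf('\\u%04X', $cp)
                : sprintf('\\U%08X', $cp);
        } else {
            $result[] = sketchUtf8FromCodePoint($cp);
        }
    }
    return $result;
}

// SPACE followed by COMBINING DIAERESIS.
print_r(array_map('bin2hex', sketchSplitCodes('0020 0308')));  // "20", "cc88"
print_r(sketchSplitCodes('0020 0308', 'ucn'));
```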

Documentation generated on Sat, 04 Aug 2007 11:00:10 -0400 by phpDocumentor 1.4.0.