» Description

Unicode Normalizer

"...Unicode's normalization is the concept of character composition and decomposition. Character composition is the process of combining simpler characters into fewer precomposed characters, such as the n character and the combining ~ character into the single ñ character. Decomposition is the opposite process, breaking precomposed characters back into their component pieces... Normalization is important when comparing text strings for searching and sorting (collation)..." [Wikipedia]

Performs the four Unicode normalization forms:

- NFD: Canonical Decomposition
- NFC: Canonical Decomposition, followed by Canonical Composition
- NFKD: Compatibility Decomposition
- NFKC: Compatibility Decomposition, followed by Canonical Composition

Features:

- Complies with the official Unicode.org regression test.
- Tested with PHPUnit, with code coverage close to 100%.
- Optimized for speed: up to 9 times faster than other implementations.
- Works natively on UTF-8 binary strings, but can also normalize strings in other encodings (Example 2 uses ISO-8859-1).

Example 1: NFC normalization of the UTF-8 string 'foo'

    $normalized = I18N_UnicodeNormalizer::toNFC('foo');

or

    $normalizer = new I18N_UnicodeNormalizer();
    $normalized = $normalizer->normalize('foo', 'NFC');

Example 2: NFC normalization of the ISO-8859-1 string 'foo'

    $normalized = I18N_UnicodeNormalizer::toNFC('foo', 'ISO-8859-1');

or

    $normalizer = new I18N_UnicodeNormalizer();
    $normalized = $normalizer->normalize('foo', 'NFC', 'ISO-8859-1');
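To make the difference between the four forms concrete, here is a small sketch that runs a precomposed ñ (U+00F1), its decomposed spelling (n + combining tilde U+0303), and the "ﬁ" ligature (U+FB01) through each form. It is an illustration only and does not use the I18N_UnicodeNormalizer API: it assumes PHP 7+ with the bundled intl extension enabled (the `Normalizer` and `IntlChar` classes), and the helpers `codePoints()` and `normalizeTo()` are hypothetical names introduced for this example.

```php
<?php
// Illustrative sketch only: relies on PHP's intl extension (Normalizer,
// IntlChar), NOT on I18N_UnicodeNormalizer. The helper names below are
// hypothetical and exist only for this example.

// Lists the code points of a UTF-8 string, e.g. "n\u{0303}" -> "U+006E U+0303".
function codePoints(string $s): string
{
    $labels = [];
    // Split into individual characters; the /u flag makes PCRE UTF-8 aware.
    foreach (preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY) as $ch) {
        $labels[] = sprintf('U+%04X', IntlChar::ord($ch));
    }
    return implode(' ', $labels);
}

// Normalizes a UTF-8 string to the named form ('NFD', 'NFC', 'NFKD' or 'NFKC').
function normalizeTo(string $s, string $form): string
{
    $forms = [
        'NFD'  => Normalizer::FORM_D,
        'NFC'  => Normalizer::FORM_C,
        'NFKD' => Normalizer::FORM_KD,
        'NFKC' => Normalizer::FORM_KC,
    ];
    $result = Normalizer::normalize($s, $forms[$form]);
    if ($result === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $result;
}

$samples = [
    'precomposed ñ'       => "\u{00F1}",
    'n + combining tilde' => "n\u{0303}",
    'fi ligature'         => "\u{FB01}",
];

// Print the code points each form produces for each sample.
foreach ($samples as $label => $s) {
    foreach (['NFD', 'NFC', 'NFKD', 'NFKC'] as $form) {
        printf("%-20s %-4s %s\n", $label, $form, codePoints(normalizeTo($s, $form)));
    }
}

// Comparison: the two spellings of ñ differ byte-wise but match after NFC.
var_dump("\u{00F1}" === "n\u{0303}");                                          // bool(false)
var_dump(normalizeTo("\u{00F1}", 'NFC') === normalizeTo("n\u{0303}", 'NFC')); // bool(true)
```

Both spellings of ñ come out as U+006E U+0303 under NFD/NFKD and as U+00F1 under NFC/NFKC. The ligature has only a compatibility decomposition, so NFD and NFC leave U+FB01 untouched while NFKD and NFKC turn it into the two letters "fi". The final comparison shows why the description stresses normalization for searching and sorting: strings that look identical only compare equal once both sides are in the same form.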