CharsetConverter implements SingletonInterface
Class for conversion between charsets
Table of Contents
Interfaces
- SingletonInterface
- "empty" interface for singletons (marker interface pattern)
Properties
- $eucBasedSets : array<string|int, mixed>
- This tells the converter which charsets use a scheme like the Extended Unix Code (EUC).
- $noCharByteVal : int
- ASCII Value for chars with no equivalent.
- $parsedCharsets : array<string|int, mixed>
- This is the array where parsed conversion tables are stored (cached)
- $toASCII : array<string|int, mixed>
- An array where charset-to-ASCII mappings are stored (cached)
- $twoByteSets : array<string|int, mixed>
- This tells the converter which charsets have two bytes per char.
Methods
- conv() : string
- Convert from one charset to another charset.
- euc_char_mapping() : string
- Maps all characters of a string in the EUC charset family.
- sb_char_mapping() : string
- Maps all characters of a string in a single byte charset.
- specCharsToASCII() : string
- Converts special characters (such as æøåÆØÅ and umlauts) to ASCII equivalents, usually two characters long (for example æ => ae).
- UnumberToChar() : string
- Converts a Unicode number to a UTF-8 multibyte character. Algorithm based on a script found at http://czyborra.com/utf/. Unit-tested by Kasper.
- utf8_char_mapping() : string
- Maps all characters of a UTF-8 string.
- utf8_decode() : string
- Converts $str from UTF-8 to $charset
- utf8_encode() : string
- Converts $str from $charset to UTF-8
- utf8_to_numberarray() : array<string|int, mixed>
- Converts all chars in the input UTF-8 string into integer numbers returned in an array.
- utf8CharToUnumber() : int
- Converts a UTF-8 multibyte character to a Unicode number. Unit-tested by Kasper.
- initCharset() : bool
- Initializes a charset for use if it is defined in the 'typo3/sysext/core/Resources/Private/Charsets/csconvtbl/' folder. This function is called automatically by the conversion functions.
- initToASCII() : bool
- This function initializes the to-ASCII conversion table for a charset other than UTF-8.
- initUnicodeData() : bool
- This function initializes all UTF-8 character data tables.
Properties
$eucBasedSets
This tells the converter which charsets use a scheme like the Extended Unix Code:
protected
array<string|int, mixed>
$eucBasedSets
= [
    'gb2312' => 1, // Chinese, simplified.
    'big5' => 1, // Chinese, traditional.
    'euc-kr' => 1, // Korean
    'shift_jis' => 1,
]
$noCharByteVal
ASCII Value for chars with no equivalent.
protected
int
$noCharByteVal
= 63
$parsedCharsets
This is the array where parsed conversion tables are stored (cached)
protected
array<string|int, mixed>
$parsedCharsets
= []
$toASCII
An array where charset-to-ASCII mappings are stored (cached)
protected
array<string|int, mixed>
$toASCII
= []
$twoByteSets
This tells the converter which charsets have two bytes per char:
protected
array<string|int, mixed>
$twoByteSets
= ['ucs-2' => 1]
Methods
conv()
Convert from one charset to another charset.
public
conv(string $inputString, string $fromCharset, string $toCharset) : string
Parameters
- $inputString : string
  Input string
- $fromCharset : string
  From charset (the current charset of the string)
- $toCharset : string
  To charset (the output charset wanted)
Return values
string —Converted string
euc_char_mapping()
Maps all characters of a string in the EUC charset family.
public
euc_char_mapping(string $str, string $charset) : string
Parameters
- $str : string
  EUC multibyte character string
- $charset : string
  The charset
Return values
string —The converted string
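To make the two-byte scheme behind the EUC-based sets concrete, the following is a minimal sketch of how a string in such a charset could be split into characters before each one is mapped. It assumes a simplified rule (any byte above 0x7F starts a two-byte character), and `splitEucChars()` is a hypothetical helper, not part of CharsetConverter.

```php
<?php
/**
 * Hypothetical helper: split a string in an EUC-like charset into characters.
 * Assumption: a byte above 0x7F is a lead byte followed by exactly one more byte.
 */
function splitEucChars(string $str): array
{
    $chars = [];
    $length = strlen($str);
    for ($i = 0; $i < $length; $i++) {
        if (ord($str[$i]) > 0x7F && $i + 1 < $length) {
            // Lead byte plus trail byte form one character
            $chars[] = substr($str, $i, 2);
            $i++;
        } else {
            // Plain ASCII byte (or a truncated lead byte at the end of the string)
            $chars[] = $str[$i];
        }
    }
    return $chars;
}
```

Real charsets need more care than this rule suggests: Shift_JIS, for example, also contains single-byte characters above 0x80.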
sb_char_mapping()
Maps all characters of a string in a single byte charset.
public
sb_char_mapping(string $str, string $charset) : string
Parameters
- $str : string
  The string
- $charset : string
  The charset
Return values
string —The converted string
specCharsToASCII()
Converts special characters (such as æøåÆØÅ and umlauts) to ASCII equivalents, usually two characters long (for example æ => ae).
public
specCharsToASCII(string $charset, string $string) : string
Parameters
- $charset : string
  Character set of string
- $string : string
  Input string to convert
Return values
string —The converted string
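The kind of transliteration described here can be sketched with a plain lookup table. The mapping below is illustrative only (the real to-ASCII tables are loaded by the class), and `asciiTransliterate()` is a hypothetical helper that assumes UTF-8 input.

```php
<?php
/**
 * Hypothetical helper: replace special characters with ASCII equivalents.
 * The table is an illustrative excerpt, not TYPO3's actual mapping data.
 */
function asciiTransliterate(string $utf8, array $table): string
{
    // strtr() with an array replaces every key with its value, longest keys first
    return strtr($utf8, $table);
}

$table = [
    "\u{E6}" => 'ae', // æ
    "\u{F8}" => 'oe', // ø
    "\u{E5}" => 'aa', // å
    "\u{FC}" => 'ue', // ü
];

echo asciiTransliterate("Sm\u{F8}rrebr\u{F8}d", $table), "\n";
```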
UnumberToChar()
Converts a Unicode number to a UTF-8 multibyte character. Algorithm based on a script found at http://czyborra.com/utf/. Unit-tested by Kasper.
public
UnumberToChar(int $unicodeInteger) : string
The binary representation of the character's integer value is thus simply spread across the bytes and the number of high bits set in the lead byte announces the number of bytes in the multibyte sequence:
bytes | bits | representation
1 | 7 | 0vvvvvvv
2 | 11 | 110vvvvv 10vvvvvv
3 | 16 | 1110vvvv 10vvvvvv 10vvvvvv
4 | 21 | 11110vvv 10vvvvvv 10vvvvvv 10vvvvvv
5 | 26 | 111110vv 10vvvvvv 10vvvvvv 10vvvvvv 10vvvvvv
6 | 31 | 1111110v 10vvvvvv 10vvvvvv 10vvvvvv 10vvvvvv 10vvvvvv
Parameters
- $unicodeInteger : int
  Unicode integer
Return values
string —UTF-8 multibyte character string
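The bit layout in the table above can be sketched as follows. This is a minimal illustration rather than the class's own code: `codePointToUtf8()` is a hypothetical name, and the sketch stops at the four-byte form because current UTF-8 (RFC 3629) no longer uses the five- and six-byte sequences.

```php
<?php
/**
 * Hypothetical helper: encode a Unicode code point as UTF-8.
 * Values outside the 1..4 byte range fall back to '?' (ASCII 63).
 */
function codePointToUtf8(int $cp): string
{
    if ($cp < 0 || $cp >= 0x200000) {
        return chr(63);
    }
    if ($cp < 0x80) {
        // 1 byte: 0vvvvvvv
        return chr($cp);
    }
    if ($cp < 0x800) {
        // 2 bytes: 110vvvvv 10vvvvvv
        return chr(0xC0 | ($cp >> 6))
            . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        // 3 bytes: 1110vvvv 10vvvvvv 10vvvvvv
        return chr(0xE0 | ($cp >> 12))
            . chr(0x80 | (($cp >> 6) & 0x3F))
            . chr(0x80 | ($cp & 0x3F));
    }
    // 4 bytes: 11110vvv 10vvvvvv 10vvvvvv 10vvvvvv
    return chr(0xF0 | ($cp >> 18))
        . chr(0x80 | (($cp >> 12) & 0x3F))
        . chr(0x80 | (($cp >> 6) & 0x3F))
        . chr(0x80 | ($cp & 0x3F));
}
```

For example, U+20AC (the euro sign) needs 14 significant bits, so it takes the three-byte form and becomes the bytes E2 82 AC.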
utf8_char_mapping()
Maps all characters of a UTF-8 string.
public
utf8_char_mapping(string $str) : string
Parameters
- $str : string
-
UTF-8 string
Return values
string —The converted string
utf8_decode()
Converts $str from UTF-8 to $charset
public
utf8_decode(string $str, string $charset[, bool $useEntityForNoChar = false ]) : string
Parameters
- $str : string
  String in UTF-8 to convert to local charset
- $charset : string
  Charset, lowercase. Must be found in csconvtbl/ folder.
- $useEntityForNoChar : bool = false
  If set, characters that are not available in the destination character set are encoded as numeric entities
Return values
string —Output string, converted to local charset
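The effect of $useEntityForNoChar, together with the $noCharByteVal default of 63 (the ASCII code of "?"), can be sketched as below. This is an illustration under stated assumptions: the function names and the tiny reverse table are hypothetical, and the real class reads its tables from the csconvtbl/ folder.

```php
<?php
/** Hypothetical helper: code point of a single, valid UTF-8 character. */
function utf8CodePoint(string $char): int
{
    $bytes = array_values(unpack('C*', $char));
    $count = count($bytes);
    if ($count === 1) {
        return $bytes[0];
    }
    // Mask away the length marker bits of the lead byte
    $cp = $bytes[0] & (0xFF >> ($count + 1));
    for ($i = 1; $i < $count; $i++) {
        $cp = ($cp << 6) | ($bytes[$i] & 0x3F);
    }
    return $cp;
}

/**
 * Hypothetical helper: map UTF-8 to a single-byte charset.
 * $table maps a UTF-8 character to its byte value in the target charset.
 */
function decodeFromUtf8(string $utf8, array $table, bool $useEntityForNoChar = false, int $noCharByteVal = 63): string
{
    $out = '';
    $chars = preg_split('//u', $utf8, -1, PREG_SPLIT_NO_EMPTY) ?: [];
    foreach ($chars as $char) {
        if (strlen($char) === 1) {
            $out .= $char; // ASCII passes through unchanged
        } elseif (isset($table[$char])) {
            $out .= chr($table[$char]);
        } elseif ($useEntityForNoChar) {
            $out .= '&#' . utf8CodePoint($char) . ';';
        } else {
            $out .= chr($noCharByteVal);
        }
    }
    return $out;
}
```

With a table that only knows æ, the input "aæ€" would come out as "a", the byte 0xE6 and "?", or as "a", 0xE6 and "&#8364;" when the entity fallback is enabled.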
utf8_encode()
Converts $str from $charset to UTF-8
public
utf8_encode(string $str, string $charset) : string
Parameters
- $str : string
  String in local charset to convert to UTF-8
- $charset : string
  Charset, lowercase. Must be found in csconvtbl/ folder.
Return values
string —Output string, converted to UTF-8
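For a single-byte charset, the conversion to UTF-8 amounts to one table lookup per byte. The sketch below assumes a hypothetical table that maps byte values straight to UTF-8 strings (two Windows-1252 entries are used as sample data). `encodeToUtf8()` is not the class's implementation, and the "?" fallback for unmapped bytes is an assumption.

```php
<?php
/**
 * Hypothetical helper: convert a single-byte string to UTF-8.
 * $table maps byte values (128..255) to UTF-8 strings.
 */
function encodeToUtf8(string $str, array $table): string
{
    $out = '';
    for ($i = 0, $length = strlen($str); $i < $length; $i++) {
        $byte = ord($str[$i]);
        if ($byte < 0x80) {
            $out .= $str[$i]; // ASCII is identical in UTF-8
        } else {
            $out .= $table[$byte] ?? chr(63); // assumed fallback: '?'
        }
    }
    return $out;
}

// Sample entries from Windows-1252: 0x80 is the euro sign, 0xE6 is æ
$table = [0x80 => "\u{20AC}", 0xE6 => "\u{E6}"];
echo bin2hex(encodeToUtf8("a\x80\xE6", $table)), "\n";
```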
utf8_to_numberarray()
Converts all chars in the input UTF-8 string into integer numbers returned in an array.
public
utf8_to_numberarray(string $str) : array<string|int, mixed>
All HTML entities (like &amp;amp; or &amp;pound; or &amp;#123; or &amp;#x3f5d;) will be detected as characters. Also, instead of integer numbers the real UTF-8 char is returned.
Parameters
- $str : string
  Input string, UTF-8
Return values
array<string|int, mixed> —Output array with the char numbers
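The description above (entities detected as characters, real UTF-8 characters returned) can be approximated with PHP built-ins. This is a rough sketch of the described behaviour under that reading, not the method's actual code, and `utf8ToCharArray()` is a hypothetical name.

```php
<?php
/**
 * Hypothetical helper: decode HTML entities, then split into UTF-8 characters.
 */
function utf8ToCharArray(string $str): array
{
    // Named and numeric entities become the characters they stand for
    $decoded = html_entity_decode($str, ENT_COMPAT, 'UTF-8');
    // The /u modifier makes the empty pattern split between code points, not bytes
    return preg_split('//u', $decoded, -1, PREG_SPLIT_NO_EMPTY) ?: [];
}
```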
utf8CharToUnumber()
Converts a UTF-8 multibyte character to a Unicode number. Unit-tested by Kasper.
public
utf8CharToUnumber(string $str[, bool $hex = false ]) : int
Parameters
- $str : string
  UTF-8 multibyte character string
- $hex : bool = false
  If set, a hexadecimal number is returned.
Return values
int —UNICODE integer
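The reverse direction, from a UTF-8 byte sequence back to a Unicode number, can be sketched as below. `utf8ToCodePoint()` is a hypothetical helper that handles the one- to four-byte forms only and does not validate continuation bytes.

```php
<?php
/**
 * Hypothetical helper: decode the first UTF-8 character of $char to its code point.
 */
function utf8ToCodePoint(string $char): int
{
    if ($char === '') {
        throw new InvalidArgumentException('Empty string');
    }
    $lead = ord($char[0]);
    if ($lead < 0x80) {
        return $lead; // 0vvvvvvv
    }
    if (($lead & 0xE0) === 0xC0) {
        $length = 2;
        $cp = $lead & 0x1F; // 110vvvvv
    } elseif (($lead & 0xF0) === 0xE0) {
        $length = 3;
        $cp = $lead & 0x0F; // 1110vvvv
    } elseif (($lead & 0xF8) === 0xF0) {
        $length = 4;
        $cp = $lead & 0x07; // 11110vvv
    } else {
        throw new InvalidArgumentException('Unsupported lead byte');
    }
    if (strlen($char) < $length) {
        throw new InvalidArgumentException('Truncated sequence');
    }
    for ($i = 1; $i < $length; $i++) {
        // Each continuation byte (10vvvvvv) contributes six bits
        $cp = ($cp << 6) | (ord($char[$i]) & 0x3F);
    }
    return $cp;
}
```

A hexadecimal form of the result, comparable to what the $hex flag asks for, could be obtained with `dechex(utf8ToCodePoint($char))`.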
initCharset()
Initializes a charset for use if it is defined in the 'typo3/sysext/core/Resources/Private/Charsets/csconvtbl/' folder. This function is called automatically by the conversion functions.
protected
initCharset(string $charset) : bool
PLEASE SEE: http://www.unicode.org/Public/MAPPINGS/
Parameters
- $charset : string
  The charset to be initialized. Always use the lowercase charset name; it must exactly match a filename in the csconvtbl/ folder ([charset].tbl).
Return values
bool —TRUE if the charset conversion table was found and parsed, FALSE otherwise.
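As a rough illustration of parse-once-then-cache initialization, the sketch below parses a mapping table in the style published under unicode.org/Public/MAPPINGS/ (lines such as `0xA4<TAB>0x20AC<TAB># EURO SIGN`) and stores the result in a cache array, much like $parsedCharsets. That the .tbl files use exactly this layout is an assumption. Both function names are hypothetical, and `$files` stands in for the csconvtbl/ folder.

```php
<?php
/** Hypothetical helper: parse "0xNN<whitespace>0xNNNN" lines into byte => code point. */
function parseMappingTable(string $contents): array
{
    $map = [];
    foreach (preg_split('/\R/', $contents) as $line) {
        $line = trim($line);
        if ($line === '' || $line[0] === '#') {
            continue; // skip blank lines and comments
        }
        if (preg_match('/^0x([0-9A-Fa-f]+)\s+0x([0-9A-Fa-f]+)/', $line, $m)) {
            $map[hexdec($m[1])] = hexdec($m[2]);
        }
    }
    return $map;
}

/**
 * Hypothetical helper: initialize a charset once and cache the parsed table.
 * $files maps charset names to table contents; $cache is filled by reference.
 */
function initCharsetSketch(string $charset, array $files, array &$cache): bool
{
    if (isset($cache[$charset])) {
        return true; // already parsed
    }
    if (!isset($files[$charset])) {
        return false; // no table available for this charset
    }
    $cache[$charset] = parseMappingTable($files[$charset]);
    return true;
}
```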
initToASCII()
This function initializes the to-ASCII conversion table for a charset other than UTF-8.
protected
initToASCII(string $charset) : bool
This function is automatically called by the ASCII transliteration functions.
Parameters
- $charset : string
  Charset for which to initialize conversion.
Return values
bool —Returns FALSE on error, TRUE on success
initUnicodeData()
This function initializes all UTF-8 character data tables.
protected
initUnicodeData() : bool
PLEASE SEE: http://www.unicode.org/Public/UNIDATA/
Return values
bool —Returns FALSE on error, TRUE on success