‪TYPO3CMS  ‪main
TYPO3\CMS\Core\Charset\CharsetConverter Class Reference
Inheritance diagram for TYPO3\CMS\Core\Charset\CharsetConverter:
TYPO3\CMS\Core\SingletonInterface

Public Member Functions

string conv (string $inputString, string $fromCharset, string $toCharset)
 
string utf8_encode (string $str, string $charset)
 
string utf8_decode (string $str, string $charset, bool $useEntityForNoChar=false)
 
array utf8_to_numberarray (string $str)
 
string UnumberToChar ($unicodeInteger)
 
int utf8CharToUnumber (string $str, bool $hex=false)
 
string specCharsToASCII (string $charset, $string)
 
string sb_char_mapping (string $str, string $charset)
 
string utf8_char_mapping (string $str)
 
string euc_char_mapping (string $str, string $charset)
 

Protected Member Functions

bool initCharset (string $charset)
 
bool initUnicodeData ()
 
bool initToASCII (string $charset)
 

Protected Attributes

int $noCharByteVal = 63
 
array $parsedCharsets = []
 
array $toASCII = []
 
array $twoByteSets
 
array $eucBasedSets
 

Detailed Description

Notes on UTF-8

Functions working on UTF-8 strings:

  • ‪strchr/strstr
  • ‪strrchr
  • ‪substr_count
  • ‪implode/explode/join

Functions nearly working on UTF-8 strings:

  • trim/ltrim/rtrim: the second parameter 'charlist' does not work for characters outside 7-bit ASCII
  • htmlentities: charset support for UTF-8 only since PHP 4.3.0
  • preg_*: UTF-8 support is compiled into PHP by default nowadays but may be unavailable; requires the u modifier

Functions NOT working on UTF-8 strings:

  • ‪str*cmp
  • ‪stristr
  • ‪stripos
  • ‪substr
  • ‪strrev
  • ‪split/spliti
  • ...

Class for conversion between charsets.

Definition at line 53 of file CharsetConverter.php.
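
The notes on UTF-8 above can be illustrated with a short sketch. It assumes the mbstring extension is loaded; the mb_* functions are not mentioned in the original notes and are used here only as one common character-aware alternative.

```php
<?php
// "é" is two bytes in UTF-8 (0xC3 0xA9), so byte-oriented functions miscount.
$word = "caf\u{E9}";

$byteLength = strlen($word);                 // counts bytes
$charLength = mb_strlen($word, 'UTF-8');     // counts characters

$byteCut = substr($word, 0, 4);              // cuts "é" in half
$charCut = mb_substr($word, 0, 4, 'UTF-8');  // keeps "é" intact

// The "u" modifier makes preg_* treat pattern and subject as UTF-8.
$chars = preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY);
```

Here substr() returns a string that ends in a lone lead byte, which is no longer valid UTF-8, while the character-aware calls return the whole word.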

Member Function Documentation

◆ conv()

string TYPO3\CMS\Core\Charset\CharsetConverter::conv ( string  $inputString,
string  $fromCharset,
string  $toCharset 
)

Convert from one charset to another charset.

Parameters
string  $inputString  Input string
string  $fromCharset  From charset (the current charset of the string)
string  $toCharset  To charset (the output charset wanted)
Returns
‪string Converted string

Definition at line 100 of file CharsetConverter.php.

References TYPO3\CMS\Core\Charset\CharsetConverter\utf8_decode(), and TYPO3\CMS\Core\Charset\CharsetConverter\utf8_encode().
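
Since conv() references both utf8_encode() and utf8_decode(), a conversion between two local charsets can be pictured as a two-step pivot through UTF-8. The following is a minimal sketch of that idea, not the class's implementation: convViaUtf8() is a hypothetical helper, and it assumes the mbstring extension in place of the class's conversion tables.

```php
<?php
/**
 * Hypothetical stand-in for a two-step conversion: source charset to UTF-8,
 * then UTF-8 to the target charset. Uses mbstring, not conversion tables.
 */
function convViaUtf8(string $input, string $from, string $to): string
{
    if ($from === $to) {
        return $input; // nothing to convert
    }
    // Step 1: bring the input to UTF-8 unless it already is.
    $utf8 = $from === 'utf-8' ? $input : mb_convert_encoding($input, 'UTF-8', $from);
    // Step 2: leave UTF-8 for the target charset unless UTF-8 is wanted.
    return $to === 'utf-8' ? $utf8 : mb_convert_encoding($utf8, $to, 'UTF-8');
}
```

Pivoting through UTF-8 means each charset only needs a mapping to and from Unicode, rather than one mapping per pair of charsets.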

◆ euc_char_mapping()

string TYPO3\CMS\Core\Charset\CharsetConverter::euc_char_mapping ( string  $str,
string  $charset 
)

Maps all characters of a string in the EUC charset family.

Parameters
string  $str  EUC multibyte character string
string  $charset  The charset
Returns
‪string The converted string

Definition at line 798 of file CharsetConverter.php.

References TYPO3\CMS\Core\Charset\CharsetConverter\initToASCII().

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\specCharsToASCII().
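
Mapping a string in an EUC-style charset first requires splitting it into characters of one or two bytes. The sketch below shows that step under a simplifying assumption: any byte above 0x7F starts a two-byte character. splitEucChars() is a hypothetical name, and real charsets in this family differ in which lead bytes signal a second byte, so the threshold should be treated as illustrative.

```php
<?php
/**
 * Splits an EUC-style byte string into characters (sketch).
 * Assumption: any byte above 0x7F starts a two-byte character.
 */
function splitEucChars(string $str): array
{
    $chars = [];
    $length = strlen($str);
    for ($i = 0; $i < $length; $i++) {
        $char = $str[$i];
        if (ord($char) > 0x7F && $i + 1 < $length) {
            $char .= $str[++$i]; // append the trail byte
        }
        $chars[] = $char;
    }
    return $chars;
}
```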

◆ initCharset()

bool TYPO3\CMS\Core\Charset\CharsetConverter::initCharset ( string  $charset)
protected

Initializes a charset for use if it is defined in the 'typo3/sysext/core/Resources/Private/Charsets/csconvtbl/' folder. This function is called automatically by the conversion functions.

PLEASE SEE: http://www.unicode.org/Public/MAPPINGS/

Parameters
string  $charset  The charset to be initialized. Always use a lowercase charset name; it must exactly match a filename in the csconvtbl/ folder ([charset].tbl).
Returns
bool TRUE if the charset conversion table was found and parsed.
Exceptions
UnknownCharsetException‪if no charset table was found

Definition at line 413 of file CharsetConverter.php.

References TYPO3\CMS\Core\Utility\ExtensionManagementUtility\extPath(), TYPO3\CMS\Core\Core\Environment\getVarPath(), TYPO3\CMS\Core\Utility\GeneralUtility\trimExplode(), TYPO3\CMS\Core\Charset\CharsetConverter\UnumberToChar(), and TYPO3\CMS\Core\Utility\GeneralUtility\writeFileToTypo3tempDir().

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\initToASCII(), TYPO3\CMS\Core\Charset\CharsetConverter\utf8_decode(), and TYPO3\CMS\Core\Charset\CharsetConverter\utf8_encode().
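
To make the idea of a parsed conversion table concrete, here is a minimal sketch of a parser for mapping text in the style published under http://www.unicode.org/Public/MAPPINGS/. It assumes lines of the form "0x80<TAB>0x20AC<TAB># EURO SIGN" and that the .tbl files follow the same layout; parseMappingTable() is a hypothetical helper, not part of the class.

```php
<?php
/**
 * Parses unicode.org style mapping text into [local byte value => code point].
 * Assumed line format: "0x80<TAB>0x20AC<TAB># EURO SIGN"; "#" starts a comment.
 */
function parseMappingTable(string $text): array
{
    $map = [];
    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        if ($line === '' || $line[0] === '#') {
            continue; // blank line or comment
        }
        $fields = preg_split('/\s+/', $line);
        if (
            count($fields) < 2
            || strncasecmp($fields[0], '0x', 2) !== 0
            || strncasecmp($fields[1], '0x', 2) !== 0
        ) {
            continue; // undefined or malformed entry
        }
        // Strip the "0x" prefix before converting the hex digits.
        $map[hexdec(substr($fields[0], 2))] = hexdec(substr($fields[1], 2));
    }
    return $map;
}
```

Keeping the result in an array keyed by the local value gives a direct lookup for encoding; flipping it with array_flip() would give the reverse direction.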

◆ initToASCII()

bool TYPO3\CMS\Core\Charset\CharsetConverter::initToASCII ( string  $charset)
protected

This function initializes the to-ASCII conversion table for a charset other than UTF-8. This function is automatically called by the ASCII transliteration functions.

Parameters
string  $charset  Charset for which to initialize conversion.
Returns
‪bool Returns FALSE on error, TRUE on success

Definition at line 643 of file CharsetConverter.php.

References TYPO3\CMS\Core\Core\Environment\getVarPath(), TYPO3\CMS\Core\Charset\CharsetConverter\initCharset(), TYPO3\CMS\Core\Charset\CharsetConverter\initUnicodeData(), TYPO3\CMS\Core\Charset\CharsetConverter\utf8_decode(), and TYPO3\CMS\Core\Utility\GeneralUtility\writeFileToTypo3tempDir().

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\euc_char_mapping(), and TYPO3\CMS\Core\Charset\CharsetConverter\sb_char_mapping().

◆ initUnicodeData()

◆ sb_char_mapping()

string TYPO3\CMS\Core\Charset\CharsetConverter::sb_char_mapping ( string  $str,
string  $charset 
)

Maps all characters of a string in a single byte charset.

Parameters
string  $str  The string
string  $charset  The charset
Returns
‪string The converted string

Definition at line 715 of file CharsetConverter.php.

References TYPO3\CMS\Core\Charset\CharsetConverter\initToASCII().

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\specCharsToASCII().

◆ specCharsToASCII()

string TYPO3\CMS\Core\Charset\CharsetConverter::specCharsToASCII ( string  $charset,
  $string 
)

Converts special characters (such as æøåÆØÅ and umlauts) to ASCII equivalents, usually two-character sequences such as æ => ae.

Parameters
string  $charset  Character set of string
string  $string  Input string to convert
Returns
‪string The converted string

Definition at line 687 of file CharsetConverter.php.

References TYPO3\CMS\Core\Charset\CharsetConverter\euc_char_mapping(), TYPO3\CMS\Core\Charset\CharsetConverter\sb_char_mapping(), and TYPO3\CMS\Core\Charset\CharsetConverter\utf8_char_mapping().
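
The kind of transliteration described for specCharsToASCII() can be sketched with a small lookup table. The table below is hand-written for illustration and is an assumption; it is not the mapping data the class loads, and toAsciiSketch() is a hypothetical name.

```php
<?php
/**
 * Transliterates a UTF-8 string with a tiny hand-written table (sketch).
 * The table is illustrative only; it is not the data the class uses.
 */
function toAsciiSketch(string $utf8): string
{
    $table = [
        "\u{C6}" => 'AE', "\u{E6}" => 'ae', // Æ, æ
        "\u{D8}" => 'OE', "\u{F8}" => 'oe', // Ø, ø
        "\u{C5}" => 'AA', "\u{E5}" => 'aa', // Å, å
        "\u{DC}" => 'UE', "\u{FC}" => 'ue', // Ü, ü
    ];
    // strtr() replaces whole byte sequences, so multibyte keys are safe here.
    return strtr($utf8, $table);
}
```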

◆ UnumberToChar()

string TYPO3\CMS\Core\Charset\CharsetConverter::UnumberToChar (   $unicodeInteger)

Converts a UNICODE number to a UTF-8 multibyte character. Algorithm based on a script found at http://czyborra.com/utf/. Unit-tested by Kasper.

The binary representation of the character's integer value is spread across the bytes, and the number of high bits set in the lead byte announces the number of bytes in the multibyte sequence:

bytes | bits | representation
1 | 7 | 0vvvvvvv
2 | 11 | 110vvvvv 10vvvvvv
3 | 16 | 1110vvvv 10vvvvvv 10vvvvvv
4 | 21 | 11110vvv 10vvvvvv 10vvvvvv 10vvvvvv
5 | 26 | 111110vv 10vvvvvv 10vvvvvv 10vvvvvv 10vvvvvv
6 | 31 | 1111110v 10vvvvvv 10vvvvvv 10vvvvvv 10vvvvvv 10vvvvvv
Parameters
int  $unicodeInteger  UNICODE integer
Returns
‪string UTF-8 multibyte character string
See also
utf8CharToUnumber()

Definition at line 325 of file CharsetConverter.php.

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\initCharset(), and TYPO3\CMS\Core\Charset\CharsetConverter\initUnicodeData().
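
The bit layout in the table for UnumberToChar() can be turned into code directly. This is a minimal sketch covering the 1 to 4 byte rows; the 5 and 6 byte rows follow the same pattern but are no longer valid UTF-8 under RFC 3629, so they are left out. unumberToCharSketch() is a hypothetical name and does not reproduce the class method.

```php
<?php
/**
 * Encodes a Unicode code point as UTF-8 by spreading its bits over 1-4 bytes.
 * Sketch only: no check for surrogates or values above 0x10FFFF.
 */
function unumberToCharSketch(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp);                            // 0vvvvvvv
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6))               // 110vvvvv
            . chr(0x80 | ($cp & 0x3F));             // 10vvvvvv
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12))              // 1110vvvv
            . chr(0x80 | (($cp >> 6) & 0x3F))       // 10vvvvvv
            . chr(0x80 | ($cp & 0x3F));             // 10vvvvvv
    }
    return chr(0xF0 | ($cp >> 18))                  // 11110vvv
        . chr(0x80 | (($cp >> 12) & 0x3F))          // 10vvvvvv
        . chr(0x80 | (($cp >> 6) & 0x3F))           // 10vvvvvv
        . chr(0x80 | ($cp & 0x3F));                 // 10vvvvvv
}
```

For example, U+20AC (the euro sign) needs 16 bits and therefore lands in the three-byte row as E2 82 AC.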

◆ utf8_char_mapping()

string TYPO3\CMS\Core\Charset\CharsetConverter::utf8_char_mapping ( string  $str)

Maps all characters of a UTF-8 string.

Parameters
string  $str  UTF-8 string
Returns
‪string The converted string

Definition at line 746 of file CharsetConverter.php.

References TYPO3\CMS\Core\Charset\CharsetConverter\initUnicodeData().

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\specCharsToASCII().

◆ utf8_decode()

string TYPO3\CMS\Core\Charset\CharsetConverter::utf8_decode ( string  $str,
string  $charset,
bool  $useEntityForNoChar = false 
)

Converts $str from UTF-8 to $charset

Parameters
string  $str  String in UTF-8 to convert to local charset
string  $charset  Charset, lowercase. Must be found in csconvtbl/ folder.
bool  $useEntityForNoChar  If set, then characters that are not available in the destination character set will be encoded as numeric entities
Returns
‪string Output string, converted to local charset

Definition at line 191 of file CharsetConverter.php.

References TYPO3\CMS\Core\Charset\CharsetConverter\initCharset(), and TYPO3\CMS\Core\Charset\CharsetConverter\utf8CharToUnumber().

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\conv(), and TYPO3\CMS\Core\Charset\CharsetConverter\initToASCII().
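
The effect of $useEntityForNoChar can be sketched with 7-bit ASCII standing in for the destination charset. This is an illustration under assumptions, not the class's logic: utf8ToAsciiSketch() is a hypothetical helper, it needs the mbstring extension for mb_ord(), and the "?" fallback is assumed from the documented value 63 of $noCharByteVal.

```php
<?php
/**
 * Reduces a UTF-8 string to 7-bit ASCII (sketch of the fallback behaviour).
 * Assumptions: valid UTF-8 input; ASCII stands in for a real target table.
 */
function utf8ToAsciiSketch(string $str, bool $useEntityForNoChar = false): string
{
    $out = '';
    foreach (preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY) as $char) {
        $codePoint = mb_ord($char, 'UTF-8');
        if ($codePoint < 0x80) {
            $out .= $char;                      // available in the target
        } elseif ($useEntityForNoChar) {
            $out .= '&#' . $codePoint . ';';    // numeric entity fallback
        } else {
            $out .= chr(63);                    // "?" for no equivalent
        }
    }
    return $out;
}
```

The entity form keeps the information recoverable in HTML output, while the "?" form is lossy.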

◆ utf8_encode()

string TYPO3\CMS\Core\Charset\CharsetConverter::utf8_encode ( string  $str,
string  $charset 
)

Converts $str from $charset to UTF-8

Parameters
string  $str  String in local charset to convert to UTF-8
string  $charset  Charset, lowercase. Must be found in csconvtbl/ folder.
Returns
‪string Output string, converted to UTF-8

Definition at line 129 of file CharsetConverter.php.

References TYPO3\CMS\Core\Charset\CharsetConverter\initCharset().

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\conv().

◆ utf8_to_numberarray()

array TYPO3\CMS\Core\Charset\CharsetConverter::utf8_to_numberarray ( string  $str)

Converts all chars in the input UTF-8 string into an array, one entry per character. All HTML entities (like &amp; or &pound; or &#123; or &#x3f5d;) will be detected as characters. Note that the array holds the real UTF-8 chars rather than integer numbers.

Parameters
string  $str  Input string, UTF-8
Returns
‪array Output array with the char numbers

Definition at line 260 of file CharsetConverter.php.
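
One way to get the behaviour described for utf8_to_numberarray() is to decode HTML entities and then split on character boundaries. The sketch below assumes exactly that approach and uses only PHP built-ins; utf8ToCharArraySketch() is a hypothetical name.

```php
<?php
/**
 * Decodes HTML entities, then splits the UTF-8 string into characters (sketch).
 * Assumes valid UTF-8 input.
 */
function utf8ToCharArraySketch(string $str): array
{
    // Named and numeric entities become the characters they stand for.
    $decoded = html_entity_decode($str, ENT_COMPAT, 'UTF-8');
    if ($decoded === '') {
        return [];
    }
    // An empty pattern with the "u" modifier splits between characters.
    return preg_split('//u', $decoded, -1, PREG_SPLIT_NO_EMPTY);
}
```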

◆ utf8CharToUnumber()

int TYPO3\CMS\Core\Charset\CharsetConverter::utf8CharToUnumber ( string  $str,
bool  $hex = false 
)

Converts a UTF-8 multibyte character to a UNICODE number. Unit-tested by Kasper.

Parameters
string  $str  UTF-8 multibyte character string
bool  $hex  If set, a hexadecimal number is returned.
Returns
‪int UNICODE integer
See also
UnumberToChar()

Definition at line 371 of file CharsetConverter.php.

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\utf8_decode().
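
The reverse direction, from a UTF-8 character back to its number, can be sketched by counting the high bits of the lead byte and collecting six payload bits from each following byte. utf8CharToUnumberSketch() is a hypothetical name; the sketch assumes a non-empty, well-formed sequence and leaves out the $hex option.

```php
<?php
/**
 * Decodes the first UTF-8 character of $str to its code point (sketch).
 * Assumes non-empty, well-formed input; no validation is done.
 */
function utf8CharToUnumberSketch(string $str): int
{
    $lead = ord($str[0]);
    if ($lead < 0x80) {
        return $lead; // single-byte character
    }
    // Count the leading 1 bits to find the sequence length.
    $length = 0;
    for ($mask = 0x80; $lead & $mask; $mask >>= 1) {
        $length++;
    }
    // Keep the payload bits of the lead byte, then add 6 bits per trail byte.
    $codePoint = $lead & (0xFF >> ($length + 1));
    for ($i = 1; $i < $length; $i++) {
        $codePoint = ($codePoint << 6) | (ord($str[$i]) & 0x3F);
    }
    return $codePoint;
}
```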

Member Data Documentation

◆ $eucBasedSets

array TYPO3\CMS\Core\Charset\CharsetConverter::$eucBasedSets
protected
Initial value:
= [
'gb2312' => 1,
'big5' => 1,
'euc-kr' => 1,
'shift_jis' => 1,
]

This tells the converter which charsets use a scheme like the Extended Unix Code:

Definition at line 80 of file CharsetConverter.php.

◆ $noCharByteVal

int TYPO3\CMS\Core\Charset\CharsetConverter::$noCharByteVal = 63
protected

ASCII Value for chars with no equivalent.

Definition at line 58 of file CharsetConverter.php.

◆ $parsedCharsets

array TYPO3\CMS\Core\Charset\CharsetConverter::$parsedCharsets = []
protected

This is the array where parsed conversion tables are stored (cached)

Definition at line 63 of file CharsetConverter.php.

◆ $toASCII

array TYPO3\CMS\Core\Charset\CharsetConverter::$toASCII = []
protected

An array where charset-to-ASCII mappings are stored (cached)

Definition at line 68 of file CharsetConverter.php.

◆ $twoByteSets

array TYPO3\CMS\Core\Charset\CharsetConverter::$twoByteSets
protected
Initial value:
= [
'ucs-2' => 1,
]

This tells the converter which charsets have two bytes per char:

Definition at line 73 of file CharsetConverter.php.