TYPO3 CMS 9.5
TYPO3\CMS\Core\Charset\CharsetConverter Class Reference
TYPO3\CMS\Core\Charset\CharsetConverter implements TYPO3\CMS\Core\SingletonInterface.

Public Member Functions

string parse_charset ($charset)
string conv ($inputString, $fromCharset, $toCharset, $useEntityForNoChar=null)
convArray (&$array, $fromCharset, $toCharset, $useEntityForNoChar=false)
string utf8_encode ($str, $charset)
string utf8_decode ($str, $charset, $useEntityForNoChar=false)
string utf8_to_entities ($str)
string entities_to_utf8 ($str)
array utf8_to_numberarray ($str)
string UnumberToChar ($unicodeInteger)
int utf8CharToUnumber ($str, $hex=false)
string crop ($charset, $string, $len, $crop='')
string convCaseFirst ($charset, $string, $case)
string specCharsToASCII ($charset, $string)
string sb_char_mapping ($str, $charset)
int utf8_char2byte_pos ($str, $pos)
string utf8_char_mapping ($str)
string euc_char_mapping ($str, $charset)

Protected Member Functions

int initCharset ($charset)
int initUnicodeData ()
int initToASCII ($charset)
 

Protected Attributes

array $deprecatedPublicProperties
int $noCharByteVal = 63
array $parsedCharsets = array( )
array $toASCII = array( )
array $twoByteSets
array $eucBasedSets
array $synonyms
 

Detailed Description

Notes on UTF-8

Functions working on UTF-8 strings:

  • strchr/strstr
  • strrchr
  • substr_count
  • implode/explode/join

Functions nearly working on UTF-8 strings:

  • trim/ltrim/rtrim: the second parameter 'charlist' won't work for characters not contained in 7-bit ASCII
  • htmlentities: charset support for UTF-8 only since PHP 4.3.0
  • preg_*: UTF-8 support is compiled into PHP by default nowadays but could be unavailable; the /u modifier must be used

Functions NOT working on UTF-8 strings:

  • str*cmp
  • stristr
  • stripos
  • substr
  • strrev
  • split/spliti
  • ...

Class for conversion between charsets.

Definition at line 53 of file CharsetConverter.php.

Member Function Documentation

◆ conv()

string TYPO3\CMS\Core\Charset\CharsetConverter::conv ($inputString, $fromCharset, $toCharset, $useEntityForNoChar = null)

Converts a string from one charset to another charset.

Parameters
string $inputString - Input string
string $fromCharset - From charset (the current charset of the string)
string $toCharset - To charset (the output charset wanted)
bool $useEntityForNoChar - If set, then characters that are not available in the destination character set will be encoded as numeric entities
Returns
string Converted string
See also
convArray()

Definition at line 229 of file CharsetConverter.php.

References TYPO3\CMS\Core\Charset\CharsetConverter\utf8_decode(), and TYPO3\CMS\Core\Charset\CharsetConverter\utf8_encode().

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\convArray().
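
For orientation, a minimal standalone sketch of the same kind of conversion using PHP's mbstring extension instead of TYPO3's csconvtbl/ tables. convSketch() is a hypothetical helper, not part of TYPO3, and it leaves out the $useEntityForNoChar handling.

```php
<?php
// Hypothetical helper (not TYPO3 API): converts $input from $fromCharset to
// $toCharset. Assumes the mbstring extension; TYPO3 itself routes through
// utf8_encode()/utf8_decode() and its own conversion tables.
function convSketch(string $input, string $fromCharset, string $toCharset): string
{
    $from = strtolower($fromCharset);
    $to = strtolower($toCharset);
    if ($from === $to) {
        // Nothing to do when both charsets are the same.
        return $input;
    }
    return mb_convert_encoding($input, $to, $from);
}

// Usage: "caf\xE9" is "café" in ISO-8859-1; converting it to UTF-8
// yields the bytes "caf\xC3\xA9".
$utf8 = convSketch("caf\xE9", 'ISO-8859-1', 'UTF-8');
```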

◆ convArray()

TYPO3\CMS\Core\Charset\CharsetConverter::convArray (&$array, $fromCharset, $toCharset, $useEntityForNoChar = false)

Converts all string elements of an array from one charset to another charset. NOTICE: The array is passed by reference!

Parameters
array $array - Input array, possibly multidimensional
string $fromCharset - From charset (the current charset of the strings)
string $toCharset - To charset (the output charset wanted)
bool $useEntityForNoChar - If set, then characters that are not available in the destination character set will be encoded as numeric entities
See also
conv()
Deprecated:
since TYPO3 v9.3, will be removed in TYPO3 v10.0

Definition at line 269 of file CharsetConverter.php.

References TYPO3\CMS\Core\Charset\CharsetConverter\conv().
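
Since convArray() is deprecated, code that still needs this behaviour can do it with plain PHP. A minimal sketch, assuming the mbstring extension; convArraySketch() is a hypothetical name and omits the $useEntityForNoChar option.

```php
<?php
// Hypothetical helper mirroring convArray(): converts every string leaf of a
// (possibly nested) array in place. Non-string values are left untouched.
function convArraySketch(array &$array, string $fromCharset, string $toCharset): void
{
    array_walk_recursive($array, function (&$value) use ($fromCharset, $toCharset): void {
        if (is_string($value)) {
            $value = mb_convert_encoding($value, $toCharset, $fromCharset);
        }
    });
}

// Usage: ISO-8859-1 input, nested array, one non-string value.
$data = ['title' => "Gr\xFC\xDFe", 'meta' => ['city' => "K\xF6ln", 'id' => 7]];
convArraySketch($data, 'ISO-8859-1', 'UTF-8');
// $data['meta']['city'] is now "Köln" in UTF-8; $data['meta']['id'] is still 7.
```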

◆ convCaseFirst()

string TYPO3\CMS\Core\Charset\CharsetConverter::convCaseFirst ($charset, $string, $case)

Equivalent of lcfirst()/ucfirst() that respects the given character set.

Parameters
string $charset - The character set of $string
string $string - Input string
string $case - Either 'toLower' or 'toUpper'
Returns
string
Deprecated:
since TYPO3 v9.3, will be removed in TYPO3 v10.0.

Definition at line 955 of file CharsetConverter.php.
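
Because plain ucfirst()/lcfirst() work byte-wise, they cannot change the case of a multibyte first character. A minimal sketch of a UTF-8-only replacement, assuming the mbstring extension; convCaseFirstSketch() is hypothetical and drops the $charset parameter.

```php
<?php
// Hypothetical replacement for convCaseFirst() on UTF-8 strings.
// $case === 'toUpper' uppercases the first character; any other value
// lowercases it.
function convCaseFirstSketch(string $string, string $case): string
{
    if ($string === '') {
        return $string;
    }
    $first = mb_substr($string, 0, 1, 'UTF-8');
    $rest = mb_substr($string, 1, null, 'UTF-8');
    $first = $case === 'toUpper'
        ? mb_strtoupper($first, 'UTF-8')
        : mb_strtolower($first, 'UTF-8');
    return $first . $rest;
}

// Usage: convCaseFirstSketch("\u{00E4}pfel", 'toUpper') returns "Äpfel".
```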

◆ crop()

string TYPO3\CMS\Core\Charset\CharsetConverter::crop ($charset, $string, $len, $crop = '')

Truncates a string and prepends or appends a crop signifier. Unit-tested by Kasper.

Parameters
string $charset - The character set
string $string - Character string
int $len - Length (in characters)
string $crop - Crop signifier
Returns
string The shortened string
See also
substr(), mb_strimwidth()
Deprecated:
since TYPO3 v9.3, will be removed in TYPO3 v10.0

Definition at line 932 of file CharsetConverter.php.
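
A minimal sketch of the cropping logic with mbstring, assuming (not verified against TYPO3's source) that a positive $len keeps the beginning and appends $crop, while a negative $len keeps the end and prepends it. cropSketch() is a hypothetical name.

```php
<?php
// Hypothetical sketch of crop() semantics; the sign convention for $len is
// an assumption. Strings no longer than abs($len) are returned unchanged.
function cropSketch(string $charset, string $string, int $len, string $crop = ''): string
{
    if ($len === 0 || mb_strlen($string, $charset) <= abs($len)) {
        // Nothing to cut.
        return $string;
    }
    if ($len > 0) {
        // Keep the first $len characters, then the signifier.
        return mb_substr($string, 0, $len, $charset) . $crop;
    }
    // Keep the last abs($len) characters, signifier in front.
    return $crop . mb_substr($string, $len, null, $charset);
}

// Usage: cropSketch('utf-8', 'Grüße aus Köln', 5, '...') gives "Grüße...".
```

Counting in characters rather than bytes is the point here: substr() with a byte length could cut "ü" or "ß" in half.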

◆ entities_to_utf8()

string TYPO3\CMS\Core\Charset\CharsetConverter::entities_to_utf8 ($str)

Converts numeric entities (Unicode, e.g. decimal (&#1234;) or hexadecimal (&#x1b;)) to UTF-8 multibyte characters. All named HTML entities (like &amp; or &pound;) are converted as well.

Parameters
string $str - Input string, UTF-8
Returns
string Output string
Deprecated:
since TYPO3 v9.3, will be removed in TYPO3 v10.0

Definition at line 463 of file CharsetConverter.php.

References TYPO3\CMS\Core\Charset\CharsetConverter\UnumberToChar().
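
PHP's built-in html_entity_decode() covers the same ground (decimal, hexadecimal and named entities) and may serve as a replacement for the deprecated method. A minimal sketch; entitiesToUtf8Sketch() is a hypothetical wrapper.

```php
<?php
// Hypothetical stand-in for entities_to_utf8(): html_entity_decode() resolves
// decimal, hexadecimal and named entities into UTF-8 characters.
// ENT_HTML5 enables the full HTML5 named-entity table.
function entitiesToUtf8Sketch(string $str): string
{
    return html_entity_decode($str, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

// Usage: entitiesToUtf8Sketch('&#1234;') and entitiesToUtf8Sketch('&#x4D2;')
// both return "Ӓ" (U+04D2).
```

Results may differ from TYPO3's method at the edges: depending on the flags, PHP can leave numeric entities for code points the chosen document type disallows (for example most C0 control characters under ENT_HTML5) undecoded.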

◆ euc_char_mapping()

string TYPO3\CMS\Core\Charset\CharsetConverter::euc_char_mapping ($str, $charset)

Maps all characters of a string in the EUC charset family.

Parameters
string $str - EUC multibyte character string
string $charset - The charset
Returns
string The converted string

Definition at line 1130 of file CharsetConverter.php.

References TYPO3\CMS\Core\Charset\CharsetConverter\initToASCII().

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\specCharsToASCII().
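
To map characters in an EUC-family string, the method has to walk it character by character, and a character there may span one or two bytes. A simplified sketch of that splitting step, not TYPO3's code: a byte below 0x80 is treated as a single ASCII character, and a byte at or above 0x80 as the lead byte of a two-byte character. splitEucSketch() is a hypothetical name, and it ignores encoding-specific exceptions such as Shift_JIS single-byte katakana.

```php
<?php
// Hypothetical, simplified EUC splitter: returns one array entry per
// character. A high lead byte at the very end of the string (no trail byte)
// is returned on its own.
function splitEucSketch(string $str): array
{
    $chars = [];
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        if (ord($str[$i]) >= 0x80 && $i + 1 < $len) {
            $chars[] = $str[$i] . $str[$i + 1]; // lead byte + trail byte
            $i++;
        } else {
            $chars[] = $str[$i]; // plain ASCII byte
        }
    }
    return $chars;
}

// Usage: splitEucSketch("a\xB0\xA1b") returns ['a', "\xB0\xA1", 'b'].
```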

◆ initCharset()

int TYPO3\CMS\Core\Charset\CharsetConverter::initCharset ($charset)
protected

Initializes a charset for use if it is defined in the 'typo3/sysext/core/Resources/Private/Charsets/csconvtbl/' folder. This function is called automatically by the conversion functions.

PLEASE SEE: http://www.unicode.org/Public/MAPPINGS/

Parameters
string $charset - The charset to be initialized. Always use a lowercase charset name; it must exactly match a filename in the csconvtbl/ folder ([charset].tbl).
Returns
int Returns 1 if the table is already loaded, 2 if the charset conversion table was found and parsed.
Exceptions
UnknownCharsetException - if no charset table was found

Definition at line 654 of file CharsetConverter.php.

References TYPO3\CMS\Core\Utility\ExtensionManagementUtility\extPath(), TYPO3\CMS\Core\Core\Environment\getVarPath(), and TYPO3\CMS\Core\Charset\CharsetConverter\UnumberToChar().

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\initToASCII(), TYPO3\CMS\Core\Charset\CharsetConverter\utf8_decode(), and TYPO3\CMS\Core\Charset\CharsetConverter\utf8_encode().
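
A sketch of what parsing such a table can look like, assuming the two-column hexadecimal layout of the unicode.org MAPPINGS files referenced above (e.g. "0xA4<TAB>0x20AC<TAB># EURO SIGN"). TYPO3's actual parser and its .tbl layouts may differ; parseMappingTableSketch() is hypothetical. In the class, parsed tables are cached in $parsedCharsets.

```php
<?php
// Hypothetical parser for a unicode.org-style mapping table. Returns
// [charset byte value => UTF-8 character]. Comment lines and lines that do
// not start with two hex values are skipped. Assumes mbstring (mb_chr,
// PHP >= 7.2).
function parseMappingTableSketch(string $table): array
{
    $map = [];
    foreach (preg_split('/\R/', $table) as $line) {
        if (preg_match('/^\s*0x([0-9A-Fa-f]+)\s+0x([0-9A-Fa-f]+)/', $line, $match)) {
            $map[hexdec($match[1])] = mb_chr(hexdec($match[2]), 'UTF-8');
        }
    }
    return $map;
}

// Usage: two data lines in ISO-8859-15 style (0xA4 is the euro sign there).
$table = "# comment\n0x41\t0x0041\t# LATIN CAPITAL LETTER A\n0xA4\t0x20AC\t# EURO SIGN\n";
$map = parseMappingTableSketch($table);
```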

◆ initToASCII()

int TYPO3\CMS\Core\Charset\CharsetConverter::initToASCII ($charset)
protected

Initializes the to-ASCII conversion table for a charset other than UTF-8. This function is called automatically by the ASCII transliteration functions.

Parameters
string $charset - Charset for which to initialize conversion.
Returns
int FALSE on error; on success a truthy value: 1 if the table is already loaded, 2 if a cached version was used, 3 if the table was parsed (and cached).

Definition at line 881 of file CharsetConverter.php.

References TYPO3\CMS\Core\Core\Environment\getVarPath(), TYPO3\CMS\Core\Charset\CharsetConverter\initCharset(), TYPO3\CMS\Core\Charset\CharsetConverter\initUnicodeData(), and TYPO3\CMS\Core\Charset\CharsetConverter\utf8_decode().

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\euc_char_mapping(), and TYPO3\CMS\Core\Charset\CharsetConverter\sb_char_mapping().

◆ initUnicodeData()

int TYPO3\CMS\Core\Charset\CharsetConverter::initUnicodeData ()
protected

Initializes all UTF-8 character data tables.

PLEASE SEE: http://www.unicode.org/Public/UNIDATA/

Returns
int FALSE on error; on success a truthy value: 1 if the tables are already loaded, 2 if a cached version was used, 3 if the data was parsed (and cached).

Definition at line 718 of file CharsetConverter.php.

References TYPO3\CMS\Core\Utility\ExtensionManagementUtility\extPath(), TYPO3\CMS\Core\Core\Environment\getVarPath(), and TYPO3\CMS\Core\Charset\CharsetConverter\UnumberToChar().

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\initToASCII(), and TYPO3\CMS\Core\Charset\CharsetConverter\utf8_char_mapping().
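
One thing the UNIDATA files make possible is deriving ASCII fallbacks for accented letters. A hypothetical sketch of that idea, not TYPO3's implementation: UnicodeData.txt has semicolon-separated fields, field 5 holds the decomposition (e.g. "0065 0301" for é, meaning "e" plus a combining acute accent), and the sketch maps a code point to the first code point of its canonical decomposition when that is ASCII. asciiFallbacksSketch() is a hypothetical name; it follows only one decomposition level and skips compatibility decompositions (those tagged with "<...>").

```php
<?php
// Hypothetical sketch: [code point => ASCII base letter] from UnicodeData.txt
// lines. Pure PHP, no extensions needed.
function asciiFallbacksSketch(string $unicodeData): array
{
    $map = [];
    foreach (preg_split('/\R/', $unicodeData) as $line) {
        $fields = explode(';', $line);
        if (count($fields) < 6 || $fields[5] === '' || $fields[5][0] === '<') {
            continue; // no canonical decomposition on this line
        }
        // First code point of the decomposition, e.g. "0065" in "0065 0301".
        $base = hexdec(explode(' ', $fields[5])[0]);
        if ($base < 0x80) {
            $map[hexdec($fields[0])] = chr($base);
        }
    }
    return $map;
}

// Usage: two real UnicodeData.txt lines; only é has a decomposition.
$data = "00E9;LATIN SMALL LETTER E WITH ACUTE;Ll;0;L;0065 0301;;;;N;LATIN SMALL LETTER E ACUTE;;00C9;;00C9\n"
    . "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;\n";
$fallbacks = asciiFallbacksSketch($data);
```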

◆ parse_charset()

string TYPO3\CMS\Core\Charset\CharsetConverter::parse_charset ($charset)

Normalizes a charset name by converting it to lowercase letters.

Parameters
string $charset - Input charset
Returns
string Normalized charset
Deprecated:
since TYPO3 v9.3, will be removed in TYPO3 v10.0

Definition at line 204 of file CharsetConverter.php.
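
Normalizing first matters because lookups such as $eucBasedSets use lowercase keys. A hypothetical sketch; the documented behaviour is only the lowercasing, while the trim() and the synonym lookup are assumptions modelled on the class's $synonyms property, and the two entries shown are illustrative only.

```php
<?php
// Hypothetical charset-name normalizer. The synonym entries are examples,
// not TYPO3's actual $synonyms table.
function parseCharsetSketch(string $charset): string
{
    $synonyms = [
        'latin1' => 'iso-8859-1', // example entry (assumption)
        'utf8' => 'utf-8',        // example entry (assumption)
    ];
    $charset = trim(strtolower($charset));
    return $synonyms[$charset] ?? $charset;
}

// Usage: parseCharsetSketch('UTF-8') returns 'utf-8'.
```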

◆ sb_char_mapping()

string TYPO3\CMS\Core\Charset\CharsetConverter::sb_char_mapping ($str, $charset)

Maps all characters of a string in a single-byte charset.

Parameters
string $str - The string
string $charset - The charset
Returns
string The converted string

Definition at line 998 of file CharsetConverter.php.

References TYPO3\CMS\Core\Charset\CharsetConverter\initToASCII().

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\specCharsToASCII().
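
In a single-byte charset every byte is one character, so mapping can be a plain byte-by-byte lookup. A minimal sketch with a tiny, illustrative ISO-8859-1 table; sbCharMappingSketch() and its table are hypothetical, whereas TYPO3 builds its table via initToASCII().

```php
<?php
// Hypothetical byte-wise mapper. Keys are ISO-8859-1 bytes; values are
// illustrative ASCII transliterations.
function sbCharMappingSketch(string $str): string
{
    $table = [
        "\xE4" => 'ae', // ä
        "\xF6" => 'oe', // ö
        "\xFC" => 'ue', // ü
        "\xDF" => 'ss', // ß
    ];
    $out = '';
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        $out .= $table[$str[$i]] ?? $str[$i]; // unmapped bytes pass through
    }
    return $out;
}

// Usage: sbCharMappingSketch("Stra\xDFe") returns 'Strasse'.
```

In practice strtr($str, $table) does the same in one call; the explicit loop just makes the one-byte-per-character assumption visible.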

◆ specCharsToASCII()

string TYPO3\CMS\Core\Charset\CharsetConverter::specCharsToASCII ($charset, $string)

Converts special characters (like æøåÆØÅ, umlauts, etc.) to their ASCII equivalents, which are often two characters long (like æ => ae).

Parameters
string $charset - Character set of the string
string $string - Input string to convert
Returns
string The converted string

Definition at line 973 of file CharsetConverter.php.

References TYPO3\CMS\Core\Charset\CharsetConverter\euc_char_mapping(), TYPO3\CMS\Core\Charset\CharsetConverter\sb_char_mapping(), and TYPO3\CMS\Core\Charset\CharsetConverter\utf8_char_mapping().
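
specCharsToASCII() references three mapping routines, one per charset family. A hypothetical sketch of how the dispatch could look; the order of the checks is an assumption, and $eucBasedSets mirrors the property's documented initial value.

```php
<?php
// Hypothetical dispatcher: returns the name of the mapping routine that
// would handle $charset (expected in lowercase).
function pickMapperSketch(string $charset): string
{
    $eucBasedSets = ['gb2312' => 1, 'big5' => 1, 'euc-kr' => 1, 'shift_jis' => 1];
    if ($charset === 'utf-8') {
        return 'utf8_char_mapping';
    }
    if (isset($eucBasedSets[$charset])) {
        return 'euc_char_mapping';
    }
    return 'sb_char_mapping'; // everything else is treated as single-byte
}

// Usage: pickMapperSketch('big5') returns 'euc_char_mapping'.
```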

◆ UnumberToChar()

string TYPO3\CMS\Core\Charset\CharsetConverter::UnumberToChar ($unicodeInteger)

Converts a Unicode number to a UTF-8 multibyte character. The algorithm is based on a script found at http://czyborra.com/utf/. Unit-tested by Kasper.

The binary representation of the character's integer value is simply spread across the bytes, and the number of high bits set in the lead byte announces the number of bytes in the multibyte sequence:

bytes | bits | representation
    1 |    7 | 0vvvvvvv
    2 |   11 | 110vvvvv 10vvvvvv
    3 |   16 | 1110vvvv 10vvvvvv 10vvvvvv
    4 |   21 | 11110vvv 10vvvvvv 10vvvvvv 10vvvvvv
    5 |   26 | 111110vv 10vvvvvv 10vvvvvv 10vvvvvv 10vvvvvv
    6 |   31 | 1111110v 10vvvvvv 10vvvvvv 10vvvvvv 10vvvvvv 10vvvvvv

Parameters
int $unicodeInteger - Unicode integer
Returns
string UTF-8 multibyte character string
See also
utf8CharToUnumber()

Definition at line 566 of file CharsetConverter.php.

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\entities_to_utf8(), TYPO3\CMS\Core\Charset\CharsetConverter\initCharset(), and TYPO3\CMS\Core\Charset\CharsetConverter\initUnicodeData().
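
A standalone sketch of the bit-spreading scheme from the table, limited to the 1- to 4-byte forms, since current UTF-8 (RFC 3629) stops at U+10FFFF; the 5- and 6-byte rows belong to the older definition. unumberToCharSketch() is a hypothetical name, not TYPO3's code.

```php
<?php
// Hypothetical UTF-8 encoder for one code point, following the table above.
function unumberToCharSketch(int $cp): string
{
    if ($cp < 0 || $cp > 0x10FFFF) {
        throw new InvalidArgumentException('Code point out of range');
    }
    if ($cp < 0x80) {
        return chr($cp);                                // 0vvvvvvv
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6))                   // 110vvvvv
            . chr(0x80 | ($cp & 0x3F));                 // 10vvvvvv
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12))                  // 1110vvvv
            . chr(0x80 | (($cp >> 6) & 0x3F))
            . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18))                      // 11110vvv
        . chr(0x80 | (($cp >> 12) & 0x3F))
        . chr(0x80 | (($cp >> 6) & 0x3F))
        . chr(0x80 | ($cp & 0x3F));
}

// Usage: unumberToCharSketch(0x20AC) returns "\xE2\x82\xAC" (the euro sign).
```

On PHP 7.2 and later, mb_chr($cp, 'UTF-8') produces the same bytes.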

◆ utf8_char2byte_pos()

int TYPO3\CMS\Core\Charset\CharsetConverter::utf8_char2byte_pos ($str, $pos)

Translates a character position into an 'absolute' byte position. Unit-tested by Kasper.

Parameters
string $str - UTF-8 string
int $pos - Character position (negative values start from the end)
Returns
int Byte position
Deprecated:
since TYPO3 v9.3, will be removed in TYPO3 v10.0.

Definition at line 1032 of file CharsetConverter.php.
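
A possible replacement for this deprecated method, assuming mbstring: the byte offset of character $pos is the byte length of everything before it. utf8Char2BytePosSketch() is hypothetical, and its behaviour for positions outside the string may differ from TYPO3's.

```php
<?php
// Hypothetical replacement: negative $pos counts from the end, so -1 gives
// the byte offset of the last character.
function utf8Char2BytePosSketch(string $str, int $pos): int
{
    return strlen(mb_substr($str, 0, $pos, 'UTF-8'));
}

// Usage: in "aéb", "é" takes two bytes, so character 2 ('b') starts at byte 3.
```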

◆ utf8_char_mapping()

string TYPO3\CMS\Core\Charset\CharsetConverter::utf8_char_mapping ($str)

Maps all characters of a UTF-8 string.

Parameters
string $str - UTF-8 string
Returns
string The converted string

Definition at line 1078 of file CharsetConverter.php.

References TYPO3\CMS\Core\Charset\CharsetConverter\initUnicodeData().

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\specCharsToASCII().

◆ utf8_decode()

string TYPO3\CMS\Core\Charset\CharsetConverter::utf8_decode ($str, $charset, $useEntityForNoChar = false)

Converts $str from UTF-8 to $charset.

Parameters
string $str - String in UTF-8 to convert to the local charset
string $charset - Charset, lowercase. Must be found in the csconvtbl/ folder.
bool $useEntityForNoChar - If set, then characters that are not available in the destination character set will be encoded as numeric entities
Returns
string Output string, converted to the local charset

Definition at line 349 of file CharsetConverter.php.

References TYPO3\CMS\Core\Charset\CharsetConverter\initCharset(), and TYPO3\CMS\Core\Charset\CharsetConverter\utf8CharToUnumber().

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\conv(), and TYPO3\CMS\Core\Charset\CharsetConverter\initToASCII().
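
A hypothetical sketch of the $useEntityForNoChar behaviour, assuming mbstring and valid UTF-8 input. Each character is converted on its own. If it does not survive a round trip back to UTF-8, it has no equivalent in $charset and is replaced by a decimal numeric entity or by "?" (byte 63, cf. $noCharByteVal). The round-trip check is this sketch's own design choice; TYPO3 looks characters up in its parsed conversion tables instead.

```php
<?php
// Hypothetical utf8_decode() sketch; utf8DecodeSketch() is not TYPO3 API.
function utf8DecodeSketch(string $str, string $charset, bool $useEntityForNoChar = false): string
{
    $out = '';
    // Split into UTF-8 characters (requires valid UTF-8 input).
    foreach (preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY) as $char) {
        $converted = mb_convert_encoding($char, $charset, 'UTF-8');
        if (mb_convert_encoding($converted, 'UTF-8', $charset) === $char) {
            $out .= $converted; // representable in $charset
        } else {
            $out .= $useEntityForNoChar ? '&#' . mb_ord($char, 'UTF-8') . ';' : chr(63);
        }
    }
    return $out;
}

// Usage: "é€" to ISO-8859-1; the euro sign does not exist there, so it
// becomes "&#8364;" with the flag set, or "?" without it.
```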

◆ utf8_encode()

string TYPO3\CMS\Core\Charset\CharsetConverter::utf8_encode ($str, $charset)

Converts $str from $charset to UTF-8.

Parameters
string $str - String in the local charset to convert to UTF-8
string $charset - Charset, lowercase. Must be found in the csconvtbl/ folder.
Returns
string Output string, converted to UTF-8

Definition at line 288 of file CharsetConverter.php.

References TYPO3\CMS\Core\Charset\CharsetConverter\initCharset().

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\conv().

◆ utf8_to_entities()

string TYPO3\CMS\Core\Charset\CharsetConverter::utf8_to_entities ($str)

Converts all characters with a code point above 127 to numeric entities.

Parameters
string $str - Input string
Returns
string Output string
Deprecated:
since TYPO3 v9.3, will be removed in TYPO3 v10.0

Definition at line 417 of file CharsetConverter.php.

References TYPO3\CMS\Core\Charset\CharsetConverter\utf8CharToUnumber().
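
A minimal sketch of the same conversion with a /u regular expression and mb_ord() (mbstring, PHP 7.2+); utf8ToEntitiesSketch() is a hypothetical name. ASCII characters are kept, and every other character becomes a decimal numeric entity.

```php
<?php
// Hypothetical sketch of utf8_to_entities(): [^\x00-\x7F] with the /u
// modifier matches one whole non-ASCII character at a time.
function utf8ToEntitiesSketch(string $str): string
{
    return preg_replace_callback(
        '/[^\x00-\x7F]/u',
        function (array $m): string {
            return '&#' . mb_ord($m[0], 'UTF-8') . ';';
        },
        $str
    );
}

// Usage: utf8ToEntitiesSketch('café') returns 'caf&#233;'.
```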

◆ utf8_to_numberarray()

array TYPO3\CMS\Core\Charset\CharsetConverter::utf8_to_numberarray ($str)

Splits the input UTF-8 string into an array of characters. All HTML entities (like &amp; or &pound; or &#123; or &#x3f5d;) are detected as single characters. Despite the method name, the array contains the actual UTF-8 characters rather than integer numbers.

Parameters
string $str - Input string, UTF-8
Returns
array Output array with one entry per character

Definition at line 504 of file CharsetConverter.php.
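
A hypothetical sketch matching that description, using a single preg_match_all() call: an HTML entity (named, decimal or hexadecimal) counts as one character, and any other UTF-8 character is matched on its own. utf8ToCharArraySketch() is not TYPO3 API, and its entity pattern is an assumption.

```php
<?php
// Hypothetical splitter. Flags: i (case-insensitive entity names and hex
// digits), s (dot also matches newlines), u (UTF-8 mode).
function utf8ToCharArraySketch(string $str): array
{
    preg_match_all('/&(?:#[0-9]+|#x[0-9a-f]+|[a-z][a-z0-9]*);|./isu', $str, $m);
    return $m[0];
}

// Usage: "a&amp;é" splits into ['a', '&amp;', 'é'].
```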

◆ utf8CharToUnumber()

int TYPO3\CMS\Core\Charset\CharsetConverter::utf8CharToUnumber ($str, $hex = false)

Converts a UTF-8 multibyte character to a Unicode number. Unit-tested by Kasper.

Parameters
string $str - UTF-8 multibyte character string
bool $hex - If set, a hexadecimal number is returned.
Returns
int Unicode integer
See also
UnumberToChar()

Definition at line 612 of file CharsetConverter.php.

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\utf8_decode(), and TYPO3\CMS\Core\Charset\CharsetConverter\utf8_to_entities().
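
The decoding direction, sketched standalone for the 1- to 4-byte forms. The lead byte's high bits give the sequence length, and the payload bits of all bytes are concatenated. utf8CharToUnumberSketch() is hypothetical; the plain dechex() format for $hex is an assumption, since the exact hex format TYPO3 returns is not documented here.

```php
<?php
/**
 * Hypothetical decoder for the first UTF-8 character of $str.
 *
 * @return int|string Code point, or its hex digits when $hex is set
 */
function utf8CharToUnumberSketch(string $str, bool $hex = false)
{
    if ($str === '') {
        throw new InvalidArgumentException('Empty input');
    }
    $lead = ord($str[0]);
    if ($lead < 0x80) {
        $cp = $lead;          // 0vvvvvvv
        $len = 1;
    } elseif (($lead & 0xE0) === 0xC0) {
        $cp = $lead & 0x1F;   // 110vvvvv
        $len = 2;
    } elseif (($lead & 0xF0) === 0xE0) {
        $cp = $lead & 0x0F;   // 1110vvvv
        $len = 3;
    } elseif (($lead & 0xF8) === 0xF0) {
        $cp = $lead & 0x07;   // 11110vvv
        $len = 4;
    } else {
        throw new InvalidArgumentException('Not a UTF-8 lead byte');
    }
    if (strlen($str) < $len) {
        throw new InvalidArgumentException('Truncated UTF-8 sequence');
    }
    for ($i = 1; $i < $len; $i++) {
        $cp = ($cp << 6) | (ord($str[$i]) & 0x3F); // append 6 payload bits
    }
    return $hex ? dechex($cp) : $cp;
}

// Usage: utf8CharToUnumberSketch("\xE2\x82\xAC") returns 8364 (0x20AC).
```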

Member Data Documentation

◆ $deprecatedPublicProperties

array TYPO3\CMS\Core\Charset\CharsetConverter::$deprecatedPublicProperties
protected
Initial value:
= array(
'noCharByteVal' => 'Using $noCharByteVal of class CharsetConverter from the outside is discouraged, as this only reflects a fixed constant.',
'parsedCharsets' => 'Using $parsedCharsets of class CharsetConverter from the outside is discouraged, as this only reflects a local runtime cache.',
'toASCII' => 'Using $toASCII of class CharsetConverter from the outside is discouraged, as this only reflects a local runtime cache.',
'twoByteSets' => 'Using $twoByteSets of class CharsetConverter from the outside is discouraged.',
'eucBasedSets' => 'Using $eucBasedSets of class CharsetConverter from the outside is discouraged.',
'synonyms' => 'Using $synonyms of class CharsetConverter from the outside is discouraged, as this functionality will be removed in TYPO3 v10.0.',
)

List of all deprecated public properties

Definition at line 60 of file CharsetConverter.php.

◆ $eucBasedSets

array TYPO3\CMS\Core\Charset\CharsetConverter::$eucBasedSets
protected
Initial value:
= array(
'gb2312' => 1,
'big5' => 1,
'euc-kr' => 1,
'shift_jis' => 1
)

This tells the converter which charsets use a scheme like the Extended Unix Code (EUC).

Definition at line 99 of file CharsetConverter.php.

◆ $noCharByteVal

int TYPO3\CMS\Core\Charset\CharsetConverter::$noCharByteVal = 63
protected

ASCII value used for characters that have no equivalent (63 is '?').

Definition at line 73 of file CharsetConverter.php.

◆ $parsedCharsets

array TYPO3\CMS\Core\Charset\CharsetConverter::$parsedCharsets = array( )
protected

Array in which parsed conversion tables are stored (cached).

Definition at line 79 of file CharsetConverter.php.

◆ $synonyms

array TYPO3\CMS\Core\Charset\CharsetConverter::$synonyms
protected

Charset name synonyms (see http://czyborra.com/charsets/iso8859.html). Deprecated.

Definition at line 112 of file CharsetConverter.php.

◆ $toASCII

array TYPO3\CMS\Core\Charset\CharsetConverter::$toASCII = array( )
protected

Array in which charset-to-ASCII mappings are stored (cached).

Definition at line 85 of file CharsetConverter.php.

◆ $twoByteSets

array TYPO3\CMS\Core\Charset\CharsetConverter::$twoByteSets
protected
Initial value:
= array(
'ucs-2' => 1
)

This tells the converter which charsets have two bytes per character.

Definition at line 91 of file CharsetConverter.php.