TYPO3 CMS  TYPO3_7-6
TYPO3\CMS\Core\Charset\CharsetConverter Class Reference

Public Member Functions

 parse_charset ($charset)
 
 get_locale_charset ($locale)
 
 conv ($inputString, $fromCharset, $toCharset, $useEntityForNoChar=false)
 
 convArray (&$array, $fromCharset, $toCharset, $useEntityForNoChar=false)
 
 utf8_encode ($str, $charset)
 
 utf8_decode ($str, $charset, $useEntityForNoChar=false)
 
 utf8_to_entities ($str)
 
 entities_to_utf8 ($str, $alsoStdHtmlEnt=false)
 
 utf8_to_numberarray ($str, $convEntities=false, $retChar=false)
 
 UnumberToChar ($unicodeInteger)
 
 utf8CharToUnumber ($str, $hex=false)
 
 initCharset ($charset)
 
 initUnicodeData ($mode=null)
 
 initCaseFolding ($charset)
 
 initToASCII ($charset)
 
 substr ($charset, $string, $start, $len=null)
 
 strlen ($charset, $string)
 
 crop ($charset, $string, $len, $crop='')
 
 strtrunc ($charset, $string, $len)
 
 conv_case ($charset, $string, $case)
 
 convCaseFirst ($charset, $string, $case)
 
 convCapitalize ($charset, $string)
 
 specCharsToASCII ($charset, $string)
 
 getPreferredClientLanguage ($languageCodesList)
 
 sb_char_mapping ($str, $charset, $mode, $opt='')
 
 utf8_substr ($str, $start, $len=null)
 
 utf8_strlen ($str)
 
 utf8_strtrunc ($str, $len)
 
 utf8_strpos ($haystack, $needle, $offset=0)
 
 utf8_strrpos ($haystack, $needle)
 
 utf8_char2byte_pos ($str, $pos)
 
 utf8_byte2char_pos ($str, $pos)
 
 utf8_char_mapping ($str, $mode, $opt='')
 
 euc_strtrunc ($str, $len, $charset)
 
 euc_substr ($str, $start, $charset, $len=null)
 
 euc_strlen ($str, $charset)
 
 euc_char2byte_pos ($str, $pos, $charset)
 
 euc_char_mapping ($str, $charset, $mode, $opt='')
 

Public Attributes

 $noCharByteVal = 63
 
 $parsedCharsets = []
 
 $caseFolding = []
 
 $toASCII = []
 
 $twoByteSets
 
 $fourByteSets
 
 $eucBasedSets
 
 $synonyms
 
 $lang_to_script
 
 $script_to_charset_unix
 
 $script_to_charset_windows
 
 $locale_to_charset
 
 $charSetArray
 

Protected Member Functions

 cropMbstring ($charset, $string, $len, $crop='')
 

Detailed Description

Notes on UTF-8

Functions working on UTF-8 strings:

  • strchr/strstr
  • strrchr
  • substr_count
  • implode/explode/join

Functions nearly working on UTF-8 strings:

  • strlen: returns the length in BYTES, if you need the length in CHARACTERS use utf8_strlen
  • trim/ltrim/rtrim: the second parameter 'charlist' won't work for characters not contained in 7-bit ASCII
  • strpos/strrpos: they return the BYTE position, if you need the CHARACTER position use utf8_strpos/utf8_strrpos
  • htmlentities: charset support for UTF-8 only since PHP 4.3.0
  • preg_*: Unicode support is compiled into PHP by default nowadays but could be unavailable; the 'u' pattern modifier must be used

Functions NOT working on UTF-8 strings:

  • str*cmp
  • stristr
  • stripos
  • substr
  • strrev
  • split/spliti
  • ...

Class for conversion between charsets.

Definition at line 53 of file CharsetConverter.php.
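The byte-versus-character distinction in the notes above can be illustrated with plain PHP. This is a minimal sketch that assumes the mbstring extension is loaded; it uses PHP's own mb_* functions rather than this class, and the sample string is only an illustration.

```php
<?php
// "Grüße" written with byte escapes so the example does not depend on file encoding.
// ü = C3 BC and ß = C3 9F, so the string has 5 characters but 7 bytes.
$s = "Gr\xC3\xBC\xC3\x9Fe";

$byteLength = strlen($s);                    // length in BYTES
$charLength = mb_strlen($s, 'UTF-8');        // length in CHARACTERS

$bytePos = strpos($s, 'e');                  // BYTE position of "e"
$charPos = mb_strpos($s, 'e', 0, 'UTF-8');   // CHARACTER position of "e"

printf("%d bytes, %d characters\n", $byteLength, $charLength);
printf("'e' found at byte %d, character %d\n", $bytePos, $charPos);
```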

Member Function Documentation

◆ conv()

TYPO3\CMS\Core\Charset\CharsetConverter::conv (   $inputString,
  $fromCharset,
  $toCharset,
  $useEntityForNoChar = false 
)

Convert from one charset to another charset.

Parameters
string  $inputString  Input string
string  $fromCharset  From charset (the current charset of the string)
string  $toCharset  To charset (the output charset wanted)
bool  $useEntityForNoChar  If set, then characters that are not available in the destination character set will be encoded as numeric entities
Returns
string Converted string
See also
convArray()

Definition at line 591 of file CharsetConverter.php.

References $GLOBALS, TYPO3\CMS\Core\Charset\CharsetConverter\utf8_decode(), and TYPO3\CMS\Core\Charset\CharsetConverter\utf8_encode().

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\convArray().
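The references to utf8_encode() and utf8_decode() suggest that conversion uses UTF-8 as a pivot. The following sketch shows that idea only; convertViaUtf8() is a hypothetical helper, and mb_convert_encoding() (mbstring extension assumed) stands in for the class's table-driven conversion. Entity fallback for unavailable characters is not covered.

```php
<?php
// Hypothetical helper: convert between two charsets with UTF-8 as the intermediate form.
function convertViaUtf8($input, $fromCharset, $toCharset)
{
    if ($fromCharset === $toCharset) {
        return $input;
    }
    // Step 1: bring the input to UTF-8 unless it already is UTF-8.
    $utf8 = ($fromCharset === 'utf-8')
        ? $input
        : mb_convert_encoding($input, 'UTF-8', $fromCharset);
    // Step 2: go from UTF-8 to the target charset unless the target is UTF-8.
    return ($toCharset === 'utf-8')
        ? $utf8
        : mb_convert_encoding($utf8, $toCharset, 'UTF-8');
}

$latin1 = "Gr\xFC\xDFe";                                  // "Grüße" in ISO-8859-1
$utf8 = convertViaUtf8($latin1, 'iso-8859-1', 'utf-8');
$back = convertViaUtf8($utf8, 'utf-8', 'iso-8859-1');
```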

◆ conv_case()

TYPO3\CMS\Core\Charset\CharsetConverter::conv_case (   $charset,
  $string,
  $case 
)

Translates all characters of a string into their respective case values. Unlike strtolower() and strtoupper(), this method is locale-independent. Note that the string length may change, e.g. the lowercase German "ß" (sharp S) becomes the uppercase "SS". Real case folding is language-dependent; this method ignores that fact. Unit-tested by Kasper.

Parameters
string  $charset  Character set of string
string  $string  Input string to convert case for
string  $case  Case keyword: "toLower" means lowercase conversion, anything else is uppercase (use "toUpper")
Returns
string The converted string
See also
strtolower(), strtoupper()

Definition at line 1628 of file CharsetConverter.php.

References $GLOBALS, TYPO3\CMS\Core\Charset\CharsetConverter\euc_char_mapping(), TYPO3\CMS\Core\Charset\CharsetConverter\sb_char_mapping(), and TYPO3\CMS\Core\Charset\CharsetConverter\utf8_char_mapping().

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\convCaseFirst().
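Why the string length may change can be seen in a small table-driven sketch. The folding table below is a hypothetical, tiny stand-in for the class's real case-folding data, and foldCase() is not part of the API.

```php
<?php
// Hypothetical folding table: ASCII letters plus two UTF-8 entries.
$upperTable = array_combine(range('a', 'z'), range('A', 'Z'));
$upperTable["\xC3\x9F"] = 'SS';          // ß has no single uppercase letter here
$upperTable["\xC3\xA4"] = "\xC3\x84";    // ä => Ä

// strtr() with an array replaces each key with its value in one pass,
// so the result does not depend on the current locale.
function foldCase($string, array $table)
{
    return strtr($string, $table);
}

$input = "stra\xC3\x9Fe";                // "straße": 6 characters
$upper = foldCase($input, $upperTable);  // 7 characters after folding
```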

◆ convArray()

TYPO3\CMS\Core\Charset\CharsetConverter::convArray (   &$array,
  $fromCharset,
  $toCharset,
  $useEntityForNoChar = false 
)

Convert all elements in ARRAY with type string from one charset to another charset. NOTICE: Array is passed by reference!

Parameters
array  $array  Input array, possibly multidimensional
string  $fromCharset  From charset (the current charset of the string)
string  $toCharset  To charset (the output charset wanted)
bool  $useEntityForNoChar  If set, then characters that are not available in the destination character set will be encoded as numeric entities
Returns
void
See also
conv()

Definition at line 640 of file CharsetConverter.php.

References TYPO3\CMS\Core\Charset\CharsetConverter\conv().
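Because the array is passed by reference, the caller's array is modified in place and nothing is returned. A minimal sketch of that pattern, with a hypothetical convertArrayInPlace() and mb_convert_encoding() (mbstring extension assumed) in place of conv():

```php
<?php
// Hypothetical helper: walk a possibly nested array and convert string values only.
function convertArrayInPlace(array &$array, $fromCharset, $toCharset)
{
    foreach ($array as $key => $value) {
        if (is_array($value)) {
            // Recurse into the nested array itself, not into a copy of it.
            convertArrayInPlace($array[$key], $fromCharset, $toCharset);
        } elseif (is_string($value)) {
            $array[$key] = mb_convert_encoding($value, $toCharset, $fromCharset);
        }
        // Integers, booleans, null etc. are left untouched.
    }
}

$data = [
    'title' => "caf\xE9",                               // ISO-8859-1 bytes
    'meta'  => ['tags' => ["na\xEFve"], 'count' => 3],
];
convertArrayInPlace($data, 'ISO-8859-1', 'UTF-8');
```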

◆ convCapitalize()

TYPO3\CMS\Core\Charset\CharsetConverter::convCapitalize (   $charset,
  $string 
)

Capitalize the given string

Parameters
string  $charset
string  $string
Returns
string

Definition at line 1671 of file CharsetConverter.php.

References $GLOBALS.

◆ convCaseFirst()

TYPO3\CMS\Core\Charset\CharsetConverter::convCaseFirst (   $charset,
  $string,
  $case 
)

Equivalent of lcfirst/ucfirst but using character set.

Parameters
string  $charset
string  $string
string  $case
Returns
string
See also
conv_case()

Definition at line 1656 of file CharsetConverter.php.

References TYPO3\CMS\Core\Charset\CharsetConverter\conv_case(), and TYPO3\CMS\Core\Charset\CharsetConverter\substr().

◆ crop()

TYPO3\CMS\Core\Charset\CharsetConverter::crop (   $charset,
  $string,
  $len,
  $crop = '' 
)

Truncates a string and prepends or appends the crop signifier. Unit-tested by Kasper.

Parameters
string  $charset  The character set
string  $string  Character string
int  $len  Length (in characters)
string  $crop  Crop signifier
Returns
string The shortened string
See also
substr(), mb_strimwidth()

Definition at line 1542 of file CharsetConverter.php.

References $GLOBALS, TYPO3\CMS\Core\Charset\CharsetConverter\cropMbstring(), TYPO3\CMS\Core\Charset\CharsetConverter\euc_char2byte_pos(), TYPO3\CMS\Core\Charset\CharsetConverter\strlen(), TYPO3\CMS\Core\Charset\CharsetConverter\substr(), and TYPO3\CMS\Core\Charset\CharsetConverter\utf8_char2byte_pos().

◆ cropMbstring()

TYPO3\CMS\Core\Charset\CharsetConverter::cropMbstring (   $charset,
  $string,
  $len,
  $crop = '' 
)
protected

Method to crop strings using the mb_substr function.

Parameters
string  $charset  The character set
string  $string  String to be cropped
int  $len  Crop length (in characters)
string  $crop  Crop signifier
Returns
string The shortened string
See also
mb_strlen(), mb_substr()

Definition at line 1518 of file CharsetConverter.php.

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\crop().
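A character-based crop built on mb_strlen() and mb_substr() might look like the sketch below. cropSketch() is a hypothetical name, and the handling of a negative length (keep the end of the string and prepend the signifier) is an assumption inferred from the "pre-/appends" wording of crop().

```php
<?php
// Hypothetical sketch of cropping by characters rather than bytes.
function cropSketch($charset, $string, $len, $crop = '')
{
    $length = mb_strlen($string, $charset);
    if ((int)$len === 0 || $length <= abs($len)) {
        return $string;                       // nothing to cut off
    }
    if ($len > 0) {
        // Keep the first $len characters and append the signifier.
        return mb_substr($string, 0, $len, $charset) . $crop;
    }
    // Assumed behaviour: keep the last |$len| characters and prepend the signifier.
    return $crop . mb_substr($string, $len, $length, $charset);
}

$text = "Gr\xC3\xBC\xC3\x9Fe aus Wien";       // "Grüße aus Wien", 14 characters
$head = cropSketch('UTF-8', $text, 5, '...');
$tail = cropSketch('UTF-8', $text, -4, '...');
```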

◆ entities_to_utf8()

TYPO3\CMS\Core\Charset\CharsetConverter::entities_to_utf8 (   $str,
  $alsoStdHtmlEnt = false 
)

Converts numeric entities (UNICODE, e.g. decimal (&#1234;) or hexadecimal (&#x1b;)) to UTF-8 multibyte chars.

Parameters
string  $str  Input string, UTF-8
bool  $alsoStdHtmlEnt  If set, then all string HTML entities (like &amp; or &pound;) will be converted as well
Returns
string Output string

Definition at line 829 of file CharsetConverter.php.

References TYPO3\CMS\Core\Charset\CharsetConverter\substr(), and TYPO3\CMS\Core\Charset\CharsetConverter\UnumberToChar().

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\utf8_to_numberarray().
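Decoding decimal and hexadecimal numeric entities can be sketched with a regular expression callback. numericEntitiesToUtf8() is a hypothetical helper, not the class's implementation; it assumes the mbstring extension and leaves named entities such as &amp; untouched.

```php
<?php
// Hypothetical helper: replace &#NNNN; and &#xHHHH; with the UTF-8 character.
function numericEntitiesToUtf8($str)
{
    return preg_replace_callback(
        '/&#([xX][0-9a-fA-F]+|[0-9]+);/',
        function (array $match) {
            $number = $match[1];
            // A leading x/X marks a hexadecimal entity, otherwise it is decimal.
            $codePoint = (strtolower($number[0]) === 'x')
                ? hexdec(substr($number, 1))
                : (int)$number;
            // Pack the code point as big-endian UTF-32 and let mbstring emit UTF-8.
            return mb_convert_encoding(pack('N', $codePoint), 'UTF-8', 'UTF-32BE');
        },
        $str
    );
}

$decoded = numericEntitiesToUtf8('&#1234; &#x20AC; &amp;');
```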

◆ euc_char2byte_pos()

TYPO3\CMS\Core\Charset\CharsetConverter::euc_char2byte_pos (   $str,
  $pos,
  $charset 
)

Translates a character position into an 'absolute' byte position.

Parameters
string  $str  EUC multibyte character string
int  $pos  Character position (negative values start from the end)
string  $charset  The charset
Returns
int Byte position

Definition at line 2198 of file CharsetConverter.php.

References TYPO3\CMS\Core\Charset\CharsetConverter\strlen().

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\crop(), and TYPO3\CMS\Core\Charset\CharsetConverter\euc_substr().

◆ euc_char_mapping()

TYPO3\CMS\Core\Charset\CharsetConverter::euc_char_mapping (   $str,
  $charset,
  $mode,
  $opt = '' 
)

Maps all characters of a string in the EUC charset family.

Parameters
string  $str  EUC multibyte character string
string  $charset  The charset
string  $mode  Mode: 'case' (case folding) or 'ascii' (ASCII transliteration)
string  $opt  'case': conversion 'toLower' or 'toUpper'
Returns
string The converted string

Definition at line 2245 of file CharsetConverter.php.

References TYPO3\CMS\Core\Charset\CharsetConverter\initCaseFolding(), TYPO3\CMS\Core\Charset\CharsetConverter\initToASCII(), and TYPO3\CMS\Core\Charset\CharsetConverter\substr().

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\conv_case(), and TYPO3\CMS\Core\Charset\CharsetConverter\specCharsToASCII().

◆ euc_strlen()

TYPO3\CMS\Core\Charset\CharsetConverter::euc_strlen (   $str,
  $charset 
)

Counts the number of characters of a string in the EUC charset family.

Parameters
string  $str  EUC multibyte character string
string  $charset  The charset
Returns
int The number of characters
See also
strlen()

Definition at line 2170 of file CharsetConverter.php.

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\strlen().

◆ euc_strtrunc()

TYPO3\CMS\Core\Charset\CharsetConverter::euc_strtrunc (   $str,
  $len,
  $charset 
)

Cuts a string in the EUC charset family short at a given byte length.

Parameters
string  $str  EUC multibyte character string
int  $len  The byte length
string  $charset  The charset
Returns
string The shortened string
See also
mb_strcut()

Definition at line 2105 of file CharsetConverter.php.

References TYPO3\CMS\Core\Charset\CharsetConverter\substr().

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\strtrunc().

◆ euc_substr()

TYPO3\CMS\Core\Charset\CharsetConverter::euc_substr (   $str,
  $start,
  $charset,
  $len = null 
)

Returns a part of a string in the EUC charset family.

Parameters
string  $str  EUC multibyte character string
int  $start  Start position (character position)
string  $charset  The charset
int  $len  Length (in characters)
Returns
string the substring

Definition at line 2141 of file CharsetConverter.php.

References TYPO3\CMS\Core\Charset\CharsetConverter\euc_char2byte_pos(), and TYPO3\CMS\Core\Charset\CharsetConverter\substr().

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\substr().

◆ get_locale_charset()

TYPO3\CMS\Core\Charset\CharsetConverter::get_locale_charset (   $locale)

Get the charset of a locale.

ln             language
ln_CN          language / country
ln_CN.cs       language / country / charset
ln_CN.cs@mod   language / country / charset / modifier

Parameters
string  $locale  Locale string
Returns
string Charset resolved for locale string

Definition at line 545 of file CharsetConverter.php.

References TYPO3\CMS\Core\Charset\CharsetConverter\parse_charset().
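The locale forms listed above (language, country, charset, modifier) can be split with a single regular expression. This is a sketch under the assumption of POSIX-style locale names, where the modifier follows an "@"; splitLocale() is a hypothetical helper and does not resolve a charset the way get_locale_charset() does.

```php
<?php
// Hypothetical helper: split "ln_CN.cs@mod" into its parts; missing parts become null.
function splitLocale($locale)
{
    $pattern = '/^([A-Za-z]+)(?:_([A-Za-z]+))?(?:\.([^@]+))?(?:@(.+))?$/';
    if (!preg_match($pattern, $locale, $m)) {
        return null;
    }
    // Optional groups that did not match are either absent or empty strings.
    $part = function ($index) use ($m) {
        return (isset($m[$index]) && $m[$index] !== '') ? $m[$index] : null;
    };
    return [
        'language' => $m[1],
        'country'  => $part(2),
        'charset'  => $part(3),
        'modifier' => $part(4),
    ];
}

$full = splitLocale('de_AT.ISO-8859-15@euro');
$bare = splitLocale('ja');
```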

◆ getPreferredClientLanguage()

TYPO3\CMS\Core\Charset\CharsetConverter::getPreferredClientLanguage (   $languageCodesList)

Converts the language codes that we get from the client (usually HTTP_ACCEPT_LANGUAGE) into a TYPO3-readable language code

Parameters
string  $languageCodesList  List of language codes, something like 'de,en-us;q=0.9,de-de;q=0.7,es-cl;q=0.6,en;q=0.4,es;q=0.3,zh;q=0.1'
Returns
string A preferred language that TYPO3 supports, or "default" if none found

Definition at line 1707 of file CharsetConverter.php.

References $locales, TYPO3\CMS\Core\Utility\GeneralUtility\makeInstance(), and TYPO3\CMS\Core\Utility\GeneralUtility\trimExplode().
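Picking a language from such a list means ordering the entries by their q-value and returning the first one that is supported. The sketch below illustrates that general approach; preferredLanguage() and the plain $supported array are hypothetical, whereas the real method maps codes to TYPO3 language keys. Falling back from "en-us" to "en" is an assumption of this sketch.

```php
<?php
// Hypothetical helper: choose the best supported language from an Accept-Language value.
function preferredLanguage($header, array $supported)
{
    $candidates = [];
    foreach (explode(',', $header) as $index => $item) {
        $parts = explode(';', trim($item));
        $code = strtolower(trim($parts[0]));
        if ($code === '') {
            continue;
        }
        $quality = 1.0; // entries without q= have the highest weight
        if (isset($parts[1]) && preg_match('/^\s*q=([0-9.]+)\s*$/', $parts[1], $m)) {
            $quality = (float)$m[1];
        }
        $candidates[] = [$code, $quality, $index];
    }
    // Highest quality first; the original order breaks ties.
    usort($candidates, function ($a, $b) {
        if ($a[1] !== $b[1]) {
            return ($a[1] > $b[1]) ? -1 : 1;
        }
        return $a[2] - $b[2];
    });
    foreach ($candidates as $candidate) {
        $code = $candidate[0];
        $primary = explode('-', $code)[0]; // "en-us" falls back to "en"
        if (in_array($code, $supported, true)) {
            return $code;
        }
        if (in_array($primary, $supported, true)) {
            return $primary;
        }
    }
    return 'default';
}

$header = 'de,en-us;q=0.9,de-de;q=0.7,es-cl;q=0.6,en;q=0.4,es;q=0.3,zh;q=0.1';
$choice = preferredLanguage($header, ['en', 'es']);
```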

◆ initCaseFolding()

TYPO3\CMS\Core\Charset\CharsetConverter::initCaseFolding (   $charset)

This function initializes the folding table for a charset other than UTF-8. This function is automatically called by the case folding functions.

Parameters
string  $charset  Charset for which to initialize case folding.
Returns
int Returns FALSE on error, a TRUE value on success: 1 = table already loaded, 2 = cached version, 3 = table parsed (and cached).
Access: private

Definition at line 1325 of file CharsetConverter.php.

References TYPO3\CMS\Core\Utility\GeneralUtility\getFileAbsFileName(), TYPO3\CMS\Core\Utility\GeneralUtility\getUrl(), TYPO3\CMS\Core\Charset\CharsetConverter\initCharset(), TYPO3\CMS\Core\Charset\CharsetConverter\initUnicodeData(), TYPO3\CMS\Core\Charset\CharsetConverter\utf8_decode(), and TYPO3\CMS\Core\Utility\GeneralUtility\writeFileToTypo3tempDir().

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\euc_char_mapping(), and TYPO3\CMS\Core\Charset\CharsetConverter\sb_char_mapping().

◆ initCharset()

TYPO3\CMS\Core\Charset\CharsetConverter::initCharset (   $charset)

This will initialize a charset for use if it is defined in the 'typo3/sysext/core/Resources/Private/Charsets/csconvtbl/' folder. This function is automatically called by the conversion functions.

PLEASE SEE: http://www.unicode.org/Public/MAPPINGS/

Parameters
string  $charset  The charset to be initialized. Always use a lowercase charset name; it must exactly match a filename in the csconvtbl/ folder ([charset].tbl).
Returns
int Returns 1 if already loaded, FALSE if the charset conversion table was not found, and 2 if the charset conversion table was found and parsed.
Access: private

Definition at line 1021 of file CharsetConverter.php.

References TYPO3\CMS\Core\Utility\ExtensionManagementUtility\extPath(), TYPO3\CMS\Core\Utility\GeneralUtility\getFileAbsFileName(), TYPO3\CMS\Core\Utility\GeneralUtility\getUrl(), TYPO3\CMS\Core\Charset\CharsetConverter\substr(), TYPO3\CMS\Core\Utility\GeneralUtility\trimExplode(), TYPO3\CMS\Core\Charset\CharsetConverter\UnumberToChar(), TYPO3\CMS\Core\Utility\GeneralUtility\validPathStr(), and TYPO3\CMS\Core\Utility\GeneralUtility\writeFileToTypo3tempDir().

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\initCaseFolding(), TYPO3\CMS\Core\Charset\CharsetConverter\initToASCII(), TYPO3\CMS\Core\Charset\CharsetConverter\utf8_decode(), and TYPO3\CMS\Core\Charset\CharsetConverter\utf8_encode().

◆ initToASCII()

TYPO3\CMS\Core\Charset\CharsetConverter::initToASCII (   $charset)

This function initializes the to-ASCII conversion table for a charset other than UTF-8. This function is automatically called by the ASCII transliteration functions.

Parameters
string  $charset  Charset for which to initialize conversion.
Returns
int Returns FALSE on error, a TRUE value on success: 1 = table already loaded, 2 = cached version, 3 = table parsed (and cached).
Access: private

Definition at line 1387 of file CharsetConverter.php.

References TYPO3\CMS\Core\Utility\GeneralUtility\getFileAbsFileName(), TYPO3\CMS\Core\Utility\GeneralUtility\getUrl(), TYPO3\CMS\Core\Charset\CharsetConverter\initCharset(), TYPO3\CMS\Core\Charset\CharsetConverter\initUnicodeData(), TYPO3\CMS\Core\Charset\CharsetConverter\utf8_decode(), and TYPO3\CMS\Core\Utility\GeneralUtility\writeFileToTypo3tempDir().

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\euc_char_mapping(), and TYPO3\CMS\Core\Charset\CharsetConverter\sb_char_mapping().

◆ initUnicodeData()

TYPO3\CMS\Core\Charset\CharsetConverter::initUnicodeData (   $mode = null)

◆ parse_charset()

TYPO3\CMS\Core\Charset\CharsetConverter::parse_charset (   $charset)

Normalize - changes input character set to lowercase letters.

Parameters
string  $charset  Input charset
Returns
string Normalized charset

Definition at line 525 of file CharsetConverter.php.

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\get_locale_charset().

◆ sb_char_mapping()

TYPO3\CMS\Core\Charset\CharsetConverter::sb_char_mapping (   $str,
  $charset,
  $mode,
  $opt = '' 
)

Maps all characters of a string in a single byte charset.

Parameters
string  $str  The string
string  $charset  The charset
string  $mode  Mode: 'case' (case folding) or 'ascii' (ASCII transliteration)
string  $opt  'case': conversion 'toLower' or 'toUpper'
Returns
string The converted string

Definition at line 1781 of file CharsetConverter.php.

References TYPO3\CMS\Core\Charset\CharsetConverter\initCaseFolding(), and TYPO3\CMS\Core\Charset\CharsetConverter\initToASCII().

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\conv_case(), and TYPO3\CMS\Core\Charset\CharsetConverter\specCharsToASCII().

◆ specCharsToASCII()

TYPO3\CMS\Core\Charset\CharsetConverter::specCharsToASCII (   $charset,
  $string 
)

Converts special chars (like æøåÆØÅ, umlauts etc) to ascii equivalents (usually double-bytes, like æ => ae etc.)

Parameters
string  $charset  Character set of string
string  $string  Input string to convert
Returns
string The converted string

Definition at line 1687 of file CharsetConverter.php.

References TYPO3\CMS\Core\Charset\CharsetConverter\euc_char_mapping(), TYPO3\CMS\Core\Charset\CharsetConverter\sb_char_mapping(), and TYPO3\CMS\Core\Charset\CharsetConverter\utf8_char_mapping().
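Transliteration of this kind is essentially a table lookup. The following sketch uses a hypothetical five-entry table and a hypothetical transliterate() helper; the class loads its own, much larger tables, and the replacements chosen here are only examples.

```php
<?php
// Hypothetical to-ASCII table for a few UTF-8 characters.
$toAscii = [
    "\xC3\xA6" => 'ae', // æ
    "\xC3\xB8" => 'oe', // ø
    "\xC3\xA5" => 'aa', // å
    "\xC3\xBC" => 'ue', // ü
    "\xC3\x9F" => 'ss', // ß
];

// strtr() with an array swaps every key for its value in a single pass.
function transliterate($string, array $table)
{
    return strtr($string, $table);
}

$ascii = transliterate("sm\xC3\xB8rrebr\xC3\xB8d", $toAscii); // "smørrebrød"
```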

◆ strlen()

TYPO3\CMS\Core\Charset\CharsetConverter::strlen (   $charset,
  $string 
)

◆ strtrunc()

TYPO3\CMS\Core\Charset\CharsetConverter::strtrunc (   $charset,
  $string,
  $len 
)

Cuts a string short at a given byte length.

Parameters
string  $charset  The character set
string  $string  Character string
int  $len  The byte length
Returns
string The shortened string
See also
mb_strcut()

Definition at line 1590 of file CharsetConverter.php.

References $GLOBALS, TYPO3\CMS\Core\Charset\CharsetConverter\euc_strtrunc(), TYPO3\CMS\Core\Charset\CharsetConverter\substr(), and TYPO3\CMS\Core\Charset\CharsetConverter\utf8_strtrunc().

◆ substr()

TYPO3\CMS\Core\Charset\CharsetConverter::substr (   $charset,
  $string,
  $start,
  $len = null 
)

◆ UnumberToChar()

TYPO3\CMS\Core\Charset\CharsetConverter::UnumberToChar (   $unicodeInteger)

Converts a UNICODE number to a UTF-8 multibyte character. The algorithm is based on a script found at http://czyborra.com/utf/. Unit-tested by Kasper.

The binary representation of the character's integer value is thus simply spread across the bytes and the number of high bits set in the lead byte announces the number of bytes in the multibyte sequence:

bytes | bits | representation
    1 |    7 | 0vvvvvvv
    2 |   11 | 110vvvvv 10vvvvvv
    3 |   16 | 1110vvvv 10vvvvvv 10vvvvvv
    4 |   21 | 11110vvv 10vvvvvv 10vvvvvv 10vvvvvv
    5 |   26 | 111110vv 10vvvvvv 10vvvvvv 10vvvvvv 10vvvvvv
    6 |   31 | 1111110v 10vvvvvv 10vvvvvv 10vvvvvv 10vvvvvv 10vvvvvv

Parameters
int  $unicodeInteger  UNICODE integer
Returns
string UTF-8 multibyte character string
See also
utf8CharToUnumber()

Definition at line 934 of file CharsetConverter.php.

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\entities_to_utf8(), TYPO3\CMS\Core\Charset\CharsetConverter\initCharset(), and TYPO3\CMS\Core\Charset\CharsetConverter\initUnicodeData().
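The bit layout described above translates directly into shifts and masks. The sketch below covers the 1- to 4-byte forms only (the 5- and 6-byte forms are omitted, since current UTF-8 stops at U+10FFFF); codePointToUtf8() is a hypothetical name and not the class's implementation.

```php
<?php
// Hypothetical helper: spread the bits of a code point across 1 to 4 UTF-8 bytes.
function codePointToUtf8($cp)
{
    if ($cp < 0x80) {            // 7 bits:  0vvvvvvv
        return chr($cp);
    }
    if ($cp < 0x800) {           // 11 bits: 110vvvvv 10vvvvvv
        return chr(0xC0 | ($cp >> 6))
            . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {         // 16 bits: 1110vvvv 10vvvvvv 10vvvvvv
        return chr(0xE0 | ($cp >> 12))
            . chr(0x80 | (($cp >> 6) & 0x3F))
            . chr(0x80 | ($cp & 0x3F));
    }
    // 21 bits: 11110vvv 10vvvvvv 10vvvvvv 10vvvvvv
    return chr(0xF0 | ($cp >> 18))
        . chr(0x80 | (($cp >> 12) & 0x3F))
        . chr(0x80 | (($cp >> 6) & 0x3F))
        . chr(0x80 | ($cp & 0x3F));
}

echo bin2hex(codePointToUtf8(0x20AC)), "\n"; // bytes of the euro sign
```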

◆ utf8_byte2char_pos()

TYPO3\CMS\Core\Charset\CharsetConverter::utf8_byte2char_pos (   $str,
  $pos 
)

Translates an 'absolute' byte position into a character position. Unit tested by Kasper.

Parameters
string  $str  UTF-8 string
int  $pos  Byte position
Returns
int Character position

Definition at line 2016 of file CharsetConverter.php.

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\utf8_strpos(), and TYPO3\CMS\Core\Charset\CharsetConverter\utf8_strrpos().

◆ utf8_char2byte_pos()

TYPO3\CMS\Core\Charset\CharsetConverter::utf8_char2byte_pos (   $str,
  $pos 
)

Translates a character position into an 'absolute' byte position. Unit tested by Kasper.

Parameters
string  $str  UTF-8 string
int  $pos  Character position (negative values start from the end)
Returns
int Byte position

Definition at line 1969 of file CharsetConverter.php.

References TYPO3\CMS\Core\Charset\CharsetConverter\strlen().

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\crop(), TYPO3\CMS\Core\Charset\CharsetConverter\utf8_strpos(), and TYPO3\CMS\Core\Charset\CharsetConverter\utf8_substr().
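One way to map a character position to a byte position is to walk the bytes and count only those that start a character, i.e. everything except continuation bytes of the form 10vvvvvv. The sketch handles non-negative positions only (the documented method also accepts negative ones), and charToBytePos() is a hypothetical name.

```php
<?php
// Hypothetical helper: byte offset of the character at index $pos in a UTF-8 string.
function charToBytePos($str, $pos)
{
    $chars = 0;
    $byteCount = strlen($str);
    for ($i = 0; $i < $byteCount; $i++) {
        // Any byte that is not a continuation byte (10vvvvvv) starts a character.
        if ((ord($str[$i]) & 0xC0) !== 0x80) {
            if ($chars === $pos) {
                return $i;
            }
            $chars++;
        }
    }
    // A position one past the last character maps to the end of the string.
    return ($chars === $pos) ? $byteCount : false;
}

$s = "Gr\xC3\xBC\xC3\x9Fe";        // "Grüße"
$byteOfSharpS = charToBytePos($s, 3);
```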

◆ utf8_char_mapping()

TYPO3\CMS\Core\Charset\CharsetConverter::utf8_char_mapping (   $str,
  $mode,
  $opt = '' 
)

Maps all characters of an UTF-8 string.

Parameters
string  $str  UTF-8 string
string  $mode  Mode: 'case' (case folding) or 'ascii' (ASCII transliteration)
string  $opt  'case': conversion 'toLower' or 'toUpper'
Returns
string The converted string

Definition at line 2045 of file CharsetConverter.php.

References TYPO3\CMS\Core\Charset\CharsetConverter\initUnicodeData(), and TYPO3\CMS\Core\Charset\CharsetConverter\substr().

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\conv_case(), and TYPO3\CMS\Core\Charset\CharsetConverter\specCharsToASCII().

◆ utf8_decode()

TYPO3\CMS\Core\Charset\CharsetConverter::utf8_decode (   $str,
  $charset,
  $useEntityForNoChar = false 
)

Converts $str from UTF-8 to $charset

Parameters
string  $str  String in UTF-8 to convert to local charset
string  $charset  Charset, lowercase. Must be found in csconvtbl/ folder.
bool  $useEntityForNoChar  If set, then characters that are not available in the destination character set will be encoded as numeric entities
Returns
string Output string, converted to local charset

Definition at line 718 of file CharsetConverter.php.

References $a, TYPO3\CMS\Core\Charset\CharsetConverter\initCharset(), TYPO3\CMS\Core\Charset\CharsetConverter\strlen(), TYPO3\CMS\Core\Charset\CharsetConverter\substr(), and TYPO3\CMS\Core\Charset\CharsetConverter\utf8CharToUnumber().

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\conv(), TYPO3\CMS\Core\Charset\CharsetConverter\initCaseFolding(), and TYPO3\CMS\Core\Charset\CharsetConverter\initToASCII().

◆ utf8_encode()

TYPO3\CMS\Core\Charset\CharsetConverter::utf8_encode (   $str,
  $charset 
)

Converts $str from $charset to UTF-8

Parameters
string  $str  String in local charset to convert to UTF-8
string  $charset  Charset, lowercase. Must be found in csconvtbl/ folder.
Returns
string Output string, converted to UTF-8

Definition at line 658 of file CharsetConverter.php.

References $a, TYPO3\CMS\Core\Charset\CharsetConverter\initCharset(), TYPO3\CMS\Core\Charset\CharsetConverter\strlen(), and TYPO3\CMS\Core\Charset\CharsetConverter\substr().

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\conv().

◆ utf8_strlen()

TYPO3\CMS\Core\Charset\CharsetConverter::utf8_strlen (   $str)

Counts the number of characters of a string in UTF-8. Unit-tested by Kasper and works 100% like strlen() / mb_strlen()

Parameters
string  $str  UTF-8 multibyte character string
Returns
int The number of characters
See also
strlen()

Definition at line 1863 of file CharsetConverter.php.

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\strlen().

◆ utf8_strpos()

TYPO3\CMS\Core\Charset\CharsetConverter::utf8_strpos (   $haystack,
  $needle,
  $offset = 0 
)

Find position of first occurrence of a string, both arguments are in UTF-8.

Parameters
string  $haystack  UTF-8 string to search in
string  $needle  UTF-8 string to search for
int  $offset  Position to start the search
Returns
int The character position
See also
strpos()

Definition at line 1918 of file CharsetConverter.php.

References $GLOBALS, TYPO3\CMS\Core\Charset\CharsetConverter\utf8_byte2char_pos(), and TYPO3\CMS\Core\Charset\CharsetConverter\utf8_char2byte_pos().

◆ utf8_strrpos()

TYPO3\CMS\Core\Charset\CharsetConverter::utf8_strrpos (   $haystack,
  $needle 
)

Find position of last occurrence of a char in a string, both arguments are in UTF-8.

Parameters
string  $haystack  UTF-8 string to search in
string  $needle  UTF-8 character to search for (single character)
Returns
int The character position
See also
strrpos()

Definition at line 1946 of file CharsetConverter.php.

References $GLOBALS, and TYPO3\CMS\Core\Charset\CharsetConverter\utf8_byte2char_pos().

◆ utf8_strtrunc()

TYPO3\CMS\Core\Charset\CharsetConverter::utf8_strtrunc (   $str,
  $len 
)

Truncates a string in UTF-8 short at a given byte length.

Parameters
string  $str  UTF-8 multibyte character string
int  $len  The byte length
Returns
string The shortened string
See also
mb_strcut()

Definition at line 1887 of file CharsetConverter.php.

References TYPO3\CMS\Core\Charset\CharsetConverter\substr().

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\strtrunc().
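Truncating at a byte length must not split a multibyte character. A minimal sketch of one way to do that: if the cut would land on a continuation byte, back up to the start of that character. utf8TruncateBytes() is a hypothetical name; PHP's mb_strcut(), listed under "See also", serves a similar purpose.

```php
<?php
// Hypothetical helper: cut a UTF-8 string to at most $len bytes on a character boundary.
function utf8TruncateBytes($str, $len)
{
    if ($len <= 0) {
        return '';
    }
    if ($len >= strlen($str)) {
        return $str;
    }
    // $str[$len] is the first byte that would be dropped. If it is a
    // continuation byte (10vvvvvv), the cut is inside a character: step back.
    while ($len > 0 && (ord($str[$len]) & 0xC0) === 0x80) {
        $len--;
    }
    return substr($str, 0, $len);
}

$s = "Gr\xC3\xBC\xC3\x9Fe";               // "Grüße", 7 bytes
$cut = utf8TruncateBytes($s, 3);          // a plain substr() would split the ü
```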

◆ utf8_substr()

TYPO3\CMS\Core\Charset\CharsetConverter::utf8_substr (   $str,
  $start,
  $len = null 
)

Returns a part of a UTF-8 string. Unit-tested by Kasper and works 100% like substr() / mb_substr() for full range of $start/$len

Parameters
string  $str  UTF-8 string
int  $start  Start position (character position)
int  $len  Length (in characters)
Returns
string The substring
See also
substr()

Definition at line 1828 of file CharsetConverter.php.

References TYPO3\CMS\Core\Charset\CharsetConverter\substr(), and TYPO3\CMS\Core\Charset\CharsetConverter\utf8_char2byte_pos().

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\substr().

◆ utf8_to_entities()

TYPO3\CMS\Core\Charset\CharsetConverter::utf8_to_entities (   $str)

Converts all chars > 127 to numeric entities.

Parameters
string  $str  Input string
Returns
string Output string

Definition at line 784 of file CharsetConverter.php.

References $a, TYPO3\CMS\Core\Charset\CharsetConverter\strlen(), TYPO3\CMS\Core\Charset\CharsetConverter\substr(), and TYPO3\CMS\Core\Charset\CharsetConverter\utf8CharToUnumber().
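The conversion can be sketched with a Unicode-aware regular expression that matches every character above 127. utf8ToEntities() is a hypothetical helper that assumes the mbstring extension; emitting decimal entities is an assumption of the sketch, since the description does not state which form is produced.

```php
<?php
// Hypothetical helper: replace each non-ASCII character with a decimal entity.
function utf8ToEntities($str)
{
    // The "u" modifier makes PCRE treat the subject as UTF-8, so one match
    // is one character rather than one byte.
    return preg_replace_callback('/[^\x00-\x7F]/u', function (array $match) {
        // Convert the character to big-endian UTF-32 and read it as an integer.
        $utf32 = mb_convert_encoding($match[0], 'UTF-32BE', 'UTF-8');
        $codePoint = unpack('N', $utf32)[1];
        return '&#' . $codePoint . ';';
    }, $str);
}

$encoded = utf8ToEntities("Gr\xC3\xBC\xC3\x9Fe \xE2\x82\xAC");
```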

◆ utf8_to_numberarray()

TYPO3\CMS\Core\Charset\CharsetConverter::utf8_to_numberarray (   $str,
  $convEntities = false,
  $retChar = false 
)

Converts all chars in the input UTF-8 string into integer numbers returned in an array

Parameters
string  $str  Input string, UTF-8
bool  $convEntities  If set, then all HTML entities (like &amp; or &pound; or &#123; or &#x3f5d;) will be detected as characters.
bool  $retChar  If set, then instead of integer numbers the real UTF-8 char is returned.
Returns
array Output array with the char numbers

Definition at line 871 of file CharsetConverter.php.

References $a, TYPO3\CMS\Core\Charset\CharsetConverter\$noCharByteVal, TYPO3\CMS\Core\Charset\CharsetConverter\entities_to_utf8(), TYPO3\CMS\Core\Charset\CharsetConverter\strlen(), TYPO3\CMS\Core\Charset\CharsetConverter\substr(), and TYPO3\CMS\Core\Charset\CharsetConverter\utf8CharToUnumber().

◆ utf8CharToUnumber()

TYPO3\CMS\Core\Charset\CharsetConverter::utf8CharToUnumber (   $str,
  $hex = false 
)

Converts a UTF-8 multibyte character to a UNICODE number. Unit-tested by Kasper.

Parameters
string  $str  UTF-8 multibyte character string
bool  $hex  If set, then a hex. number is returned.
Returns
int UNICODE integer
See also
UnumberToChar()

Definition at line 980 of file CharsetConverter.php.

References TYPO3\CMS\Core\Charset\CharsetConverter\substr().

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\utf8_decode(), TYPO3\CMS\Core\Charset\CharsetConverter\utf8_to_entities(), and TYPO3\CMS\Core\Charset\CharsetConverter\utf8_to_numberarray().
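Decoding reverses the bit spreading: the lead byte announces the sequence length, and each continuation byte contributes six bits. A sketch for 1- to 4-byte sequences follows; utf8CharToCodePoint() is a hypothetical name, and returning false for malformed input is an assumption of the sketch.

```php
<?php
// Hypothetical helper: code point of the first UTF-8 character in $char.
function utf8CharToCodePoint($char)
{
    if ($char === '') {
        return false;
    }
    $lead = ord($char[0]);
    if ($lead < 0x80) {
        return $lead;                          // 0vvvvvvv: plain ASCII
    }
    if (($lead & 0xE0) === 0xC0) {             // 110vvvvv
        $length = 2;
        $codePoint = $lead & 0x1F;
    } elseif (($lead & 0xF0) === 0xE0) {       // 1110vvvv
        $length = 3;
        $codePoint = $lead & 0x0F;
    } elseif (($lead & 0xF8) === 0xF0) {       // 11110vvv
        $length = 4;
        $codePoint = $lead & 0x07;
    } else {
        return false;                          // continuation byte or unsupported lead byte
    }
    if (strlen($char) < $length) {
        return false;                          // sequence is cut short
    }
    for ($i = 1; $i < $length; $i++) {
        // Append the six payload bits of each continuation byte.
        $codePoint = ($codePoint << 6) | (ord($char[$i]) & 0x3F);
    }
    return $codePoint;
}

printf("U+%04X\n", utf8CharToCodePoint("\xE2\x82\xAC"));
```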

Member Data Documentation

◆ $caseFolding

TYPO3\CMS\Core\Charset\CharsetConverter::$caseFolding = []

Definition at line 74 of file CharsetConverter.php.

◆ $charSetArray

TYPO3\CMS\Core\Charset\CharsetConverter::$charSetArray
Initial value:
= [
'af' => ''

Definition at line 450 of file CharsetConverter.php.

◆ $eucBasedSets

TYPO3\CMS\Core\Charset\CharsetConverter::$eucBasedSets
Initial value:
= [
'gb2312' => 1

Definition at line 107 of file CharsetConverter.php.

◆ $fourByteSets

TYPO3\CMS\Core\Charset\CharsetConverter::$fourByteSets
Initial value:
= [
'ucs-4' => 1

Definition at line 97 of file CharsetConverter.php.

◆ $lang_to_script

TYPO3\CMS\Core\Charset\CharsetConverter::$lang_to_script
Initial value:
= [
'af' => 'west_european'

Definition at line 210 of file CharsetConverter.php.

◆ $locale_to_charset

TYPO3\CMS\Core\Charset\CharsetConverter::$locale_to_charset
Initial value:
= [
'japanese.euc' => 'euc-jp'

Definition at line 434 of file CharsetConverter.php.

◆ $noCharByteVal

TYPO3\CMS\Core\Charset\CharsetConverter::$noCharByteVal = 63

◆ $parsedCharsets

TYPO3\CMS\Core\Charset\CharsetConverter::$parsedCharsets = []

Definition at line 67 of file CharsetConverter.php.

◆ $script_to_charset_unix

TYPO3\CMS\Core\Charset\CharsetConverter::$script_to_charset_unix
Initial value:
= [
'west_european' => 'iso-8859-1'

Definition at line 380 of file CharsetConverter.php.

◆ $script_to_charset_windows

TYPO3\CMS\Core\Charset\CharsetConverter::$script_to_charset_windows
Initial value:
= [
'east_european' => 'windows-1250'

Definition at line 407 of file CharsetConverter.php.

◆ $synonyms

TYPO3\CMS\Core\Charset\CharsetConverter::$synonyms
Initial value:
= [
'us' => 'ascii'

Definition at line 120 of file CharsetConverter.php.

◆ $toASCII

TYPO3\CMS\Core\Charset\CharsetConverter::$toASCII = []

Definition at line 81 of file CharsetConverter.php.

◆ $twoByteSets

TYPO3\CMS\Core\Charset\CharsetConverter::$twoByteSets
Initial value:
= [
'ucs-2' => 1
]

Definition at line 88 of file CharsetConverter.php.