TYPO3 CMS  TYPO3_6-2
TYPO3\CMS\Core\Charset\CharsetConverter Class Reference
Inheritance diagram for TYPO3\CMS\Core\Charset\CharsetConverter:
t3lib_cs

Public Member Functions

 __construct ()
 
 parse_charset ($charset)
 
 get_locale_charset ($locale)
 
 conv ($str, $fromCS, $toCS, $useEntityForNoChar=0)
 
 convArray (&$array, $fromCS, $toCS, $useEntityForNoChar=0)
 
 utf8_encode ($str, $charset)
 
 utf8_decode ($str, $charset, $useEntityForNoChar=0)
 
 utf8_to_entities ($str)
 
 entities_to_utf8 ($str, $alsoStdHtmlEnt=FALSE)
 
 utf8_to_numberarray ($str, $convEntities=0, $retChar=0)
 
 UnumberToChar ($cbyte)
 
 utf8CharToUnumber ($str, $hex=0)
 
 initCharset ($charset)
 
 initUnicodeData ($mode=NULL)
 
 initCaseFolding ($charset)
 
 initToASCII ($charset)
 
 substr ($charset, $string, $start, $len=NULL)
 
 strlen ($charset, $string)
 
 crop ($charset, $string, $len, $crop='')
 
 strtrunc ($charset, $string, $len)
 
 conv_case ($charset, $string, $case)
 
 convCaseFirst ($charset, $string, $case)
 
 specCharsToASCII ($charset, $string)
 
 getPreferredClientLanguage ($languageCodesList)
 
 sb_char_mapping ($str, $charset, $mode, $opt='')
 
 utf8_substr ($str, $start, $len=NULL)
 
 utf8_strlen ($str)
 
 utf8_strtrunc ($str, $len)
 
 utf8_strpos ($haystack, $needle, $offset=0)
 
 utf8_strrpos ($haystack, $needle)
 
 utf8_char2byte_pos ($str, $pos)
 
 utf8_byte2char_pos ($str, $pos)
 
 utf8_char_mapping ($str, $mode, $opt='')
 
 euc_strtrunc ($str, $len, $charset)
 
 euc_substr ($str, $start, $charset, $len=NULL)
 
 euc_strlen ($str, $charset)
 
 euc_char2byte_pos ($str, $pos, $charset)
 
 euc_char_mapping ($str, $charset, $mode, $opt='')
 

Public Attributes

 $noCharByteVal = 63
 
 $parsedCharsets = array()
 
 $caseFolding = array()
 
 $toASCII = array()
 
 $twoByteSets
 
 $fourByteSets
 
 $eucBasedSets
 
 $synonyms
 
 $lang_to_script
 
 $script_to_charset_unix
 
 $script_to_charset_windows
 
 $locale_to_charset
 
 $charSetArray
 

Protected Member Functions

 cropMbstring ($charset, $string, $len, $crop='')
 

Protected Attributes

 $locales
 

Detailed Description

Notes on UTF-8

Functions working on UTF-8 strings:

  • strchr/strstr
  • strrchr
  • substr_count
  • implode/explode/join

Functions nearly working on UTF-8 strings:

  • strlen: returns the length in BYTES; if you need the length in CHARACTERS, use utf8_strlen
  • trim/ltrim/rtrim: the second parameter 'charlist' won't work for characters not contained in 7-bit ASCII
  • strpos/strrpos: they return the BYTE position; if you need the CHARACTER position, use utf8_strpos/utf8_strrpos
  • htmlentities: charset support for UTF-8 only since PHP 4.3.0
  • preg_*: UTF-8 support is compiled into PHP by default nowadays but could be unavailable; patterns need the "u" modifier

Functions NOT working on UTF-8 strings:

  • str*cmp
  • stristr
  • stripos
  • substr
  • strrev
  • split/spliti
  • ...

Class for conversion between charsets

Author
Kasper Skårhøj <kasperYYYY@typo3.com>
Martin Kutschker <martin.t.kutschker@blackbox.net>

Definition at line 54 of file CharsetConverter.php.
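
The byte/character distinction in the notes above can be illustrated with a short sketch. The helper name sketchUtf8CharCount is hypothetical and not part of CharsetConverter; it assumes valid UTF-8 input and counts every byte that is not a continuation byte (bit pattern 10xxxxxx).

```php
<?php
// Hypothetical helper, not part of CharsetConverter: counts the characters
// of a valid UTF-8 string by skipping continuation bytes (10xxxxxx).
function sketchUtf8CharCount($str)
{
    $count = 0;
    $byteLength = strlen($str); // strlen() counts BYTES
    for ($i = 0; $i < $byteLength; $i++) {
        if ((ord($str[$i]) & 0xC0) !== 0x80) {
            $count++; // lead byte or plain ASCII byte: a new character starts
        }
    }
    return $count;
}

$word = "sm\xC3\xB8rrebr\xC3\xB8d"; // "smørrebrød" encoded as UTF-8
echo strlen($word), "\n";              // 12 (bytes)
echo sketchUtf8CharCount($word), "\n"; // 10 (characters)
```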

Constructor & Destructor Documentation

◆ __construct()

TYPO3\CMS\Core\Charset\CharsetConverter::__construct ( )

Default constructor.

Definition at line 610 of file CharsetConverter.php.

References TYPO3\CMS\Core\Utility\GeneralUtility\makeInstance().

Member Function Documentation

◆ conv()

TYPO3\CMS\Core\Charset\CharsetConverter::conv (   $str,
  $fromCS,
  $toCS,
  $useEntityForNoChar = 0 
)

Convert from one charset to another charset.

Parameters
string $str: Input string
string $fromCS: From charset (the current charset of the string)
string $toCS: To charset (the output charset wanted)
boolean $useEntityForNoChar: If set, then characters that are not available in the destination character set will be encoded as numeric entities
Returns
string Converted string
See also
convArray()
Todo:
Define visibility

Definition at line 687 of file CharsetConverter.php.

References $GLOBALS, TYPO3\CMS\Core\Charset\CharsetConverter\utf8_decode(), and TYPO3\CMS\Core\Charset\CharsetConverter\utf8_encode().

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\convArray().

◆ conv_case()

TYPO3\CMS\Core\Charset\CharsetConverter::conv_case (   $charset,
  $string,
  $case 
)

Translates all characters of a string into their respective case values. Unlike strtolower() and strtoupper(), this method is locale independent. Note that the string length may change, e.g. lower case German "ß" (sharp S) becomes upper case "SS". Real case folding is language dependent; this method ignores this fact. Unit-tested by Kasper.

Parameters
string $charset: Character set of string
string $string: Input string to convert case for
string $case: Case keyword: "toLower" means lowercase conversion, anything else is uppercase (use "toUpper")
Returns
string The converted string
See also
strtolower(), strtoupper()
Todo:
Define visibility

Definition at line 1741 of file CharsetConverter.php.

References $GLOBALS, TYPO3\CMS\Core\Charset\CharsetConverter\euc_char_mapping(), TYPO3\CMS\Core\Charset\CharsetConverter\sb_char_mapping(), and TYPO3\CMS\Core\Charset\CharsetConverter\utf8_char_mapping().

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\convCaseFirst().
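
The length change mentioned in the description can be seen in a minimal sketch. The table and function below are hypothetical and far smaller than the folding data the class actually loads; they only illustrate that a table-driven, locale-independent mapping can turn one character into two.

```php
<?php
// Hypothetical miniature folding table (UTF-8 byte sequences); the real
// class builds its tables from Unicode data, which is not reproduced here.
$sketchUpperTable = array(
    "\xC3\x9F" => 'SS',       // ß -> SS (one character becomes two)
    "\xC3\xA4" => "\xC3\x84", // ä -> Ä
    "\xC3\xB6" => "\xC3\x96", // ö -> Ö
    "\xC3\xBC" => "\xC3\x9C", // ü -> Ü
);

function sketchToUpper($str, array $table)
{
    // Map the multibyte characters first, then the plain ASCII letters.
    // No locale is consulted at any point.
    $str = strtr($str, $table);
    return strtr($str, 'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ');
}

echo sketchToUpper("stra\xC3\x9Fe", $sketchUpperTable), "\n"; // STRASSE
```

The input has six characters and the output seven, so callers should not assume that case conversion preserves string length.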

◆ convArray()

TYPO3\CMS\Core\Charset\CharsetConverter::convArray (   &$array,
  $fromCS,
  $toCS,
  $useEntityForNoChar = 0 
)

Convert all elements in ARRAY with type string from one charset to another charset. NOTICE: Array is passed by reference!

Parameters
array $array: Input array, possibly multidimensional
string $fromCS: From charset (the current charset of the string)
string $toCS: To charset (the output charset wanted)
boolean $useEntityForNoChar: If set, then characters that are not available in the destination character set will be encoded as numeric entities
Returns
void
See also
conv()
Todo:
Define visibility

Definition at line 736 of file CharsetConverter.php.

References TYPO3\CMS\Core\Charset\CharsetConverter\conv().

◆ convCaseFirst()

TYPO3\CMS\Core\Charset\CharsetConverter::convCaseFirst (   $charset,
  $string,
  $case 
)

Equivalent of lcfirst/ucfirst but using character set.

Parameters
string $charset
string $string
string $case
Returns
string
See also
::conv_case()

Definition at line 1768 of file CharsetConverter.php.

References TYPO3\CMS\Core\Charset\CharsetConverter\conv_case(), and TYPO3\CMS\Core\Charset\CharsetConverter\substr().

◆ crop()

TYPO3\CMS\Core\Charset\CharsetConverter::crop (   $charset,
  $string,
  $len,
  $crop = '' 
)

Truncates a string and prepends or appends a crop signifier. Unit-tested by Kasper.

Parameters
string $charset: The character set
string $string: Character string
integer $len: Length (in characters)
string $crop: Crop signifier
Returns
string The shortened string
See also
substr(), mb_strimwidth()
Todo:
Define visibility

Definition at line 1655 of file CharsetConverter.php.

References $GLOBALS, TYPO3\CMS\Core\Charset\CharsetConverter\cropMbstring(), TYPO3\CMS\Core\Charset\CharsetConverter\euc_char2byte_pos(), TYPO3\CMS\Core\Charset\CharsetConverter\strlen(), TYPO3\CMS\Core\Charset\CharsetConverter\substr(), and TYPO3\CMS\Core\Charset\CharsetConverter\utf8_char2byte_pos().

◆ cropMbstring()

TYPO3\CMS\Core\Charset\CharsetConverter::cropMbstring (   $charset,
  $string,
  $len,
  $crop = '' 
)
protected

Method to crop strings using the mb_substr function.

Parameters
string $charset: The character set
string $string: String to be cropped
integer $len: Crop length (in characters)
string $crop: Crop signifier
Returns
string The shortened string
See also
mb_strlen(), mb_substr()

Definition at line 1631 of file CharsetConverter.php.

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\crop().

◆ entities_to_utf8()

TYPO3\CMS\Core\Charset\CharsetConverter::entities_to_utf8 (   $str,
  $alsoStdHtmlEnt = FALSE 
)

Converts numeric entities (UNICODE, e.g. decimal (&#1234;) or hexadecimal (&#x1b;)) to UTF-8 multibyte chars.

Parameters
string $str: Input string, UTF-8
boolean $alsoStdHtmlEnt: If set, then all string-HTML entities (like &amp; or &pound;) will be converted as well
Returns
string Output string
Todo:
Define visibility

Definition at line 927 of file CharsetConverter.php.

References TYPO3\CMS\Core\Charset\CharsetConverter\substr(), and TYPO3\CMS\Core\Charset\CharsetConverter\UnumberToChar().

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\utf8_to_numberarray().

◆ euc_char2byte_pos()

TYPO3\CMS\Core\Charset\CharsetConverter::euc_char2byte_pos (   $str,
  $pos,
  $charset 
)

Translates a character position into an 'absolute' byte position.

Parameters
string $str: EUC multibyte character string
integer $pos: Character position (negative values start from the end)
string $charset: The charset
Returns
integer Byte position
Todo:
Define visibility

Definition at line 2283 of file CharsetConverter.php.

References TYPO3\CMS\Core\Charset\CharsetConverter\strlen().

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\crop(), and TYPO3\CMS\Core\Charset\CharsetConverter\euc_substr().

◆ euc_char_mapping()

TYPO3\CMS\Core\Charset\CharsetConverter::euc_char_mapping (   $str,
  $charset,
  $mode,
  $opt = '' 
)

Maps all characters of a string in the EUC charset family.

Parameters
string $str: EUC multibyte character string
string $charset: The charset
string $mode: Mode: 'case' (case folding) or 'ascii' (ASCII transliteration)
string $opt: 'case': conversion 'toLower' or 'toUpper'
Returns
string The converted string
Todo:
Define visibility

Definition at line 2330 of file CharsetConverter.php.

References TYPO3\CMS\Core\Charset\CharsetConverter\initCaseFolding(), TYPO3\CMS\Core\Charset\CharsetConverter\initToASCII(), TYPO3\CMS\Core\Charset\CharsetConverter\strlen(), and TYPO3\CMS\Core\Charset\CharsetConverter\substr().

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\conv_case(), and TYPO3\CMS\Core\Charset\CharsetConverter\specCharsToASCII().

◆ euc_strlen()

TYPO3\CMS\Core\Charset\CharsetConverter::euc_strlen (   $str,
  $charset 
)

Counts the number of characters of a string in the EUC charset family.

Parameters
string $str: EUC multibyte character string
string $charset: The charset
Returns
integer The number of characters
See also
strlen()
Todo:
Define visibility

Definition at line 2255 of file CharsetConverter.php.

References TYPO3\CMS\Core\Charset\CharsetConverter\strlen().

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\strlen().

◆ euc_strtrunc()

TYPO3\CMS\Core\Charset\CharsetConverter::euc_strtrunc (   $str,
  $len,
  $charset 
)

Cuts a string in the EUC charset family short at a given byte length.

Parameters
string $str: EUC multibyte character string
integer $len: The byte length
string $charset: The charset
Returns
string The shortened string
See also
mb_strcut()
Todo:
Define visibility

Definition at line 2190 of file CharsetConverter.php.

References TYPO3\CMS\Core\Charset\CharsetConverter\strlen(), and TYPO3\CMS\Core\Charset\CharsetConverter\substr().

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\strtrunc().

◆ euc_substr()

TYPO3\CMS\Core\Charset\CharsetConverter::euc_substr (   $str,
  $start,
  $charset,
  $len = NULL 
)

Returns a part of a string in the EUC charset family.

Parameters
string $str: EUC multibyte character string
integer $start: Start position (character position)
string $charset: The charset
integer $len: Length (in characters)
Returns
string the substring
Todo:
Define visibility

Definition at line 2226 of file CharsetConverter.php.

References TYPO3\CMS\Core\Charset\CharsetConverter\euc_char2byte_pos(), and TYPO3\CMS\Core\Charset\CharsetConverter\substr().

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\substr().

◆ get_locale_charset()

TYPO3\CMS\Core\Charset\CharsetConverter::get_locale_charset (   $locale)

Get the charset of a locale.

ln            language
ln_CN         language / country
ln_CN.cs      language / country / charset
ln_CN.cs@mod  language / country / charset / modifier

Parameters
string $locale: Locale string
Returns
string Charset resolved for locale string
Todo:
Define visibility

Definition at line 641 of file CharsetConverter.php.

References TYPO3\CMS\Core\Charset\CharsetConverter\parse_charset().
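
The locale layout listed above (language, optional country, optional charset, optional modifier) can be taken apart with a single regular expression. This is only a sketch of the string format; sketchSplitLocale is a hypothetical helper and does not reproduce how get_locale_charset() resolves a charset from its lookup tables.

```php
<?php
// Hypothetical helper: splits a locale string of the form ln_CN.cs@mod
// into its parts. Missing parts are returned as empty strings.
function sketchSplitLocale($locale)
{
    $parts = array('language' => '', 'country' => '', 'charset' => '', 'modifier' => '');
    $pattern = '/^([A-Za-z]+)(?:_([A-Za-z]+))?(?:\.([^@]+))?(?:@(.+))?$/';
    if (preg_match($pattern, $locale, $m)) {
        $parts['language'] = $m[1];
        $parts['country'] = isset($m[2]) ? $m[2] : '';
        // Charset names are lowercased, mirroring parse_charset()'s normalization.
        $parts['charset'] = isset($m[3]) ? strtolower($m[3]) : '';
        $parts['modifier'] = isset($m[4]) ? $m[4] : '';
    }
    return $parts;
}

print_r(sketchSplitLocale('de_DE.ISO-8859-15@euro'));
```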

◆ getPreferredClientLanguage()

TYPO3\CMS\Core\Charset\CharsetConverter::getPreferredClientLanguage (   $languageCodesList)

Converts the language codes that we get from the client (usually HTTP_ACCEPT_LANGUAGE) into a TYPO3-readable language code

Parameters
string $languageCodesList: List of language codes, for example 'de,en-us;q=0.9,de-de;q=0.7,es-cl;q=0.6,en;q=0.4,es;q=0.3,zh;q=0.1'
Returns
string A preferred language that TYPO3 supports, or "default" if none found

Definition at line 1802 of file CharsetConverter.php.

References TYPO3\CMS\Core\Utility\GeneralUtility\trimExplode().
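
A language list of this shape carries an optional quality value (q) per entry. The sketch below only orders the codes by that value; sketchOrderLanguages is a hypothetical helper, and the step that maps a code to a language TYPO3 supports (or to "default") is not shown because it depends on data inside the class.

```php
<?php
// Hypothetical helper: orders the language codes of an Accept-Language style
// list by their q value, highest first. A missing q counts as 1.0.
function sketchOrderLanguages($languageCodesList)
{
    $weighted = array();
    foreach (explode(',', $languageCodesList) as $position => $entry) {
        $pieces = explode(';q=', trim($entry));
        $code = strtolower($pieces[0]);
        if ($code === '') {
            continue; // ignore empty list items
        }
        $quality = isset($pieces[1]) ? (float)$pieces[1] : 1.0;
        $weighted[] = array($code, $quality, $position);
    }
    usort($weighted, function ($a, $b) {
        if ($a[1] != $b[1]) {
            return $a[1] < $b[1] ? 1 : -1; // higher quality first
        }
        return $a[2] - $b[2]; // keep list order on ties
    });
    $codes = array();
    foreach ($weighted as $item) {
        $codes[] = $item[0];
    }
    return $codes;
}

print_r(sketchOrderLanguages('en;q=0.4,de,fr;q=0.9'));
```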

◆ initCaseFolding()

TYPO3\CMS\Core\Charset\CharsetConverter::initCaseFolding (   $charset)

This function initializes the folding table for a charset other than UTF-8. This function is automatically called by the case folding functions.

Parameters
string $charset: Charset for which to initialize case folding.
Returns
integer Returns FALSE on error, a TRUE value on success: 1 = table already loaded, 2 = cached version, 3 = table parsed (and cached).
Access: private
Todo:
Define visibility

Definition at line 1438 of file CharsetConverter.php.

References TYPO3\CMS\Core\Utility\GeneralUtility\getFileAbsFileName(), TYPO3\CMS\Core\Utility\GeneralUtility\getUrl(), TYPO3\CMS\Core\Charset\CharsetConverter\initCharset(), TYPO3\CMS\Core\Charset\CharsetConverter\initUnicodeData(), TYPO3\CMS\Core\Charset\CharsetConverter\utf8_decode(), and TYPO3\CMS\Core\Utility\GeneralUtility\writeFileToTypo3tempDir().

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\euc_char_mapping(), and TYPO3\CMS\Core\Charset\CharsetConverter\sb_char_mapping().

◆ initCharset()

TYPO3\CMS\Core\Charset\CharsetConverter::initCharset (   $charset)

This will initialize a charset for use if it's defined in the 'typo3/sysext/core/Resources/Private/Charsets/csconvtbl/' folder. This function is automatically called by the conversion functions.

PLEASE SEE: http://www.unicode.org/Public/MAPPINGS/

Parameters
string $charset: The charset to be initialized. Always use the lowercase charset name; it must exactly match a filename in the csconvtbl/ folder ([charset].tbl).
Returns
integer Returns '1' if already loaded. Returns FALSE if charset conversion table was not found. Returns '2' if the charset conversion table was found and parsed.
Access: private
Todo:
Define visibility

Definition at line 1130 of file CharsetConverter.php.

References TYPO3\CMS\Core\Utility\ExtensionManagementUtility\extPath(), TYPO3\CMS\Core\Utility\GeneralUtility\getFileAbsFileName(), TYPO3\CMS\Core\Utility\GeneralUtility\getUrl(), TYPO3\CMS\Core\Charset\CharsetConverter\substr(), TYPO3\CMS\Core\Utility\GeneralUtility\trimExplode(), TYPO3\CMS\Core\Charset\CharsetConverter\UnumberToChar(), TYPO3\CMS\Core\Utility\GeneralUtility\validPathStr(), and TYPO3\CMS\Core\Utility\GeneralUtility\writeFileToTypo3tempDir().

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\initCaseFolding(), TYPO3\CMS\Core\Charset\CharsetConverter\initToASCII(), TYPO3\CMS\Core\Charset\CharsetConverter\utf8_decode(), and TYPO3\CMS\Core\Charset\CharsetConverter\utf8_encode().
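
As a rough illustration of what parsing a conversion table involves, the sketch below reads lines in the format used by the mapping files published at http://www.unicode.org/Public/MAPPINGS/ (a byte value, a Unicode value, and a trailing comment). It is an assumption that the [charset].tbl files follow this layout; sketchParseMappingLines is a hypothetical helper, not the method's actual code.

```php
<?php
// Hypothetical parser for unicode.org-style mapping lines such as
// "0x80<TAB>0x20AC<TAB># EURO SIGN". Returns array(byte value => code point).
function sketchParseMappingLines(array $lines)
{
    $table = array();
    foreach ($lines as $line) {
        $line = trim(preg_replace('/#.*$/', '', $line)); // drop comments
        if ($line === '') {
            continue;
        }
        $fields = preg_split('/\s+/', $line);
        if (count($fields) < 2) {
            continue; // byte without a Unicode value (undefined in the charset)
        }
        $local = hexdec(preg_replace('/^0x/i', '', $fields[0]));
        $unicode = hexdec(preg_replace('/^0x/i', '', $fields[1]));
        $table[$local] = $unicode;
    }
    return $table;
}

$sketchLines = array(
    "# Sample lines in the windows-1252 mapping style",
    "0x41\t0x0041\t# LATIN CAPITAL LETTER A",
    "0x80\t0x20AC\t# EURO SIGN",
    "0x81\t\t# UNDEFINED",
);
print_r(sketchParseMappingLines($sketchLines));
```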

◆ initToASCII()

TYPO3\CMS\Core\Charset\CharsetConverter::initToASCII (   $charset)

This function initializes the to-ASCII conversion table for a charset other than UTF-8. This function is automatically called by the ASCII transliteration functions.

Parameters
string $charset: Charset for which to initialize conversion.
Returns
integer Returns FALSE on error, a TRUE value on success: 1 = table already loaded, 2 = cached version, 3 = table parsed (and cached).
Access: private
Todo:
Define visibility

Definition at line 1500 of file CharsetConverter.php.

References TYPO3\CMS\Core\Utility\GeneralUtility\getFileAbsFileName(), TYPO3\CMS\Core\Utility\GeneralUtility\getUrl(), TYPO3\CMS\Core\Charset\CharsetConverter\initCharset(), TYPO3\CMS\Core\Charset\CharsetConverter\initUnicodeData(), TYPO3\CMS\Core\Charset\CharsetConverter\utf8_decode(), and TYPO3\CMS\Core\Utility\GeneralUtility\writeFileToTypo3tempDir().

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\euc_char_mapping(), and TYPO3\CMS\Core\Charset\CharsetConverter\sb_char_mapping().

◆ initUnicodeData()

TYPO3\CMS\Core\Charset\CharsetConverter::initUnicodeData (   $mode = NULL)

◆ parse_charset()

TYPO3\CMS\Core\Charset\CharsetConverter::parse_charset (   $charset)

Normalize - changes input character set to lowercase letters.

Parameters
string $charset: Input charset
Returns
string Normalized charset
Todo:
Define visibility

Definition at line 621 of file CharsetConverter.php.

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\get_locale_charset().

◆ sb_char_mapping()

TYPO3\CMS\Core\Charset\CharsetConverter::sb_char_mapping (   $str,
  $charset,
  $mode,
  $opt = '' 
)

Maps all characters of a string in a single byte charset.

Parameters
string $str: The string
string $charset: The charset
string $mode: Mode: 'case' (case folding) or 'ascii' (ASCII transliteration)
string $opt: 'case': conversion 'toLower' or 'toUpper'
Returns
string The converted string
Todo:
Define visibility

Definition at line 1863 of file CharsetConverter.php.

References TYPO3\CMS\Core\Charset\CharsetConverter\initCaseFolding(), TYPO3\CMS\Core\Charset\CharsetConverter\initToASCII(), and TYPO3\CMS\Core\Charset\CharsetConverter\strlen().

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\conv_case(), and TYPO3\CMS\Core\Charset\CharsetConverter\specCharsToASCII().

◆ specCharsToASCII()

TYPO3\CMS\Core\Charset\CharsetConverter::specCharsToASCII (   $charset,
  $string 
)

Converts special characters (like æøåÆØÅ, umlauts etc.) to ASCII equivalents (usually two bytes, like æ => ae).

Parameters
string $charset: Character set of string
string $string: Input string to convert
Returns
string The converted string
Todo:
Define visibility

Definition at line 1783 of file CharsetConverter.php.

References TYPO3\CMS\Core\Charset\CharsetConverter\euc_char_mapping(), TYPO3\CMS\Core\Charset\CharsetConverter\sb_char_mapping(), and TYPO3\CMS\Core\Charset\CharsetConverter\utf8_char_mapping().

◆ strlen()

◆ strtrunc()

TYPO3\CMS\Core\Charset\CharsetConverter::strtrunc (   $charset,
  $string,
  $len 
)

Cuts a string short at a given byte length.

Parameters
string $charset: The character set
string $string: Character string
integer $len: The byte length
Returns
string The shortened string
See also
mb_strcut()
Todo:
Define visibility

Definition at line 1703 of file CharsetConverter.php.

References $GLOBALS, TYPO3\CMS\Core\Charset\CharsetConverter\euc_strtrunc(), TYPO3\CMS\Core\Charset\CharsetConverter\substr(), and TYPO3\CMS\Core\Charset\CharsetConverter\utf8_strtrunc().

◆ substr()

TYPO3\CMS\Core\Charset\CharsetConverter::substr (   $charset,
  $string,
  $start,
  $len = NULL 
)

◆ UnumberToChar()

TYPO3\CMS\Core\Charset\CharsetConverter::UnumberToChar (   $cbyte)

Converts a UNICODE number to a UTF-8 multibyte character. Algorithm based on a script found at http://czyborra.com/utf/. Unit-tested by Kasper.

The binary representation of the character's integer value is thus simply spread across the bytes and the number of high bits set in the lead byte announces the number of bytes in the multibyte sequence:

bytes | bits | representation
    1 |    7 | 0vvvvvvv
    2 |   11 | 110vvvvv 10vvvvvv
    3 |   16 | 1110vvvv 10vvvvvv 10vvvvvv
    4 |   21 | 11110vvv 10vvvvvv 10vvvvvv 10vvvvvv
    5 |   26 | 111110vv 10vvvvvv 10vvvvvv 10vvvvvv 10vvvvvv
    6 |   31 | 1111110v 10vvvvvv 10vvvvvv 10vvvvvv 10vvvvvv 10vvvvvv

Parameters
integer $cbyte: UNICODE integer
Returns
string UTF-8 multibyte character string
See also
utf8CharToUnumber()
Todo:
Define visibility

Definition at line 1033 of file CharsetConverter.php.

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\entities_to_utf8(), TYPO3\CMS\Core\Charset\CharsetConverter\initCharset(), and TYPO3\CMS\Core\Charset\CharsetConverter\initUnicodeData().
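
The bit layout in the table above can be turned into code fairly directly. The function below is a sketch written from that table, not the method's actual source; sketchCodePointToUtf8 is a hypothetical name. It follows the table's full 1 to 6 byte range, although current UTF-8 only uses sequences of up to 4 bytes.

```php
<?php
// Sketch written from the table above: spreads the bits of a code point over
// 1..6 bytes. Returns an empty string for values outside the 31-bit range.
function sketchCodePointToUtf8($codePoint)
{
    if ($codePoint < 0) {
        return '';
    }
    if ($codePoint < 0x80) {
        return chr($codePoint); // 1 byte: 0vvvvvvv
    }
    // Exclusive upper bounds for 2..6 byte sequences (11, 16, 21, 26, 31 bits).
    $limits = array(2 => 0x800, 3 => 0x10000, 4 => 0x200000, 5 => 0x4000000, 6 => 0x80000000);
    $byteCount = 0;
    foreach ($limits as $count => $limit) {
        if ($codePoint < $limit) {
            $byteCount = $count;
            break;
        }
    }
    if ($byteCount === 0) {
        return '';
    }
    $tail = '';
    for ($i = 1; $i < $byteCount; $i++) {
        $tail = chr(0x80 | ($codePoint & 0x3F)) . $tail; // 10vvvvvv
        $codePoint >>= 6;
    }
    // Lead byte: $byteCount high bits set, then a 0 bit, then the remaining bits.
    $leadMarker = (0xFF << (8 - $byteCount)) & 0xFF;
    return chr($leadMarker | $codePoint) . $tail;
}

echo bin2hex(sketchCodePointToUtf8(0x20AC)), "\n"; // e282ac (euro sign)
```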

◆ utf8_byte2char_pos()

TYPO3\CMS\Core\Charset\CharsetConverter::utf8_byte2char_pos (   $str,
  $pos 
)

Translates an 'absolute' byte position into a character position. Unit tested by Kasper.

Parameters
string $str: UTF-8 string
integer $pos: Byte position
Returns
integer Character position
Todo:
Define visibility

Definition at line 2101 of file CharsetConverter.php.

References TYPO3\CMS\Core\Charset\CharsetConverter\strlen().

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\utf8_strpos(), and TYPO3\CMS\Core\Charset\CharsetConverter\utf8_strrpos().

◆ utf8_char2byte_pos()

TYPO3\CMS\Core\Charset\CharsetConverter::utf8_char2byte_pos (   $str,
  $pos 
)

Translates a character position into an 'absolute' byte position. Unit tested by Kasper.

Parameters
string $str: UTF-8 string
integer $pos: Character position (negative values start from the end)
Returns
integer Byte position
Todo:
Define visibility

Definition at line 2054 of file CharsetConverter.php.

References TYPO3\CMS\Core\Charset\CharsetConverter\strlen().

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\crop(), TYPO3\CMS\Core\Charset\CharsetConverter\utf8_strpos(), and TYPO3\CMS\Core\Charset\CharsetConverter\utf8_substr().
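
A minimal sketch of the character-to-byte translation, assuming valid UTF-8 and covering non-negative positions only (the negative positions that the method accepts are left out). sketchCharToBytePos is a hypothetical name, and the FALSE return for out-of-range positions is a choice made for this sketch.

```php
<?php
// Hypothetical helper for non-negative positions; assumes valid UTF-8.
// Returns the byte offset at which character number $pos starts, the byte
// length if $pos equals the character count, or FALSE if $pos is beyond that.
function sketchCharToBytePos($str, $pos)
{
    $byteLength = strlen($str);
    $chars = 0;
    for ($i = 0; $i < $byteLength; $i++) {
        if ((ord($str[$i]) & 0xC0) !== 0x80) { // a character starts here
            if ($chars === $pos) {
                return $i;
            }
            $chars++;
        }
    }
    return $chars === $pos ? $byteLength : false;
}

// "aøb": the third character starts at byte 3 because ø occupies two bytes.
echo sketchCharToBytePos("a\xC3\xB8b", 2), "\n"; // 3
```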

◆ utf8_char_mapping()

TYPO3\CMS\Core\Charset\CharsetConverter::utf8_char_mapping (   $str,
  $mode,
  $opt = '' 
)

Maps all characters of a UTF-8 string.

Parameters
string $str: UTF-8 string
string $mode: Mode: 'case' (case folding) or 'ascii' (ASCII transliteration)
string $opt: 'case': conversion 'toLower' or 'toUpper'
Returns
string The converted string
Todo:
Define visibility

Definition at line 2130 of file CharsetConverter.php.

References TYPO3\CMS\Core\Charset\CharsetConverter\initUnicodeData(), TYPO3\CMS\Core\Charset\CharsetConverter\strlen(), and TYPO3\CMS\Core\Charset\CharsetConverter\substr().

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\conv_case(), and TYPO3\CMS\Core\Charset\CharsetConverter\specCharsToASCII().

◆ utf8_decode()

TYPO3\CMS\Core\Charset\CharsetConverter::utf8_decode (   $str,
  $charset,
  $useEntityForNoChar = 0 
)

Converts $str from UTF-8 to $charset

Parameters
string $str: String in UTF-8 to convert to local charset
string $charset: Charset, lowercase. Must be found in csconvtbl/ folder.
boolean $useEntityForNoChar: If set, then characters that are not available in the destination character set will be encoded as numeric entities
Returns
string Output string, converted to local charset
Todo:
Define visibility

Definition at line 814 of file CharsetConverter.php.

References TYPO3\CMS\Core\Charset\CharsetConverter\initCharset(), TYPO3\CMS\Core\Charset\CharsetConverter\strlen(), TYPO3\CMS\Core\Charset\CharsetConverter\substr(), and TYPO3\CMS\Core\Charset\CharsetConverter\utf8CharToUnumber().

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\conv(), TYPO3\CMS\Core\Charset\CharsetConverter\initCaseFolding(), and TYPO3\CMS\Core\Charset\CharsetConverter\initToASCII().

◆ utf8_encode()

TYPO3\CMS\Core\Charset\CharsetConverter::utf8_encode (   $str,
  $charset 
)

Converts $str from $charset to UTF-8

Parameters
string $str: String in local charset to convert to UTF-8
string $charset: Charset, lowercase. Must be found in csconvtbl/ folder.
Returns
string Output string, converted to UTF-8
Todo:
Define visibility

Definition at line 754 of file CharsetConverter.php.

References TYPO3\CMS\Core\Charset\CharsetConverter\initCharset(), TYPO3\CMS\Core\Charset\CharsetConverter\strlen(), and TYPO3\CMS\Core\Charset\CharsetConverter\substr().

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\conv().

◆ utf8_strlen()

TYPO3\CMS\Core\Charset\CharsetConverter::utf8_strlen (   $str)

Counts the number of characters of a string in UTF-8. Unit-tested by Kasper; works 100% like strlen() / mb_strlen().

Parameters
string $str: UTF-8 multibyte character string
Returns
integer The number of characters
See also
strlen()
Todo:
Define visibility

Definition at line 1947 of file CharsetConverter.php.

References TYPO3\CMS\Core\Charset\CharsetConverter\strlen().

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\strlen().

◆ utf8_strpos()

TYPO3\CMS\Core\Charset\CharsetConverter::utf8_strpos (   $haystack,
  $needle,
  $offset = 0 
)

Find position of first occurrence of a string, both arguments are in UTF-8.

Parameters
string $haystack: UTF-8 string to search in
string $needle: UTF-8 string to search for
integer $offset: Position to start the search
Returns
integer The character position
See also
strpos()
Todo:
Define visibility

Definition at line 2003 of file CharsetConverter.php.

References $GLOBALS, TYPO3\CMS\Core\Charset\CharsetConverter\utf8_byte2char_pos(), and TYPO3\CMS\Core\Charset\CharsetConverter\utf8_char2byte_pos().

◆ utf8_strrpos()

TYPO3\CMS\Core\Charset\CharsetConverter::utf8_strrpos (   $haystack,
  $needle 
)

Find position of last occurrence of a char in a string, both arguments are in UTF-8.

Parameters
string $haystack: UTF-8 string to search in
string $needle: UTF-8 character to search for (single character)
Returns
integer The character position
See also
strrpos()
Todo:
Define visibility

Definition at line 2031 of file CharsetConverter.php.

References $GLOBALS, and TYPO3\CMS\Core\Charset\CharsetConverter\utf8_byte2char_pos().

◆ utf8_strtrunc()

TYPO3\CMS\Core\Charset\CharsetConverter::utf8_strtrunc (   $str,
  $len 
)

Truncates a string in UTF-8 short at a given byte length.

Parameters
string $str: UTF-8 multibyte character string
integer $len: The byte length
Returns
string The shortened string
See also
mb_strcut()
Todo:
Define visibility

Definition at line 1971 of file CharsetConverter.php.

References TYPO3\CMS\Core\Charset\CharsetConverter\substr().

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\strtrunc().
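
Cutting at a byte length must not split a multibyte character. One way to achieve this, sketched here under the assumption of valid UTF-8 input, is to move the cut point back while the first dropped byte is a continuation byte. sketchUtf8TruncateBytes is a hypothetical name and not the method's actual source.

```php
<?php
// Hypothetical helper; assumes valid UTF-8. Returns at most $len bytes and
// never ends in the middle of a multibyte character.
function sketchUtf8TruncateBytes($str, $len)
{
    if ($len <= 0) {
        return '';
    }
    if ($len >= strlen($str)) {
        return $str;
    }
    $cut = $len;
    // If the byte at the cut point is a continuation byte (10xxxxxx), the
    // character it belongs to would be split, so step back to its lead byte.
    while ($cut > 0 && (ord($str[$cut]) & 0xC0) === 0x80) {
        $cut--;
    }
    return substr($str, 0, $cut);
}

// "aøb" cut at 2 bytes: ø would be split, so only "a" remains.
echo sketchUtf8TruncateBytes("a\xC3\xB8b", 2), "\n"; // a
```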

◆ utf8_substr()

TYPO3\CMS\Core\Charset\CharsetConverter::utf8_substr (   $str,
  $start,
  $len = NULL 
)

Returns a part of a UTF-8 string. Unit-tested by Kasper; works 100% like substr() / mb_substr() for the full range of $start/$len.

Parameters
string $str: UTF-8 string
integer $start: Start position (character position)
integer $len: Length (in characters)
Returns
string The substring
See also
substr()
Todo:
Define visibility

Definition at line 1910 of file CharsetConverter.php.

References TYPO3\CMS\Core\Charset\CharsetConverter\substr(), and TYPO3\CMS\Core\Charset\CharsetConverter\utf8_char2byte_pos().

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\substr().

◆ utf8_to_entities()

TYPO3\CMS\Core\Charset\CharsetConverter::utf8_to_entities (   $str)

Converts all chars > 127 to numeric entities.

Parameters
string $str: Input string
Returns
string Output string
Todo:
Define visibility

Definition at line 881 of file CharsetConverter.php.

References TYPO3\CMS\Core\Charset\CharsetConverter\strlen(), TYPO3\CMS\Core\Charset\CharsetConverter\substr(), and TYPO3\CMS\Core\Charset\CharsetConverter\utf8CharToUnumber().
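
The conversion can be sketched by decoding each multibyte sequence to its code point and writing it as a decimal entity. The function below is a hypothetical illustration that assumes valid UTF-8 and handles sequences of 2 to 4 bytes only; it skips bytes it cannot interpret, which may differ from what the method does.

```php
<?php
// Hypothetical helper; assumes valid UTF-8 with sequences of up to 4 bytes.
function sketchUtf8ToEntities($str)
{
    $out = '';
    $byteLength = strlen($str);
    for ($i = 0; $i < $byteLength; $i++) {
        $byte = ord($str[$i]);
        if ($byte < 0x80) {
            $out .= $str[$i]; // 7-bit ASCII passes through unchanged
            continue;
        }
        // The lead byte tells how many continuation bytes follow.
        if (($byte & 0xE0) === 0xC0) {
            $extra = 1;
            $code = $byte & 0x1F;
        } elseif (($byte & 0xF0) === 0xE0) {
            $extra = 2;
            $code = $byte & 0x0F;
        } elseif (($byte & 0xF8) === 0xF0) {
            $extra = 3;
            $code = $byte & 0x07;
        } else {
            continue; // stray continuation byte or unsupported lead byte
        }
        for ($j = 0; $j < $extra && $i + 1 < $byteLength; $j++) {
            $code = ($code << 6) | (ord($str[++$i]) & 0x3F);
        }
        $out .= '&#' . $code . ';';
    }
    return $out;
}

echo sketchUtf8ToEntities("K\xC3\xB8benhavn"), "\n"; // K&#248;benhavn
```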

◆ utf8_to_numberarray()

TYPO3\CMS\Core\Charset\CharsetConverter::utf8_to_numberarray (   $str,
  $convEntities = 0,
  $retChar = 0 
)

Converts all chars in the input UTF-8 string into integer numbers returned in an array

Parameters
string $str: Input string, UTF-8
boolean $convEntities: If set, then all HTML entities (like &amp; or &pound; or &#123; or &#x3f5d;) will be detected as characters.
boolean $retChar: If set, then instead of integer numbers the real UTF-8 char is returned.
Returns
array Output array with the char numbers
Todo:
Define visibility

Definition at line 969 of file CharsetConverter.php.

References TYPO3\CMS\Core\Charset\CharsetConverter\$noCharByteVal, TYPO3\CMS\Core\Charset\CharsetConverter\entities_to_utf8(), TYPO3\CMS\Core\Charset\CharsetConverter\strlen(), TYPO3\CMS\Core\Charset\CharsetConverter\substr(), and TYPO3\CMS\Core\Charset\CharsetConverter\utf8CharToUnumber().

◆ utf8CharToUnumber()

TYPO3\CMS\Core\Charset\CharsetConverter::utf8CharToUnumber (   $str,
  $hex = 0 
)

Converts a UTF-8 multibyte character to a UNICODE number. Unit-tested by Kasper.

Parameters
string $str: UTF-8 multibyte character string
boolean $hex: If set, then a hexadecimal number is returned.
Returns
integer UNICODE integer
See also
UnumberToChar()
Todo:
Define visibility

Definition at line 1089 of file CharsetConverter.php.

References TYPO3\CMS\Core\Charset\CharsetConverter\substr().

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\utf8_decode(), TYPO3\CMS\Core\Charset\CharsetConverter\utf8_to_entities(), and TYPO3\CMS\Core\Charset\CharsetConverter\utf8_to_numberarray().

Member Data Documentation

◆ $caseFolding

TYPO3\CMS\Core\Charset\CharsetConverter::$caseFolding = array()
Todo:
Define visibility

Definition at line 77 of file CharsetConverter.php.

◆ $charSetArray

TYPO3\CMS\Core\Charset\CharsetConverter::$charSetArray
Todo:
Define visibility

Definition at line 538 of file CharsetConverter.php.

◆ $eucBasedSets

TYPO3\CMS\Core\Charset\CharsetConverter::$eucBasedSets
Initial value:
= array(
'gb2312' => 1,
'big5' => 1,
'euc-kr' => 1,
'shift_jis' => 1
)
Todo:
Define visibility

Definition at line 107 of file CharsetConverter.php.

◆ $fourByteSets

TYPO3\CMS\Core\Charset\CharsetConverter::$fourByteSets
Initial value:
= array(
'ucs-4' => 1,
'utf-32' => 1
)
Todo:
Define visibility

Definition at line 97 of file CharsetConverter.php.

◆ $lang_to_script

TYPO3\CMS\Core\Charset\CharsetConverter::$lang_to_script
Todo:
Define visibility

Definition at line 212 of file CharsetConverter.php.

◆ $locale_to_charset

TYPO3\CMS\Core\Charset\CharsetConverter::$locale_to_charset
Initial value:
= array(
'japanese.euc' => 'euc-jp',
'ja_jp.ujis' => 'euc-jp',
'korean.euc' => 'euc-kr',
'sr@Latn' => 'iso-8859-2',
'zh_cn' => 'gb2312',
'zh_hk' => 'big5',
'zh_tw' => 'big5'
)
Todo:
Define visibility

Definition at line 523 of file CharsetConverter.php.

◆ $locales

TYPO3\CMS\Core\Charset\CharsetConverter::$locales
protected

Definition at line 59 of file CharsetConverter.php.

◆ $noCharByteVal

TYPO3\CMS\Core\Charset\CharsetConverter::$noCharByteVal = 63
Todo:
Define visibility

Definition at line 65 of file CharsetConverter.php.

Referenced by TYPO3\CMS\Core\Charset\CharsetConverter\utf8_to_numberarray().

◆ $parsedCharsets

TYPO3\CMS\Core\Charset\CharsetConverter::$parsedCharsets = array()
Todo:
Define visibility

Definition at line 71 of file CharsetConverter.php.

◆ $script_to_charset_unix

TYPO3\CMS\Core\Charset\CharsetConverter::$script_to_charset_unix
Initial value:
= array(
'west_european' => 'iso-8859-1',
'estonian' => 'iso-8859-1',
'east_european' => 'iso-8859-2',
'baltic' => 'iso-8859-4',
'cyrillic' => 'iso-8859-5',
'arabic' => 'iso-8859-6',
'greek' => 'iso-8859-7',
'hebrew' => 'iso-8859-8',
'turkish' => 'iso-8859-9',
'thai' => 'iso-8859-11',
'lithuanian' => 'iso-8859-13',
'chinese' => 'gb2312',
'japanese' => 'euc-jp',
'korean' => 'euc-kr',
'simpl_chinese' => 'gb2312',
'trad_chinese' => 'big5',
'vietnamese' => '',
'unicode' => 'utf-8',
'albanian' => 'utf-8'
)
Todo:
Define visibility

Definition at line 469 of file CharsetConverter.php.

◆ $script_to_charset_windows

TYPO3\CMS\Core\Charset\CharsetConverter::$script_to_charset_windows
Initial value:
= array(
'east_european' => 'windows-1250',
'cyrillic' => 'windows-1251',
'west_european' => 'windows-1252',
'greek' => 'windows-1253',
'turkish' => 'windows-1254',
'hebrew' => 'windows-1255',
'arabic' => 'windows-1256',
'baltic' => 'windows-1257',
'estonian' => 'windows-1257',
'lithuanian' => 'windows-1257',
'vietnamese' => 'windows-1258',
'thai' => 'cp874',
'korean' => 'cp949',
'chinese' => 'gb2312',
'japanese' => 'shift_jis',
'simpl_chinese' => 'gb2312',
'trad_chinese' => 'big5',
'albanian' => 'windows-1250',
'unicode' => 'utf-8'
)
Todo:
Define visibility

Definition at line 497 of file CharsetConverter.php.

◆ $synonyms

TYPO3\CMS\Core\Charset\CharsetConverter::$synonyms
Todo:
Define visibility

Definition at line 122 of file CharsetConverter.php.

◆ $toASCII

TYPO3\CMS\Core\Charset\CharsetConverter::$toASCII = array()
Todo:
Define visibility

Definition at line 83 of file CharsetConverter.php.

◆ $twoByteSets

TYPO3\CMS\Core\Charset\CharsetConverter::$twoByteSets
Initial value:
= array(
'ucs-2' => 1
)
Todo:
Define visibility

Definition at line 89 of file CharsetConverter.php.