CharsetConverter implements SingletonInterface

Class for conversion between charsets

Table of Contents

Interfaces

SingletonInterface
"empty" interface for singletons (marker interface pattern)

Properties

$eucBasedSets  : array<string|int, mixed>
This tells the converter which charsets use a scheme like the Extended Unix Code.
$noCharByteVal  : int
ASCII value used for chars with no equivalent.
$parsedCharsets  : array<string|int, mixed>
This is the array where parsed conversion tables are stored (cached)
$toASCII  : array<string|int, mixed>
An array where charset-to-ASCII mappings are stored (cached)
$twoByteSets  : array<string|int, mixed>
This tells the converter which charsets have two bytes per char.

Methods

conv()  : string
Convert from one charset to another charset.
euc_char_mapping()  : string
Maps all characters of a string in the EUC charset family.
sb_char_mapping()  : string
Maps all characters of a string in a single byte charset.
specCharsToASCII()  : string
Converts special chars (like æøåÆØÅ, umlauts etc.) to ASCII equivalents (usually two characters, like æ => ae).
UnumberToChar()  : string
Converts a UNICODE number to a UTF-8 multibyte character. Algorithm based on a script found at http://czyborra.com/utf/. Unit-tested by Kasper.
utf8_char_mapping()  : string
Maps all characters of a UTF-8 string.
utf8_decode()  : string
Converts $str from UTF-8 to $charset
utf8_encode()  : string
Converts $str from $charset to UTF-8
utf8_to_numberarray()  : array<string|int, mixed>
Converts all chars in the input UTF-8 string into integer numbers returned in an array.
utf8CharToUnumber()  : int
Converts a UTF-8 multibyte character to a UNICODE number. Unit-tested by Kasper.
initCharset()  : bool
Initializes a charset for use if it is defined in the 'typo3/sysext/core/Resources/Private/Charsets/csconvtbl/' folder. This function is called automatically by the conversion functions.
initToASCII()  : bool
This function initializes the to-ASCII conversion table for a charset other than UTF-8.
initUnicodeData()  : bool
This function initializes all UTF-8 character data tables.

Properties

$eucBasedSets

This tells the converter which charsets use a scheme like the Extended Unix Code:

protected array<string|int, mixed> $eucBasedSets = [
    'gb2312' => 1, // Chinese, simplified.
    'big5' => 1, // Chinese, traditional.
    'euc-kr' => 1, // Korean
    'shift_jis' => 1,
]

$noCharByteVal

ASCII value used for chars with no equivalent (63 is the question mark, "?").

protected int $noCharByteVal = 63

$parsedCharsets

This is the array where parsed conversion tables are stored (cached)

protected array<string|int, mixed> $parsedCharsets = []

$toASCII

An array where charset-to-ASCII mappings are stored (cached)

protected array<string|int, mixed> $toASCII = []

$twoByteSets

This tells the converter which charsets have two bytes per char:

protected array<string|int, mixed> $twoByteSets = ['ucs-2' => 1]

Methods

conv()

Convert from one charset to another charset.

public conv(string $inputString, string $fromCharset, string $toCharset) : string
Parameters
$inputString : string

Input string

$fromCharset : string

From charset (the current charset of the string)

$toCharset : string

To charset (the output charset wanted)

Return values
string

Converted string

euc_char_mapping()

Maps all characters of a string in the EUC charset family.

public euc_char_mapping(string $str, string $charset) : string
Parameters
$str : string

EUC multibyte character string

$charset : string

The charset

Return values
string

The converted string

sb_char_mapping()

Maps all characters of a string in a single byte charset.

public sb_char_mapping(string $str, string $charset) : string
Parameters
$str : string

The string

$charset : string

The charset

Return values
string

The converted string

specCharsToASCII()

Converts special chars (like æøåÆØÅ, umlauts etc.) to ASCII equivalents (usually two characters, like æ => ae).

public specCharsToASCII(string $charset, string $string) : string
Parameters
$charset : string

Character set of string

$string : string

Input string to convert

Return values
string

The converted string
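As an illustration of the kind of mapping described here, the following is a minimal standalone sketch. The function name transliterateToAscii() and its small hand-written table are hypothetical and are not the tables TYPO3 loads; the sketch also assumes UTF-8 input instead of taking a $charset argument.

```php
<?php
// Hypothetical helper, not part of CharsetConverter.
// Replaces a handful of special characters with ASCII equivalents.
function transliterateToAscii(string $utf8): string
{
    // Illustrative table only; the real mappings may differ.
    static $map = [
        "\u{00E6}" => 'ae', // æ
        "\u{00F8}" => 'oe', // ø
        "\u{00E5}" => 'aa', // å
        "\u{00C6}" => 'AE', // Æ
        "\u{00D8}" => 'OE', // Ø
        "\u{00C5}" => 'AA', // Å
        "\u{00E4}" => 'ae', // ä
        "\u{00F6}" => 'oe', // ö
        "\u{00FC}" => 'ue', // ü
        "\u{00DC}" => 'Ue', // Ü
        "\u{00DF}" => 'ss', // ß
    ];

    // strtr() with an array replaces each key by its value in one pass,
    // so replaced output is never re-scanned.
    return strtr($utf8, $map);
}
```

Characters that are not in the table pass through unchanged in this sketch.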

UnumberToChar()

Converts a UNICODE number to a UTF-8 multibyte character. Algorithm based on a script found at http://czyborra.com/utf/. Unit-tested by Kasper.

public UnumberToChar(int $unicodeInteger) : string

The binary representation of the character's integer value is spread across the bytes, and the number of high bits set in the lead byte announces the number of bytes in the multibyte sequence:

bytes | bits | representation
    1 |    7 | 0vvvvvvv
    2 |   11 | 110vvvvv 10vvvvvv
    3 |   16 | 1110vvvv 10vvvvvv 10vvvvvv
    4 |   21 | 11110vvv 10vvvvvv 10vvvvvv 10vvvvvv
    5 |   26 | 111110vv 10vvvvvv 10vvvvvv 10vvvvvv 10vvvvvv
    6 |   31 | 1111110v 10vvvvvv 10vvvvvv 10vvvvvv 10vvvvvv 10vvvvvv
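The table above can be turned into code fairly directly. The following is a minimal sketch, not the TYPO3 implementation: the function name codePointToUtf8() is hypothetical, and it is assumed that only the 1 to 4 byte forms are needed, since current Unicode stops at U+10FFFF.

```php
<?php
// Hypothetical helper, not part of CharsetConverter.
// Encodes a Unicode code point as a UTF-8 byte string.
function codePointToUtf8(int $codePoint): string
{
    if ($codePoint < 0 || $codePoint > 0x10FFFF) {
        throw new InvalidArgumentException('Code point out of range');
    }
    if ($codePoint < 0x80) {
        // 1 byte: 0vvvvvvv
        return chr($codePoint);
    }
    if ($codePoint < 0x800) {
        // 2 bytes: 110vvvvv 10vvvvvv
        return chr(0xC0 | ($codePoint >> 6))
            . chr(0x80 | ($codePoint & 0x3F));
    }
    if ($codePoint < 0x10000) {
        // 3 bytes: 1110vvvv 10vvvvvv 10vvvvvv
        return chr(0xE0 | ($codePoint >> 12))
            . chr(0x80 | (($codePoint >> 6) & 0x3F))
            . chr(0x80 | ($codePoint & 0x3F));
    }
    // 4 bytes: 11110vvv 10vvvvvv 10vvvvvv 10vvvvvv
    return chr(0xF0 | ($codePoint >> 18))
        . chr(0x80 | (($codePoint >> 12) & 0x3F))
        . chr(0x80 | (($codePoint >> 6) & 0x3F))
        . chr(0x80 | ($codePoint & 0x3F));
}
```

For example, codePointToUtf8(0x20AC) yields the three bytes E2 82 AC, the UTF-8 form of the euro sign.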

Parameters
$unicodeInteger : int

UNICODE integer

Tags
see
utf8CharToUnumber()
Return values
string

UTF-8 multibyte character string

utf8_char_mapping()

Maps all characters of a UTF-8 string.

public utf8_char_mapping(string $str) : string
Parameters
$str : string

UTF-8 string

Return values
string

The converted string

utf8_decode()

Converts $str from UTF-8 to $charset

public utf8_decode(string $str, string $charset[, bool $useEntityForNoChar = false ]) : string
Parameters
$str : string

String in UTF-8 to convert to local charset

$charset : string

Charset, lowercase. Must be found in csconvtbl/ folder.

$useEntityForNoChar : bool = false

If set, then characters that are not available in the destination character set will be encoded as numeric entities

Return values
string

Output string, converted to local charset
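The effect of $useEntityForNoChar can be sketched with a tiny standalone example. Everything here is an assumption made for illustration: the function names, the two-entry table, and the decimal entity format are hypothetical, and the real method reads its tables from the csconvtbl/ folder.

```php
<?php
// Hypothetical helper: code point of a single, valid UTF-8 character.
function codePointOf(string $char): int
{
    $bytes = array_values(unpack('C*', $char));
    $length = count($bytes);
    if ($length === 1) {
        return $bytes[0];
    }
    // Strip the length marker bits from the lead byte.
    $codePoint = $bytes[0] & (0xFF >> ($length + 1));
    for ($i = 1; $i < $length; $i++) {
        $codePoint = ($codePoint << 6) | ($bytes[$i] & 0x3F);
    }
    return $codePoint;
}

// Hypothetical helper: convert UTF-8 to a single-byte charset using
// $table (UTF-8 character => byte value in the target charset).
function utf8ToSingleByte(
    string $utf8,
    array $table,
    bool $useEntityForNoChar = false,
    int $noCharByteVal = 63
): string {
    // Split into characters; PCRE's /u flag rejects invalid UTF-8.
    $chars = preg_split('//u', $utf8, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    $out = '';
    foreach ($chars as $char) {
        if (strlen($char) === 1) {
            $out .= $char; // plain ASCII passes through
        } elseif (isset($table[$char])) {
            $out .= chr($table[$char]);
        } elseif ($useEntityForNoChar) {
            $out .= '&#' . codePointOf($char) . ';'; // numeric entity
        } else {
            $out .= chr($noCharByteVal); // "?" by default
        }
    }
    return $out;
}
```

With the default arguments an unmapped character becomes "?", which matches the value 63 of $noCharByteVal; with the flag set it survives as a numeric entity that a browser can still render.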

utf8_encode()

Converts $str from $charset to UTF-8

public utf8_encode(string $str, string $charset) : string
Parameters
$str : string

String in local charset to convert to UTF-8

$charset : string

Charset, lowercase. Must be found in csconvtbl/ folder.

Return values
string

Output string, converted to UTF-8

utf8_to_numberarray()

Converts all chars in the input UTF-8 string into integer numbers returned in an array.

public utf8_to_numberarray(string $str) : array<string|int, mixed>

All HTML entities (like &amp;amp; or &amp;pound; or &amp;#123; or &amp;#x3f5d;) will be detected as characters. Also, instead of integer numbers the real UTF-8 char is returned.

Parameters
$str : string

Input string, UTF-8

Return values
array<string|int, mixed>

Output array with the char numbers
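The two steps described above, entity decoding followed by splitting into characters, can be sketched as follows. The function name utf8ToCharArray() is hypothetical, and the use of html_entity_decode() and preg_split() is an assumption about how such a split might be done, not a description of the actual method body.

```php
<?php
// Hypothetical helper, not part of CharsetConverter.
// Decodes HTML entities, then returns one array item per UTF-8 character.
function utf8ToCharArray(string $str): array
{
    // Named and numeric entities become real UTF-8 characters.
    $decoded = html_entity_decode($str, ENT_COMPAT, 'UTF-8');

    // An empty pattern with the /u flag splits between characters,
    // not between bytes.
    $chars = preg_split('//u', $decoded, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $chars;
}
```

Each entity in the input ends up as a single array item, the same as a literal character would.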

utf8CharToUnumber()

Converts a UTF-8 multibyte character to a UNICODE number. Unit-tested by Kasper.

public utf8CharToUnumber(string $str[, bool $hex = false ]) : int
Parameters
$str : string

UTF-8 multibyte character string

$hex : bool = false

If set, a hexadecimal number is returned.

Tags
see
UnumberToChar()
Return values
int

UNICODE integer
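The decoding direction can be sketched in a few lines. This is a standalone illustration under stated assumptions, not the TYPO3 code: the function name utf8ToCodePoint() is hypothetical, and only sequences of up to 4 bytes are handled.

```php
<?php
// Hypothetical helper, not part of CharsetConverter.
// Returns the Unicode code point of the first UTF-8 character in $char.
function utf8ToCodePoint(string $char): int
{
    if ($char === '') {
        throw new InvalidArgumentException('Empty input');
    }
    $lead = ord($char[0]);
    if ($lead < 0x80) {
        return $lead; // single-byte (ASCII) character
    }
    // The number of high bits set in the lead byte gives the length.
    if (($lead & 0xE0) === 0xC0) {
        $length = 2;
        $codePoint = $lead & 0x1F;
    } elseif (($lead & 0xF0) === 0xE0) {
        $length = 3;
        $codePoint = $lead & 0x0F;
    } elseif (($lead & 0xF8) === 0xF0) {
        $length = 4;
        $codePoint = $lead & 0x07;
    } else {
        throw new InvalidArgumentException('Invalid lead byte');
    }
    if (strlen($char) < $length) {
        throw new InvalidArgumentException('Truncated sequence');
    }
    for ($i = 1; $i < $length; $i++) {
        $byte = ord($char[$i]);
        if (($byte & 0xC0) !== 0x80) {
            throw new InvalidArgumentException('Invalid continuation byte');
        }
        // Each continuation byte contributes six payload bits.
        $codePoint = ($codePoint << 6) | ($byte & 0x3F);
    }
    return $codePoint;
}
```

A hexadecimal result, comparable to what the $hex flag asks for, can be obtained in this sketch with dechex(utf8ToCodePoint($char)).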

initCharset()

Initializes a charset for use if it is defined in the 'typo3/sysext/core/Resources/Private/Charsets/csconvtbl/' folder. This function is called automatically by the conversion functions.

protected initCharset(string $charset) : bool

PLEASE SEE: http://www.unicode.org/Public/MAPPINGS/

Parameters
$charset : string

The charset to be initialized. Always use lowercase; the charset must exactly match a filename in the csconvtbl/ folder ([charset].tbl).

Tags
throws
UnknownCharsetException

if no charset table was found

Return values
bool

TRUE if the charset conversion table was found and parsed.
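To give an idea of what parsing such a table involves, here is a minimal sketch. It assumes the line layout used by the mapping files published at unicode.org (a local byte value, a Unicode value, both as 0x-prefixed hex, then an optional "#" comment); whether the .tbl files follow exactly this layout is an assumption, and the function name parseMappingTable() is hypothetical.

```php
<?php
// Hypothetical helper, not part of CharsetConverter.
// Parses mapping-table text into an array: local value => code point.
function parseMappingTable(string $contents): array
{
    $localToUnicode = [];
    foreach (preg_split('/\R/', $contents) as $line) {
        $line = trim($line);
        // Skip blank lines and full-line comments.
        if ($line === '' || $line[0] === '#') {
            continue;
        }
        // Expect "0xNN <whitespace> 0xNNNN"; trailing comments are ignored.
        // Lines without a second hex value (undefined mappings) are skipped.
        if (preg_match('/^0x([0-9A-Fa-f]+)\s+0x([0-9A-Fa-f]+)/', $line, $m)) {
            $localToUnicode[hexdec($m[1])] = hexdec($m[2]);
        }
    }
    return $localToUnicode;
}
```

A result like this could then be cached per charset, in the spirit of the $parsedCharsets property, so that each table is read only once per request.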

initToASCII()

This function initializes the to-ASCII conversion table for a charset other than UTF-8.

protected initToASCII(string $charset) : bool

This function is automatically called by the ASCII transliteration functions.

Parameters
$charset : string

Charset for which to initialize conversion.

Return values
bool

Returns FALSE on error, TRUE on success

initUnicodeData()

This function initializes all UTF-8 character data tables.

protected initUnicodeData() : bool

PLEASE SEE: http://www.unicode.org/Public/UNIDATA/

Return values
bool

Returns FALSE on error, TRUE on success


        