TYPO3 CMS 10.4
TYPO3\CMS\IndexedSearch\Lexer Class Reference

Public Member Functions

array split2Words ($wordString)
 
 addWords (&$words, &$wordString, $start, $len)
 
array get_word (&$str, $pos=0)
 
bool utf8_is_letter (&$str, &$len, $pos=0)
 
array charType ($cp)
 
int utf8_ord (&$str, &$len, $pos=0, $hex=false)
 

Public Attributes

bool $debug = false
 
string $debugString = ''
 
array $lexerConf
 

Detailed Description

Lexer class for indexed_search. A lexer splits text into words.

Definition at line 26 of file Lexer.php.

Member Function Documentation

◆ addWords()

TYPO3\CMS\IndexedSearch\Lexer::addWords (&$words, &$wordString, $start, $len)

Adds a word to the word array. Use this function to make sure CJK sequences are split up correctly.

Parameters
array $words: Array of accumulated words
string $wordString: Complete input string from which to extract the word
int $start: Start position of the word in the input string
int $len: Length of the word string from the start position

Definition at line 105 of file Lexer.php.

References TYPO3\CMS\IndexedSearch\Lexer\charType(), and TYPO3\CMS\IndexedSearch\Lexer\utf8_ord().

Referenced by TYPO3\CMS\IndexedSearch\Lexer\split2Words().

◆ charType()

array TYPO3\CMS\IndexedSearch\Lexer::charType ($cp)

Determine the type of character

Parameters
int $cp: Unicode number to evaluate
Returns
array: Type of character; index 0 holds the main type: num, alpha, or CJK (Chinese / Japanese / Korean)

Definition at line 252 of file Lexer.php.

Referenced by TYPO3\CMS\IndexedSearch\Lexer\addWords(), and TYPO3\CMS\IndexedSearch\Lexer\utf8_is_letter().
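The classification described above can be illustrated with a minimal sketch. This is not the code from Lexer.php: the function name classifyCodePoint, the lowercase type strings, the empty-array fallback, and the selection of Unicode ranges (ASCII digits and letters, Hiragana, Katakana, CJK Unified Ideographs, Hangul Syllables) are all assumptions chosen for illustration.

```php
<?php
// Hypothetical helper, not part of TYPO3: maps a Unicode code point
// to a main type in index 0, loosely mirroring the documented return shape.
function classifyCodePoint(int $cp): array
{
    // ASCII digits 0-9.
    if ($cp >= 0x30 && $cp <= 0x39) {
        return ['num'];
    }
    // ASCII letters A-Z and a-z (assumption: only ASCII is treated as alpha here).
    if (($cp >= 0x41 && $cp <= 0x5A) || ($cp >= 0x61 && $cp <= 0x7A)) {
        return ['alpha'];
    }
    // Hiragana and Katakana (U+3040..U+30FF), CJK Unified Ideographs
    // (U+4E00..U+9FFF), Hangul Syllables (U+AC00..U+D7AF).
    if (($cp >= 0x3040 && $cp <= 0x30FF)
        || ($cp >= 0x4E00 && $cp <= 0x9FFF)
        || ($cp >= 0xAC00 && $cp <= 0xD7AF)) {
        return ['cjk'];
    }
    // Anything else is left unclassified in this sketch.
    return [];
}
```

A caller would inspect index 0 of the result, for example to decide whether a run of characters needs CJK-specific splitting.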

◆ get_word()

array TYPO3\CMS\IndexedSearch\Lexer::get_word (&$str, $pos = 0)

Gets the first word in a given UTF-8 string (initial non-letters are skipped).

Parameters
string $str: Input string (reference)
int $pos: Starting position in the input string
Returns
array|false: Array with index 0 = start position and index 1 = length, or FALSE if no word was found

Definition at line 153 of file Lexer.php.

References TYPO3\CMS\IndexedSearch\Lexer\utf8_is_letter().

Referenced by TYPO3\CMS\IndexedSearch\Lexer\split2Words().
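The [start, length] return shape can be consumed as in the sketch below. The locator findFirstWord is a hypothetical stand-in built on preg_match, not the TYPO3 implementation; it assumes that offsets are byte offsets and that digits count as word characters.

```php
<?php
// Hypothetical stand-in for a first-word locator; not the TYPO3 implementation.
function findFirstWord(string $str, int $pos = 0)
{
    // The "u" modifier enables UTF-8 mode; \p{L} matches letters, \p{N} digits.
    if (preg_match('/[\p{L}\p{N}]+/u', $str, $m, PREG_OFFSET_CAPTURE, $pos) !== 1) {
        return false;
    }
    // With PREG_OFFSET_CAPTURE, $m[0] is [matched text, byte offset].
    return [$m[0][1], strlen($m[0][0])];
}

$text = "  h\u{e9}llo w\u{f6}rld";
$word = null;
$found = findFirstWord($text);
if ($found !== false) {
    // Both values are byte counts, so substr() (not mb_substr()) extracts the word.
    [$start, $len] = $found;
    $word = substr($text, $start, $len);
}
```

Checking the result against FALSE with a strict comparison matters here, because a word found at the very start of the string has start position 0.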

◆ split2Words()

array TYPO3\CMS\IndexedSearch\Lexer::split2Words ($wordString)

Splits a string into words. Used for indexing; can also be used to find words in a query.

Parameters
string $wordString: String with UTF-8 content to process
Returns
array: Array of words in UTF-8

Definition at line 60 of file Lexer.php.

References TYPO3\CMS\IndexedSearch\Lexer\addWords(), debug(), and TYPO3\CMS\IndexedSearch\Lexer\get_word().
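The references above suggest a loop that repeatedly locates the next word and appends it to the result. The following is a simplified sketch of that pattern, not the code from Lexer.php: splitWordsSketch is a hypothetical name, it uses a regular expression in place of the class's own letter detection, and it applies no CJK handling or configuration.

```php
<?php
// Hypothetical sketch of a locate-and-append loop; not the TYPO3 implementation.
function splitWordsSketch(string $text): array
{
    $words = [];
    $pos = 0; // byte offset at which the next search starts
    while (preg_match('/[\p{L}\p{N}]+/u', $text, $m, PREG_OFFSET_CAPTURE, $pos) === 1) {
        // $m[0] holds the matched word and its byte offset in $text.
        [$word, $start] = $m[0];
        $words[] = $word;
        // Continue searching directly after the word just found.
        $pos = $start + strlen($word);
    }
    return $words;
}
```

Advancing by the byte length of the match keeps the offset on a valid UTF-8 boundary, which preg_match requires in UTF-8 mode.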

◆ utf8_is_letter()

bool TYPO3\CMS\IndexedSearch\Lexer::utf8_is_letter (&$str, &$len, $pos = 0)

See if a character is a letter (or a string of letters or non-letters).

Parameters
string $str: Input string (reference)
int $len: Byte length of the character sequence (reference, return value)
int $pos: Starting position in the input string
Returns
bool: TRUE if a letter (or word) was found

Definition at line 178 of file Lexer.php.

References TYPO3\CMS\IndexedSearch\Lexer\charType(), and TYPO3\CMS\IndexedSearch\Lexer\utf8_ord().

Referenced by TYPO3\CMS\IndexedSearch\Lexer\get_word().

◆ utf8_ord()

int TYPO3\CMS\IndexedSearch\Lexer::utf8_ord (&$str, &$len, $pos = 0, $hex = false)

Converts a UTF-8 multibyte character to a Unicode code point.

Parameters
string $str: UTF-8 multibyte character string (reference)
int $len: Length of the character (reference, return value)
int $pos: Starting position in the input string
bool $hex: If set, a hexadecimal number is returned
Returns
int: Unicode code point

Definition at line 281 of file Lexer.php.

Referenced by TYPO3\CMS\IndexedSearch\Lexer\addWords(), and TYPO3\CMS\IndexedSearch\Lexer\utf8_is_letter().
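Decoding one UTF-8 character into its code point, with the byte length reported through a reference parameter, can be sketched as follows. This is an illustration of the standard UTF-8 bit layout, not the code from Lexer.php: decodeUtf8At is a hypothetical name, the $hex option is omitted, and the input is assumed to be well-formed and not truncated.

```php
<?php
// Hypothetical decoder; assumes well-formed UTF-8 and a valid byte position.
function decodeUtf8At(string $str, &$len, int $pos = 0): int
{
    $lead = ord($str[$pos]);
    if ($lead < 0x80) {
        // Single-byte (ASCII) character.
        $len = 1;
        return $lead;
    }
    // The lead byte encodes the sequence length and the top payload bits.
    if (($lead & 0xE0) === 0xC0) {
        $len = 2;
        $cp = $lead & 0x1F;
    } elseif (($lead & 0xF0) === 0xE0) {
        $len = 3;
        $cp = $lead & 0x0F;
    } elseif (($lead & 0xF8) === 0xF0) {
        $len = 4;
        $cp = $lead & 0x07;
    } else {
        // Stray continuation byte or invalid lead: report U+FFFD.
        $len = 1;
        return 0xFFFD;
    }
    // Each continuation byte contributes its low six bits.
    for ($i = 1; $i < $len; $i++) {
        $cp = ($cp << 6) | (ord($str[$pos + $i]) & 0x3F);
    }
    return $cp;
}
```

Returning the length by reference lets a caller step through a string character by character by adding $len to the position after each call.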

Member Data Documentation

◆ $debug

bool TYPO3\CMS\IndexedSearch\Lexer::$debug = false

Debugging options:

Definition at line 33 of file Lexer.php.

◆ $debugString

string TYPO3\CMS\IndexedSearch\Lexer::$debugString = ''

If set, the debugString is filled with HTML output highlighting search / non-search words (for backend display)

Definition at line 39 of file Lexer.php.

◆ $lexerConf

array TYPO3\CMS\IndexedSearch\Lexer::$lexerConf
Initial value:
= array(
'printjoins' => [46, 45, 95, 58, 47, 39],
'casesensitive' => false,
'removeChars' => [45]
)

Configuration of the lexer:

Definition at line 45 of file Lexer.php.
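The integers in the initial value are easier to read as characters. The sketch below assumes they are decimal code points in the ASCII range, which is an interpretation and not stated on this page; the variable names are illustrative.

```php
<?php
// Values copied from the documented initial value of $lexerConf.
$printjoins = [46, 45, 95, 58, 47, 39];
$removeChars = [45];

// chr() maps a byte value to its character; valid here because all values are ASCII.
$printjoinChars = array_map('chr', $printjoins);   // '.', '-', '_', ':', '/', "'"
$removeCharsAsText = array_map('chr', $removeChars); // '-'
```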