TYPO3 CMS  TYPO3_7-6
TYPO3\CMS\IndexedSearch\Lexer Class Reference

Public Member Functions

 __construct ()
 
 split2Words ($wordString)
 
 addWords (&$words, &$wordString, $start, $len)
 
 get_word (&$str, $pos=0)
 
 utf8_is_letter (&$str, &$len, $pos=0)
 
 charType ($cp)
 
 utf8_ord (&$str, &$len, $pos=0, $hex=false)
 

Public Attributes

 $debug = false
 
 $debugString = ''
 
 $csObj
 
 $lexerConf
 

Detailed Description

Lexer class for indexed_search. A lexer splits text into words.

Definition at line 21 of file Lexer.php.

Constructor & Destructor Documentation

◆ __construct()

TYPO3\CMS\IndexedSearch\Lexer::__construct ( )

Constructor: Initializes the charset class

Definition at line 61 of file Lexer.php.

References TYPO3\CMS\Core\Utility\GeneralUtility\makeInstance().

Member Function Documentation

◆ addWords()

TYPO3\CMS\IndexedSearch\Lexer::addWords ( &$words, &$wordString, $start, $len )

Add a word to the word array. This function should be used to make sure CJK sequences are split up in the right way.

Parameters
array $words  Array of accumulated words
string $wordString  Complete input string from which to extract the word
int $start  Start position of the word in the input string
int $len  Length of the word string from the start position
Returns
void

Definition at line 117 of file Lexer.php.

References $a, TYPO3\CMS\IndexedSearch\Lexer\charType(), and TYPO3\CMS\IndexedSearch\Lexer\utf8_ord().

Referenced by TYPO3\CMS\IndexedSearch\Lexer\split2Words().
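The description above does not say how a CJK sequence is split. One widely used approach for text without word separators is to index overlapping character pairs, so that ABCDE becomes AB, BC, CD, DE. The following is a minimal standalone sketch of that technique, assuming PHP 7 or later. The function name cjkPairs is hypothetical and not part of Lexer; whether addWords() does exactly this should be checked against Lexer.php.

```php
<?php
// Hypothetical helper, not part of Lexer: splits a run of characters
// into overlapping two-character pairs.
function cjkPairs(string $sequence): array
{
    // Split into UTF-8 characters without relying on the mbstring extension.
    $chars = preg_split('//u', $sequence, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false || $chars === []) {
        return [];
    }
    if (count($chars) === 1) {
        return $chars; // a lone character is kept as its own entry
    }
    $pairs = [];
    for ($i = 0, $last = count($chars) - 1; $i < $last; $i++) {
        $pairs[] = $chars[$i] . $chars[$i + 1];
    }
    return $pairs;
}
```

Because the index and the search query would be split the same way, a query for a multi-character term matches documents containing the same adjacent pairs.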

◆ charType()

TYPO3\CMS\IndexedSearch\Lexer::charType ( $cp )

Determine the type of character

Parameters
int $cp  Unicode number to evaluate
Returns
array Type of character; index 0 holds the main type: num, alpha, or CJK (Chinese / Japanese / Korean)

Definition at line 264 of file Lexer.php.

Referenced by TYPO3\CMS\IndexedSearch\Lexer\addWords(), and TYPO3\CMS\IndexedSearch\Lexer\utf8_is_letter().
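To illustrate the kind of range test a codepoint classifier performs, here is a minimal standalone sketch, assuming PHP 7 or later. The function name classifyCodepoint, the lowercase type strings, the empty array for unclassified codepoints, and the small set of ranges are all assumptions made for illustration. The ranges actually used by charType() are defined in Lexer.php and are likely broader.

```php
<?php
// Hypothetical stand-in for charType(); the ranges are a deliberately small subset.
function classifyCodepoint(int $cp): array
{
    if ($cp >= 0x30 && $cp <= 0x39) {
        return ['num'];   // ASCII digits 0-9
    }
    if (($cp >= 0x41 && $cp <= 0x5A) || ($cp >= 0x61 && $cp <= 0x7A)) {
        return ['alpha']; // ASCII letters A-Z and a-z
    }
    if (($cp >= 0x3040 && $cp <= 0x30FF)       // Hiragana and Katakana
        || ($cp >= 0x4E00 && $cp <= 0x9FFF)    // CJK Unified Ideographs
        || ($cp >= 0xAC00 && $cp <= 0xD7A3)) { // Hangul syllables
        return ['cjk'];
    }
    return [];            // assumption: empty array for anything else
}
```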

◆ get_word()

TYPO3\CMS\IndexedSearch\Lexer::get_word ( &$str, $pos = 0 )

Get the first word in a given UTF-8 string (initial non-letters are skipped).

Parameters
string $str  Input string (reference)
int $pos  Starting position in input string
Returns
array 0: start, 1: len, or FALSE if no word has been found

Definition at line 164 of file Lexer.php.

References TYPO3\CMS\IndexedSearch\Lexer\utf8_is_letter().

Referenced by TYPO3\CMS\IndexedSearch\Lexer\split2Words().
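The documented return shape (start and length, or FALSE) can be illustrated with a short standalone sketch, assuming PHP 7 or later. The function name firstWordSpan is hypothetical, and a PCRE pattern stands in for the character-by-character scan; treating a word as a run of Unicode letters and digits is also an assumption. Offsets and lengths are in bytes.

```php
<?php
// Hypothetical helper mirroring the documented return shape of get_word().
function firstWordSpan(string $str, int $pos = 0)
{
    // \p{L} matches any letter, \p{N} any number; offsets are reported in bytes.
    if (preg_match('/[\p{L}\p{N}]+/u', $str, $m, PREG_OFFSET_CAPTURE, $pos) !== 1) {
        return false; // no word at or after $pos
    }
    return [$m[0][1], strlen($m[0][0])]; // [start, len]
}
```

A caller can pass the result to substr() to extract the word, for example substr($str, $span[0], $span[1]).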

◆ split2Words()

TYPO3\CMS\IndexedSearch\Lexer::split2Words ( $wordString )

Splitting string into words. Used for indexing, can also be used to find words in query.

Parameters
string $wordString  String with UTF-8 content to process.
Returns
array Array of words in UTF-8

Definition at line 73 of file Lexer.php.

References TYPO3\CMS\IndexedSearch\Lexer\addWords(), debug(), and TYPO3\CMS\IndexedSearch\Lexer\get_word().
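The reference list above suggests a loop that repeatedly locates the next word and appends it to the result. The following is a minimal standalone sketch of such a loop, assuming PHP 7.1 or later. The function name splitIntoWords is hypothetical, and the sketch ignores $lexerConf, CJK handling, and debug output.

```php
<?php
// Hypothetical sketch of a split loop; not the code from Lexer.php.
function splitIntoWords(string $text): array
{
    $words = [];
    $pos = 0;
    // Each iteration finds the next run of letters or digits at or after $pos.
    while (preg_match('/[\p{L}\p{N}]+/u', $text, $m, PREG_OFFSET_CAPTURE, $pos) === 1) {
        [$word, $start] = $m[0];
        $words[] = $word;
        $pos = $start + strlen($word); // continue after the word just consumed
    }
    return $words;
}
```

In this simplified form a term such as "e-mail" is split at the hyphen into two words.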

◆ utf8_is_letter()

TYPO3\CMS\IndexedSearch\Lexer::utf8_is_letter ( &$str, &$len, $pos = 0 )

See if a character is a letter (or a string of letters or non-letters).

Parameters
string $str  Input string (reference)
int $len  Byte-length of character sequence (reference, return value)
int $pos  Starting position in input string
Returns
bool TRUE if a letter (or word) was found

Definition at line 189 of file Lexer.php.

References TYPO3\CMS\IndexedSearch\Lexer\charType(), and TYPO3\CMS\IndexedSearch\Lexer\utf8_ord().

Referenced by TYPO3\CMS\IndexedSearch\Lexer\get_word().
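The combination of a boolean result and a byte length written through a reference can be illustrated as follows. This is a minimal standalone sketch, assuming PHP 7 or later. The function name letterRunAt is hypothetical, and PCRE character classes replace the calls to charType() and utf8_ord() that the real method makes.

```php
<?php
// Hypothetical helper; $len is written by reference, as in the documented signature.
function letterRunAt(string $str, &$len, int $pos = 0): bool
{
    $len = 0;
    // \G anchors the match at byte offset $pos.
    if (preg_match('/\G[\p{L}\p{N}]+/u', $str, $m, 0, $pos) === 1) {
        $len = strlen($m[0]);
        return true;  // run of letters or digits
    }
    if (preg_match('/\G[^\p{L}\p{N}]+/u', $str, $m, 0, $pos) === 1) {
        $len = strlen($m[0]);
    }
    return false;     // run of non-letters, or end of string ($len stays 0)
}
```

A scanner can advance by $len after each call, alternating between word runs and separator runs.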

◆ utf8_ord()

TYPO3\CMS\IndexedSearch\Lexer::utf8_ord ( &$str, &$len, $pos = 0, $hex = false )

Converts a UTF-8 multibyte character to a Unicode code point.

Parameters
string $str  UTF-8 multibyte character string (reference)
int $len  The length of the character (reference, return value)
int $pos  Starting position in input string
bool $hex  If set, then a hex. number is returned
Returns
int Unicode code point

Definition at line 291 of file Lexer.php.

Referenced by TYPO3\CMS\IndexedSearch\Lexer\addWords(), and TYPO3\CMS\IndexedSearch\Lexer\utf8_is_letter().
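Decoding a UTF-8 character works by reading the lead byte to determine the sequence length, then folding in six bits from each continuation byte. The following is a minimal standalone sketch of that decoding, assuming PHP 7 or later, valid UTF-8 input, and $pos on a character boundary. The function name codepointAt is hypothetical, the $hex option is omitted, and returning 0 for an invalid lead byte is an assumption of this sketch.

```php
<?php
// Hypothetical decoder; assumes valid UTF-8 and $pos on a character boundary.
function codepointAt(string $str, &$len, int $pos = 0): int
{
    $lead = ord($str[$pos]);
    if ($lead < 0x80) {
        $len = 1;               // single-byte (ASCII) character
        return $lead;
    }
    if (($lead & 0xE0) === 0xC0) {
        $len = 2;
        $cp = $lead & 0x1F;     // 110xxxxx
    } elseif (($lead & 0xF0) === 0xE0) {
        $len = 3;
        $cp = $lead & 0x0F;     // 1110xxxx
    } elseif (($lead & 0xF8) === 0xF0) {
        $len = 4;
        $cp = $lead & 0x07;     // 11110xxx
    } else {
        $len = 1;               // stray continuation or invalid lead byte
        return 0;
    }
    // Each continuation byte (10xxxxxx) contributes its low six bits.
    for ($i = 1; $i < $len; $i++) {
        $cp = ($cp << 6) | (ord($str[$pos + $i]) & 0x3F);
    }
    return $cp;
}
```

After the call, $len holds the number of bytes consumed, so the next character starts at $pos + $len.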

Member Data Documentation

◆ $csObj

TYPO3\CMS\IndexedSearch\Lexer::$csObj

Definition at line 42 of file Lexer.php.

◆ $debug

TYPO3\CMS\IndexedSearch\Lexer::$debug = false

Definition at line 28 of file Lexer.php.

◆ $debugString

TYPO3\CMS\IndexedSearch\Lexer::$debugString = ''

Definition at line 35 of file Lexer.php.

◆ $lexerConf

TYPO3\CMS\IndexedSearch\Lexer::$lexerConf
Initial value:
= [
'printjoins' => [46, 45, 95, 58, 47, 39]

Definition at line 49 of file Lexer.php.
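The numbers in 'printjoins' are decimal character codes. All six are below 128, so they can be read as ASCII characters. The following standalone snippet, a minimal sketch that does not use the Lexer class, converts the list shown in the initial value to the characters it represents. The key name suggests characters that may join parts of a word, but that reading is an assumption not stated on this page.

```php
<?php
// The code list is copied from the initial value shown for $lexerConf.
$printjoins = [46, 45, 95, 58, 47, 39];
// All six values are below 128, so chr() yields the matching ASCII character.
$chars = array_map('chr', $printjoins);
echo implode(' ', $chars), PHP_EOL; // prints: . - _ : / '
```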