TYPO3 CMS 8
Lexer Class Reference

Public Member Functions

 __construct ()
 
 split2Words ($wordString)
 
 addWords (&$words, &$wordString, $start, $len)
 
 get_word (&$str, $pos=0)
 
 utf8_is_letter (&$str, &$len, $pos=0)
 
 charType ($cp)
 
 utf8_ord (&$str, &$len, $pos=0, $hex=false)
 

Public Attributes

 $debug = false
 
 $debugString = ''
 
 $csObj
 
 $lexerConf
 

Detailed Description

Lexer class for indexed_search. A lexer splits text into words.

Definition at line 21 of file indexed_search/Classes/Lexer.php.

Constructor & Destructor Documentation

__construct ( )

Constructor: Initializes the charset class

Definition at line 61 of file indexed_search/Classes/Lexer.php.

References GeneralUtility\makeInstance().

Member Function Documentation

addWords ( &$words, &$wordString, $start, $len )

Adds a word to the word array. This function should be used to make sure CJK sequences are split up in the right way.

Parameters
array $words: Array of accumulated words
string $wordString: Complete input string from which to extract the word
int $start: Start position of the word in the input string
int $len: Length of the word, counted from the start position
Returns
void

Definition at line 117 of file indexed_search/Classes/Lexer.php.

References Lexer\charType(), and Lexer\utf8_ord().

Referenced by Lexer\split2Words().
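
CJK text is written without spaces between words, so a lexer needs some rule for cutting a CJK run into index terms. This page does not say which rule addWords() applies. The sketch below assumes one common scheme, overlapping two-character pairs, purely as an illustration. The function name sketchCjkPairs is hypothetical, and it uses PHP's mbstring extension rather than the class's $csObj.

```php
<?php
// Hypothetical helper, not part of the Lexer class.
// Assumption: a CJK run is indexed as overlapping two-character pairs.
function sketchCjkPairs(string $run): array
{
    $count = mb_strlen($run, 'UTF-8');
    if ($count <= 1) {
        // A single character cannot form a pair; an empty run yields nothing.
        return $count === 1 ? [$run] : [];
    }
    $pairs = [];
    // Slide a two-character window over the run, one character at a time.
    for ($i = 0; $i < $count - 1; $i++) {
        $pairs[] = mb_substr($run, $i, 2, 'UTF-8');
    }
    return $pairs;
}

$pairs = sketchCjkPairs('日本語の'); // ['日本', '本語', '語の']
```

Pairing works reasonably only if the search query is split the same way as the indexed text, so that both sides produce matching terms.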

charType ( $cp )

Determines the type of a character.

Parameters
int $cp: Unicode code point to evaluate
Returns
array Type of char; index 0 holds the main type: num, alpha or CJK (Chinese / Japanese / Korean)

Definition at line 264 of file indexed_search/Classes/Lexer.php.

Referenced by Lexer\addWords(), and Lexer\utf8_is_letter().
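
To make the return shape concrete, here is a minimal sketch of a code point classifier. The function name sketchCharType, the lowercase type strings, the empty array for unclassified input, and the code point ranges are all assumptions chosen for illustration. The real method's ranges are not listed on this page and presumably cover many more scripts.

```php
<?php
// Hypothetical classifier, not the Lexer implementation.
// The ranges are illustrative and deliberately incomplete.
function sketchCharType(int $cp): array
{
    // ASCII digits 0-9
    if ($cp >= 0x30 && $cp <= 0x39) {
        return ['num'];
    }
    // ASCII letters A-Z and a-z
    if (($cp >= 0x41 && $cp <= 0x5A) || ($cp >= 0x61 && $cp <= 0x7A)) {
        return ['alpha'];
    }
    // Hiragana and Katakana, CJK Unified Ideographs, Hangul Syllables
    if (($cp >= 0x3040 && $cp <= 0x30FF)
        || ($cp >= 0x4E00 && $cp <= 0x9FFF)
        || ($cp >= 0xAC00 && $cp <= 0xD7A3)) {
        return ['cjk'];
    }
    // Assumption: unclassified code points yield an empty array.
    return [];
}

$type = sketchCharType(0x65E5)[0] ?? null; // 'cjk' (U+65E5 is 日)
```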

get_word ( &$str, $pos = 0 )

Gets the first word in a given UTF-8 string (initial non-letters are skipped).

Parameters
string $str: Input string (reference)
int $pos: Starting position in input string
Returns
array 0: start, 1: len; or FALSE if no word has been found

Definition at line 164 of file indexed_search/Classes/Lexer.php.

References Lexer\utf8_is_letter().

Referenced by Lexer\split2Words().
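
The return convention (start and length, or FALSE) can be sketched as follows. This is not how the class does it: the sketch swaps in PCRE Unicode character classes for the class's own utf8_is_letter() logic, and it assumes that positions and lengths are byte offsets. The function name sketchGetWord is hypothetical.

```php
<?php
// Hypothetical stand-in for get_word(), built on preg_match().
// Assumption: start and length are measured in bytes.
function sketchGetWord(string $str, int $pos = 0)
{
    // \p{L} matches letters, \p{N} matches numbers; /u enables UTF-8 mode.
    // PREG_OFFSET_CAPTURE reports where the match starts as a byte offset.
    if (preg_match('/[\p{L}\p{N}]+/u', $str, $m, PREG_OFFSET_CAPTURE, $pos) === 1) {
        return [$m[0][1], strlen($m[0][0])];
    }
    // No letter or number at or after $pos.
    return false;
}

$first = sketchGetWord('  héllo, world');     // [2, 6]: "é" occupies two bytes
$second = sketchGetWord('  héllo, world', 8); // [10, 5]
```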

split2Words ( $wordString )

Splits a string into words. Used for indexing; can also be used to find the words in a search query.

Parameters
string $wordString: String with UTF-8 content to process.
Returns
array Array of words in UTF-8

Definition at line 73 of file indexed_search/Classes/Lexer.php.

References Lexer\addWords(), debug(), and Lexer\get_word().
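
The references above suggest a loop that repeatedly finds the next word and appends it to the result. A minimal sketch of such a loop follows, under the assumption that a word is a run of letters and numbers. It uses preg_match() in place of get_word() and a plain array append in place of addWords(), so CJK handling and any character filtering are left out. The function name sketchSplit2Words is hypothetical.

```php
<?php
// Hypothetical sketch of a word-splitting loop, not the class's implementation.
function sketchSplit2Words(string $text): array
{
    $words = [];
    $pos = 0;
    // Locate the next word at or after $pos, record it,
    // then resume scanning directly behind it.
    while (preg_match('/[\p{L}\p{N}]+/u', $text, $m, PREG_OFFSET_CAPTURE, $pos) === 1) {
        [$word, $start] = $m[0];
        $words[] = $word;
        $pos = $start + strlen($word); // byte offset just past the word
    }
    return $words;
}

$words = sketchSplit2Words('Hello, wörld! 42'); // ['Hello', 'wörld', '42']
```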

utf8_is_letter ( &$str, &$len, $pos = 0 )

Checks whether a character is a letter (or a string of letters or non-letters).

Parameters
string $str: Input string (reference)
int $len: Byte length of the character sequence (reference, return value)
int $pos: Starting position in input string
Returns
bool Whether a letter (or word) was found

Definition at line 189 of file indexed_search/Classes/Lexer.php.

References Lexer\charType(), and Lexer\utf8_ord().

Referenced by Lexer\get_word().

utf8_ord ( &$str, &$len, $pos = 0, $hex = false )

Converts a UTF-8 multibyte character to a Unicode code point.

Parameters
string $str: UTF-8 multibyte character string (reference)
int $len: The length of the character (reference, return value)
int $pos: Starting position in input string
bool $hex: If set, a hexadecimal number is returned
Returns
int Unicode code point

Definition at line 291 of file indexed_search/Classes/Lexer.php.

Referenced by Lexer\addWords(), and Lexer\utf8_is_letter().
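
The conversion follows the standard UTF-8 bit layout, which the sketch below decodes by hand. It assumes well-formed input and that the length is reported in bytes, it omits the $hex option, and the function name sketchUtf8Ord is hypothetical. It illustrates the encoding and is not the method's actual code.

```php
<?php
// Hypothetical decoder; assumes $str holds well-formed UTF-8 at byte $pos.
function sketchUtf8Ord(string $str, &$len, int $pos = 0): int
{
    $lead = ord($str[$pos]);
    if ($lead < 0x80) {
        // 0xxxxxxx: single-byte (ASCII) character
        $len = 1;
        return $lead;
    }
    if (($lead & 0xE0) === 0xC0) {
        // 110xxxxx: two-byte sequence
        $len = 2;
        $cp = $lead & 0x1F;
    } elseif (($lead & 0xF0) === 0xE0) {
        // 1110xxxx: three-byte sequence
        $len = 3;
        $cp = $lead & 0x0F;
    } else {
        // 11110xxx: four-byte sequence
        $len = 4;
        $cp = $lead & 0x07;
    }
    // Every continuation byte (10xxxxxx) contributes six payload bits.
    for ($i = 1; $i < $len; $i++) {
        $cp = ($cp << 6) | (ord($str[$pos + $i]) & 0x3F);
    }
    return $cp;
}

$len = 0;
$cp = sketchUtf8Ord('日', $len); // $cp is 0x65E5, $len is 3
```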

Member Data Documentation

$csObj

Definition at line 42 of file indexed_search/Classes/Lexer.php.

$debug = false

Definition at line 28 of file indexed_search/Classes/Lexer.php.

$debugString = ''

Definition at line 35 of file indexed_search/Classes/Lexer.php.

$lexerConf
Initial value:
= [
'printjoins' => [46, 45, 95, 58, 47, 39]

Definition at line 49 of file indexed_search/Classes/Lexer.php.
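
The numbers under 'printjoins' are easier to read as characters. The snippet below decodes them. The key name suggests that these are printable characters allowed to join the parts of a word, but this page does not state that, so treat it as an assumption.

```php
<?php
// The six code points listed under 'printjoins' in the entry above.
$printjoins = [46, 45, 95, 58, 47, 39];

// chr() is sufficient because every value is below 128 (plain ASCII).
$chars = array_map('chr', $printjoins);

echo implode(' ', $chars), PHP_EOL; // prints: . - _ : / '
```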