‪TYPO3CMS  9.5
TYPO3\CMS\IndexedSearch\Indexer Class Reference
Inheritance diagram for TYPO3\CMS\IndexedSearch\Indexer:
TYPO3\CMS\Core\Compatibility\PublicPropertyDeprecationTrait

Public Member Functions

 __construct ()
 
 hook_indexContent (&$pObj)
 
 backend_initIndexer ($id, $type, $sys_language_uid, $MP, $uidRL, $cHash_array=[], $createCHash=false)
 
 backend_setFreeIndexUid ($freeIndexUid, $freeIndexSetId=0)
 
 backend_indexAsTYPO3Page ($title, $keywords, $description, $content, $charset, $mtime, $crdate=0, $recordUid=0)
 
 init ()
 
 initializeExternalParsers ()
 
 indexTypo3PageContent ()
 
array splitHTMLContent ($content)
 
string getHTMLcharset ($content)
 
string convertHTMLToUtf8 ($content, $charset='')
 
bool embracingTags ($string, $tagName, &$tagContent, &$stringAfter, &$paramList)
 
bool typoSearchTags (&$body)
 
 extractLinks ($content)
 
array extractHyperLinks ($html)
 
string extractBaseHref ($html)
 
 indexExternalUrl ($externalUrl)
 
mixed getUrlHeaders ($url)
 
 indexRegularDocument ($file, $force=false, $contentTmpFile='', $altExtension='')
 
array readFileContent ($fileExtension, $absoluteFileName, $sectionPointer)
 
array fileContentParts ($ext, $absFile)
 
array splitRegularContent ($content)
 
 charsetEntity2utf8 (&$contentArr, $charset)
 
array processWordsInArrays ($contentArr)
 
string bodyDescription ($contentArr)
 
array indexAnalyze ($content)
 
 analyzeHeaderinfo (&$retArr, $content, $key, $offset)
 
 analyzeBody (&$retArr, $content)
 
mixed metaphone ($word, $returnRawMetaphoneValue=false)
 
 submitPage ()
 
 submit_grlist ($hash, $phash_x)
 
 submit_section ($hash, $hash_t3)
 
 removeOldIndexedPages ($phash)
 
 submitFilePage ($hash, $file, $subinfo, $ext, $mtime, $ctime, $size, $content_md5h, $contentParts)
 
 submitFile_grlist ($hash)
 
 submitFile_section ($hash)
 
 removeOldIndexedFiles ($phash)
 
int checkMtimeTstamp ($mtime, $phash)
 
mixed checkContentHash ()
 
bool checkExternalDocContentHash ($hashGr, $content_md5h)
 
bool is_grlist_set ($phash_x)
 
 update_grlist ($phash, $phash_x)
 
 updateTstamp ($phash, $mtime=0)
 
 updateSetId ($phash)
 
 updateParsetime ($phash, $parsetime)
 
 updateRootline ()
 
 getRootLineFields (array &$fieldArray)
 
 checkWordList ($wordListArray)
 
 submitWords ($wordList, $phash)
 
int freqMap ($freq)
 
 setT3Hashes ()
 
array setExtHashes ($file, $subinfo=[])
 
 log_push ($msg, $key)
 
 log_pull ()
 
 log_setTSlogMessage ($msg, $errorNum=0)
 
Public Member Functions inherited from TYPO3\CMS\Core\Compatibility\PublicPropertyDeprecationTrait
bool __isset (string $propertyName)
 
mixed __get (string $propertyName)
 
 __set (string $propertyName, $propertyValue)
 
 __unset (string $propertyName)
 

Public Attributes

array $reasons
 
string $excludeSections = 'script,style'
 
array $external_parsers = array( )
 
string $defaultGrList = '0,-1'
 
int $tstamp_maxAge = 0
 
int $tstamp_minAge = 0
 
int $maxExternalFiles = 0
 
bool $forceIndexing = false
 
bool $crawlerActive = false
 
array $defaultContentArray
 
int $wordcount = 0
 
int $externalFileCounter = 0
 
array $conf = array( )
 
array $indexerConfig = array( )
 
array $hash = array( )
 
array $file_phash_arr = array( )
 
array $contentParts = array( )
 
string $content_md5h = ''
 
array $internal_log = array( )
 
string $indexExternalUrl_content = ''
 
array $cHashParams = array( )
 
int $freqRange = 32000
 
float $freqMax = 0.1
 
bool $enableMetaphoneSearch = false
 
bool $storeMetaphoneInfoAsWords
 
string $metaphoneContent = ''
 
TYPO3\CMS\IndexedSearch\Utility\DoubleMetaPhoneUtility $metaphoneObj
 
TYPO3\CMS\IndexedSearch\Lexer $lexerObj
 
bool $flagBitMask
 

Protected Member Functions

string createLocalPath ($sourcePath)
 
string createLocalPathFromT3vars ($sourcePath)
 
string createLocalPathUsingDomainURL ($sourcePath)
 
string createLocalPathUsingAbsRefPrefix ($sourcePath)
 
string createLocalPathFromAbsoluteURL ($sourcePath)
 
string createLocalPathFromRelativeURL ($sourcePath)
 
string addSpacesToKeywordList ($keywordList)
 

Static Protected Member Functions

static bool isRelativeURL ($url)
 
static bool isAllowedLocalFile ($filePath)
 

Protected Attributes

array $deprecatedPublicProperties
 
TYPO3\CMS\Core\Charset\CharsetConverter $csObj
 
TimeTracker $timeTracker
 

Detailed Description

Indexing class for TYPO3 frontend

Definition at line 37 of file Indexer.php.

Constructor & Destructor Documentation

◆ __construct()

TYPO3\CMS\IndexedSearch\Indexer::__construct ( )

Indexer constructor.

Definition at line 230 of file Indexer.php.

Member Function Documentation

◆ addSpacesToKeywordList()

string TYPO3\CMS\IndexedSearch\Indexer::addSpacesToKeywordList (   $keywordList)
protected

Makes sure that keywords are space-separated. This is important for displaying them properly as part of the fulltext index.

Parameters
string $keywordList
Returns
string
See also
http://forge.typo3.org/issues/14959

Definition at line 2305 of file Indexer.php.

Referenced by TYPO3\CMS\IndexedSearch\Indexer\splitHTMLContent().
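
As a rough illustration of the normalisation described for addSpacesToKeywordList(), the following sketch turns a comma-separated keyword list into one where every comma is followed by a space. The function name spaceOutKeywords() is hypothetical, and the logic is an assumption about the intent, not the code at line 2305 of Indexer.php.

```php
<?php
// Hypothetical helper, not part of TYPO3: normalises a keyword list so that
// keywords are separated by a comma followed by a single space.
function spaceOutKeywords(string $keywordList): string
{
    // Split on commas and strip surrounding whitespace from every keyword.
    $keywords = array_map('trim', explode(',', $keywordList));
    // Re-join with ", " so that a word splitter sees separate words.
    return implode(', ', $keywords);
}

echo spaceOutKeywords('typo3,search ,  indexing'), PHP_EOL;
```

Without the added spaces, a list such as "typo3,search" could end up in the fulltext index as one unbroken token.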

◆ analyzeBody()

TYPO3\CMS\IndexedSearch\Indexer::analyzeBody ( & $retArr,
  $content 
)

Calculates relevant information for body content.

Parameters
array $retArr: Index array, passed by reference
array $content: Standard content array

Definition at line 1379 of file Indexer.php.

References TYPO3\CMS\IndexedSearch\Utility\IndexedSearchUtility\md5inthash(), and TYPO3\CMS\IndexedSearch\Indexer\metaphone().

Referenced by TYPO3\CMS\IndexedSearch\Indexer\indexAnalyze().

◆ analyzeHeaderinfo()

TYPO3\CMS\IndexedSearch\Indexer::analyzeHeaderinfo ( & $retArr,
  $content,
  $key,
  $offset 
)

Calculates relevant information for header content.

Parameters
array $retArr: Index array, passed by reference
array $content: Standard content array
string $key: Key from standard content array
int $offset: Bit-wise priority to type

Definition at line 1349 of file Indexer.php.

References TYPO3\CMS\IndexedSearch\Utility\IndexedSearchUtility\md5inthash(), and TYPO3\CMS\IndexedSearch\Indexer\metaphone().

Referenced by TYPO3\CMS\IndexedSearch\Indexer\indexAnalyze().

◆ backend_indexAsTYPO3Page()

TYPO3\CMS\IndexedSearch\Indexer::backend_indexAsTYPO3Page (   $title,
  $keywords,
  $description,
  $content,
  $charset,
  $mtime,
  $crdate = 0,
  $recordUid = 0 
)

Indexing records as the content of a TYPO3 page.

Parameters
string $title: Title equivalent
string $keywords: Keywords equivalent
string $description: Description equivalent
string $content: The main content to index
string $charset: The charset of the title, keyword, description and body content. MUST BE VALID, otherwise nothing is indexed!
int $mtime: Last modification time, in seconds
int $crdate: The creation date of the content, in seconds
int $recordUid: The record UID that the content comes from (for registration with the indexed rows)

Definition at line 427 of file Indexer.php.

References TYPO3\CMS\IndexedSearch\Indexer\indexTypo3PageContent().

◆ backend_initIndexer()

TYPO3\CMS\IndexedSearch\Indexer::backend_initIndexer (   $id,
  $type,
  $sys_language_uid,
  $MP,
  $uidRL,
  $cHash_array = [],
  $createCHash = false 
)

Initializing the "combined ID" of the page (phash) being indexed (or for which external media is attached)

Parameters
int $id: The page uid, &id=
int $type: The page type, &type=
int $sys_language_uid: sys_language uid, typically &L=
string $MP: The MP variable (Mount Points), &MP=
array $uidRL: Rootline array of only UIDs.
array $cHash_array: Array of GET variables to register with this indexing
bool $createCHash: If set, calculates a cHash value from the $cHash_array. You will probably not do that, since such cases are indexed through the frontend and the idea of this interface is to index non-cacheable pages from the backend!

Definition at line 356 of file Indexer.php.

References TYPO3\CMS\Core\Utility\HttpUtility\buildQueryString(), and TYPO3\CMS\IndexedSearch\Indexer\init().

◆ backend_setFreeIndexUid()

TYPO3\CMS\IndexedSearch\Indexer::backend_setFreeIndexUid (   $freeIndexUid,
  $freeIndexSetId = 0 
)

Sets the free-index uid. Can be called right after backend_initIndexer()

Parameters
int $freeIndexUid: Free index UID
int $freeIndexSetId: Set id, an integer identifying the "set" of indexing operations.

Definition at line 409 of file Indexer.php.
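
The three backend_* methods are meant to be called in sequence. The sketch below shows one possible call order using only the signatures listed on this page. The wrapper function indexRecordFromBackend() and the keys of the $record array are hypothetical, and the indexer is passed in as a plain object so that the snippet does not depend on a bootstrapped TYPO3 instance.

```php
<?php
// Hypothetical wrapper: $indexer is expected to be an instance of
// TYPO3\CMS\IndexedSearch\Indexer; the $record keys are made up for this example.
function indexRecordFromBackend(object $indexer, array $record): void
{
    // 1. Set up the "combined ID" (phash) for the page the record belongs to.
    $indexer->backend_initIndexer(
        $record['pid'],              // page uid (&id=)
        0,                           // page type (&type=)
        $record['sys_language_uid'], // language (&L=)
        '',                          // no mount point (&MP=)
        $record['rootlineUids']      // rootline array of UIDs only
    );

    // 2. Register the free-index uid and the set id.
    $indexer->backend_setFreeIndexUid($record['configUid'], $record['setId']);

    // 3. Index the record as if it were the content of a TYPO3 page.
    $indexer->backend_indexAsTYPO3Page(
        $record['title'],
        $record['keywords'],
        $record['description'],
        $record['bodytext'],
        'utf-8',                     // charset must be valid or nothing is indexed
        $record['tstamp'],           // last modification time, in seconds
        $record['crdate'],           // creation date, in seconds
        $record['uid']               // record UID the content comes from
    );
}
```

Passing the indexer as an argument keeps the call order easy to check in isolation; in a real extension the object would come from TYPO3's own object instantiation.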

◆ bodyDescription()

string TYPO3\CMS\IndexedSearch\Indexer::bodyDescription (   $contentArr)

Extracts the sample description text from the content array.

Parameters
array $contentArr: Content array
Returns
string Description string

Definition at line 1313 of file Indexer.php.

References TYPO3\CMS\Core\Utility\MathUtility\forceIntegerInRange().

Referenced by TYPO3\CMS\IndexedSearch\Indexer\submitFilePage(), and TYPO3\CMS\IndexedSearch\Indexer\submitPage().

◆ charsetEntity2utf8()

TYPO3\CMS\IndexedSearch\Indexer::charsetEntity2utf8 ( & $contentArr,
  $charset 
)

Converts the character set and HTML entities in the values of the input content array keys.

Parameters
array $contentArr: Standard content array
string $charset: Charset of the input content (converted to utf-8)

Definition at line 1273 of file Indexer.php.

Referenced by TYPO3\CMS\IndexedSearch\Indexer\indexTypo3PageContent().

◆ checkContentHash()

mixed TYPO3\CMS\IndexedSearch\Indexer::checkContentHash ( )

Check content hash in phash table

Returns
mixed Returns TRUE if the page needs to be indexed (that is, there was no result); otherwise the phash value (in an array) of the phash record to which the grlist record should be related.

Definition at line 1855 of file Indexer.php.

References TYPO3\CMS\IndexedSearch\Utility\IndexedSearchUtility\isTableUsed().

Referenced by TYPO3\CMS\IndexedSearch\Indexer\indexTypo3PageContent().

◆ checkExternalDocContentHash()

bool TYPO3\CMS\IndexedSearch\Indexer::checkExternalDocContentHash (   $hashGr,
  $content_md5h 
)

Check content hash for external documents. Returns TRUE if the document needs to be indexed (that is, there was no result).

Parameters
int $hashGr: phash value to check (phash_grouping)
int $content_md5h: Content hash to check
Returns
bool Returns TRUE if the document needs to be indexed (that is, there was no result)

Definition at line 1889 of file Indexer.php.

References TYPO3\CMS\IndexedSearch\Indexer\$content_md5h, and TYPO3\CMS\IndexedSearch\Utility\IndexedSearchUtility\isTableUsed().

Referenced by TYPO3\CMS\IndexedSearch\Indexer\indexRegularDocument().

◆ checkMtimeTstamp()

int TYPO3\CMS\IndexedSearch\Indexer::checkMtimeTstamp (   $mtime,
  $phash 
)

Check the mtime / tstamp of the currently indexed page/file (based on phash). Returns a positive integer if the page needs to be indexed.

Parameters
int $mtime: mtime value to test against limits and indexed page (usually this is the mtime of the cached document)
int $phash: "phash" used to select any already indexed page to see what its mtime is.
Returns
int Result integer. Generally: <0 = no indexing, >0 = do indexing (see $this->reasons):
-2) Min age was NOT exceeded and so indexing cannot occur.
-1) mtime matched so no need to reindex page.
0) N/A
1) Max age exceeded, page must be indexed again.
2) mtime of indexed page doesn't match mtime given for current content and we must index page.
3) No mtime was set, so we will index...
4) No indexed page found, so of course we will index.

Definition at line 1792 of file Indexer.php.

References $GLOBALS, TYPO3\CMS\IndexedSearch\Utility\IndexedSearchUtility\isTableUsed(), TYPO3\CMS\IndexedSearch\Indexer\log_setTSlogMessage(), and TYPO3\CMS\IndexedSearch\Indexer\updateTstamp().

Referenced by TYPO3\CMS\IndexedSearch\Indexer\indexRegularDocument(), and TYPO3\CMS\IndexedSearch\Indexer\indexTypo3PageContent().
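
The result codes documented for checkMtimeTstamp() follow a simple sign convention. A minimal sketch of how a caller might interpret them is shown here; describeMtimeCheck() is a hypothetical helper, and its messages paraphrase the documented codes rather than the contents of $this->reasons.

```php
<?php
// Hypothetical helper: interprets the integer returned by checkMtimeTstamp().
function describeMtimeCheck(int $code): array
{
    // Paraphrased from the documented result codes.
    $messages = [
        -2 => 'Min age not exceeded, indexing cannot occur',
        -1 => 'mtime matched, no need to reindex',
         1 => 'Max age exceeded, index again',
         2 => 'mtime differs from indexed page, index',
         3 => 'No mtime set, index',
         4 => 'No indexed page found, index',
    ];

    return [
        // Positive codes mean "do indexing"; negative codes mean "skip".
        'index'  => $code > 0,
        'reason' => $messages[$code] ?? 'N/A',
    ];
}

$result = describeMtimeCheck(2);
echo ($result['index'] ? 'index: ' : 'skip: ') . $result['reason'], PHP_EOL;
```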

◆ checkWordList()

TYPO3\CMS\IndexedSearch\Indexer::checkWordList (   $wordListArray)

Adds new words to the database.

Parameters
array $wordListArray: Word list array (where each word has information about position etc.).

Definition at line 2089 of file Indexer.php.

References TYPO3\CMS\IndexedSearch\Utility\IndexedSearchUtility\isTableUsed(), and TYPO3\CMS\IndexedSearch\Indexer\log_setTSlogMessage().

Referenced by TYPO3\CMS\IndexedSearch\Indexer\indexRegularDocument(), and TYPO3\CMS\IndexedSearch\Indexer\indexTypo3PageContent().

◆ convertHTMLToUtf8()

string TYPO3\CMS\IndexedSearch\Indexer::convertHTMLToUtf8 (   $content,
  $charset = '' 
)

Converts an HTML document to utf-8.

Parameters
string $content: HTML content, any charset
string $charset: Optional charset (otherwise extracted from HTML)
Returns
string Converted HTML

Definition at line 680 of file Indexer.php.

References TYPO3\CMS\IndexedSearch\Indexer\getHTMLcharset().

◆ createLocalPath()

string TYPO3\CMS\IndexedSearch\Indexer::createLocalPath (   $sourcePath)
protected

Checks if the file is local

Parameters
string $sourcePath
Returns
string Absolute path to file if file is local, else empty string

Definition at line 949 of file Indexer.php.

Referenced by TYPO3\CMS\IndexedSearch\Indexer\extractHyperLinks().

◆ createLocalPathFromAbsoluteURL()

string TYPO3\CMS\IndexedSearch\Indexer::createLocalPathFromAbsoluteURL (   $sourcePath)
protected

Attempts to create a local file path from the absolute URL without schema.

Parameters
string $sourcePath
Returns
string

Definition at line 1044 of file Indexer.php.

References TYPO3\CMS\Core\Core\Environment\getPublicPath().

◆ createLocalPathFromRelativeURL()

string TYPO3\CMS\IndexedSearch\Indexer::createLocalPathFromRelativeURL (   $sourcePath)
protected

Attempts to create a local file path from the relative URL.

Parameters
string $sourcePath
Returns
string

Definition at line 1063 of file Indexer.php.

References TYPO3\CMS\Core\Core\Environment\getPublicPath().

◆ createLocalPathFromT3vars()

string TYPO3\CMS\IndexedSearch\Indexer::createLocalPathFromT3vars (   $sourcePath)
protected

Attempts to create a local file path from T3VARs. This is useful for various download extensions that hide actual file name but still want the file to be indexed.

Parameters
string $sourcePath
Returns
string

Definition at line 976 of file Indexer.php.

References $GLOBALS.

◆ createLocalPathUsingAbsRefPrefix()

string TYPO3\CMS\IndexedSearch\Indexer::createLocalPathUsingAbsRefPrefix (   $sourcePath)
protected

Attempts to create a local file path by matching absRefPrefix. This requires TSFE. If TSFE is missing, this function does nothing.

Parameters
string $sourcePath
Returns
string

Definition at line 1020 of file Indexer.php.

References $GLOBALS, and TYPO3\CMS\Core\Core\Environment\getPublicPath().

◆ createLocalPathUsingDomainURL()

string TYPO3\CMS\IndexedSearch\Indexer::createLocalPathUsingDomainURL (   $sourcePath)
protected

Attempts to create a local file path by matching a current request URL.

Parameters
string $sourcePath
Returns
string

Definition at line 998 of file Indexer.php.

References TYPO3\CMS\Core\Core\Environment\getPublicPath().

◆ embracingTags()

bool TYPO3\CMS\IndexedSearch\Indexer::embracingTags (   $string,
  $tagName,
& $tagContent,
& $stringAfter,
& $paramList 
)

Finds the first occurrence of embracing tags and returns the embraced content and the original string with the tag removed in the two passed variables. Returns FALSE if no match is found. This is useful, for example, for finding the <title> of a document or removing <script> sections.

Parameters
string $string: String to search in
string $tagName: Tag name, e.g. "script"
string $tagContent: Passed by reference: Content inside found tag
string $stringAfter: Passed by reference: Content after found tag
string $paramList: Passed by reference: Attributes of the found tag.
Returns
bool Returns FALSE if tag was not found, otherwise TRUE.

Definition at line 705 of file Indexer.php.

Referenced by TYPO3\CMS\IndexedSearch\Indexer\splitHTMLContent().
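
To make the by-reference contract of embracingTags() concrete, here is a small stand-alone sketch with the same parameter layout. embracingTagsSketch() is a hypothetical re-implementation based on a regular expression; it is an assumption about the described behaviour, not the code at line 705 of Indexer.php.

```php
<?php
// Hypothetical re-implementation for illustration; not the TYPO3 code.
function embracingTagsSketch(
    string $string,
    string $tagName,
    ?string &$tagContent,
    ?string &$stringAfter,
    ?string &$paramList
): bool {
    $tag = preg_quote($tagName, '~');
    // Match the first <tag ...>...</tag> pair, case-insensitively, across lines.
    $pattern = '~<' . $tag . '\b([^>]*)>(.*?)</' . $tag . '\s*>~is';
    if (!preg_match($pattern, $string, $m, PREG_OFFSET_CAPTURE)) {
        return false;
    }
    $paramList = $m[1][0];  // raw attribute string of the opening tag
    $tagContent = $m[2][0]; // text between the opening and the closing tag
    // Original string with the whole matched element cut out.
    $stringAfter = substr_replace($string, '', $m[0][1], strlen($m[0][0]));
    return true;
}

$html = '<head><title lang="en">Hello</title><meta name="x"></head>';
if (embracingTagsSketch($html, 'title', $content, $rest, $attributes)) {
    echo $content, PHP_EOL; // the embraced content
    echo $rest, PHP_EOL;    // the input without the <title> element
}
```

Calling such a function repeatedly with $stringAfter as the next input is one way to strip every <script> or <style> section from a document.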

◆ extractBaseHref()

string TYPO3\CMS\IndexedSearch\Indexer::extractBaseHref (   $html)

Extracts the "base href" from content string.

Parameters
string $html: Content to analyze
Returns
string The base href or an empty string if not found

Definition at line 867 of file Indexer.php.

◆ extractHyperLinks()

array TYPO3\CMS\IndexedSearch\Indexer::extractHyperLinks (   $html)

Extracts all links to external documents from the HTML content string

Parameters
string $html
Returns
array Array of hyperlinks (keys: tag, href, localPath (empty if not local))
See also
extractLinks()

Definition at line 838 of file Indexer.php.

References TYPO3\CMS\IndexedSearch\Indexer\createLocalPath().

Referenced by TYPO3\CMS\IndexedSearch\Indexer\extractLinks().

◆ extractLinks()

TYPO3\CMS\IndexedSearch\Indexer::extractLinks (   $content)

◆ fileContentParts()

array TYPO3\CMS\IndexedSearch\Indexer::fileContentParts (   $ext,
  $absFile 
)

Creates an array with pointers to divisions of document.

Parameters
string $ext: File extension
string $absFile: Absolute filename (must exist and be validated OK before calling function)
Returns
array Array of pointers to sections that the document should be divided into

Definition at line 1238 of file Indexer.php.

Referenced by TYPO3\CMS\IndexedSearch\Indexer\indexRegularDocument().

◆ freqMap()

int TYPO3\CMS\IndexedSearch\Indexer::freqMap (   $freq)

Maps frequency from a real number in [0;1] to an integer in [0;$this->freqRange], with anything above $this->freqMax treated as 1, and back again.

Parameters
float $freq: Frequency
Returns
int Frequency in range.

Definition at line 2201 of file Indexer.php.

References TYPO3\CMS\IndexedSearch\Indexer\$freqRange.

Referenced by TYPO3\CMS\IndexedSearch\Indexer\submitWords().
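
One plausible reading of the freqMap() mapping, written out as a sketch: frequencies are scaled linearly so that $freqMax lands on $freqRange, and anything larger is clamped. The function freqMapSketch() is hypothetical, covers only the forward direction, and its scaling factor is an assumption. The defaults mirror the $freqRange and $freqMax values listed under Public Attributes.

```php
<?php
// Hypothetical sketch of the forward mapping [0;1] -> [0;$freqRange].
function freqMapSketch(float $freq, int $freqRange = 32000, float $freqMax = 0.1): int
{
    // Assumed linear scale: a frequency equal to $freqMax maps to $freqRange.
    $scaled = (int)round($freq * $freqRange / $freqMax);
    // Frequencies above $freqMax are treated like the maximum.
    return min($freqRange, max(0, $scaled));
}

foreach ([0.0, 0.05, 0.1, 0.5] as $freq) {
    echo $freq, ' => ', freqMapSketch($freq), PHP_EOL;
}
```

Clamping at $freqMax means very common words all receive the same top value, while the full integer range is reserved for telling rarer words apart.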

◆ getHTMLcharset()

string TYPO3\CMS\IndexedSearch\Indexer::getHTMLcharset (   $content)

Extract the charset value from HTML meta tag.

Parameters
string $content: HTML content
Returns
string The charset value if found.

Definition at line 664 of file Indexer.php.

Referenced by TYPO3\CMS\IndexedSearch\Indexer\convertHTMLToUtf8().
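
Extracting a charset from a meta tag can be sketched with a single regular expression. extractMetaCharset() below is a hypothetical helper, and the pattern is an assumption covering the two common meta tag forms; the matching rules of getHTMLcharset() itself may differ.

```php
<?php
// Hypothetical helper: returns the charset named in an HTML <meta> tag,
// or an empty string if none is found.
function extractMetaCharset(string $html): string
{
    // Covers <meta charset="..."> as well as
    // <meta http-equiv="Content-Type" content="text/html; charset=...">.
    $pattern = '/<meta\b[^>]*charset\s*=\s*["\']?\s*([a-z0-9._:-]+)/i';
    return preg_match($pattern, $html, $m) ? $m[1] : '';
}

$html = '<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">';
echo extractMetaCharset($html), PHP_EOL;
```

A caller such as convertHTMLToUtf8() would only need this lookup when no explicit $charset argument is supplied.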

◆ getRootLineFields()

TYPO3\CMS\IndexedSearch\Indexer::getRootLineFields ( array &  $fieldArray)

Adding values for root-line fields. rl0, rl1 and rl2 are standard. A hook might add more.

Parameters
array $fieldArray: Field array, passed by reference

Definition at line 2069 of file Indexer.php.

References $GLOBALS.

Referenced by TYPO3\CMS\IndexedSearch\Indexer\submit_section(), and TYPO3\CMS\IndexedSearch\Indexer\updateRootline().

◆ getUrlHeaders()

mixed TYPO3\CMS\IndexedSearch\Indexer::getUrlHeaders (   $url)

Getting HTTP request headers of URL

Parameters
string $url: The URL
Returns
mixed If no answer, returns FALSE. Otherwise an array where HTTP headers are keys

Definition at line 924 of file Indexer.php.

Referenced by TYPO3\CMS\IndexedSearch\Indexer\indexExternalUrl().

◆ hook_indexContent()

TYPO3\CMS\IndexedSearch\Indexer::hook_indexContent ( & $pObj)

Parent Object (TSFE) Initialization

Parameters
TypoScriptFrontendController $pObj: Parent Object, passed by reference

Definition at line 240 of file Indexer.php.

References $GLOBALS, TYPO3\CMS\IndexedSearch\Indexer\indexTypo3PageContent(), TYPO3\CMS\IndexedSearch\Indexer\init(), TYPO3\CMS\IndexedSearch\Indexer\log_pull(), TYPO3\CMS\IndexedSearch\Indexer\log_push(), and TYPO3\CMS\IndexedSearch\Indexer\log_setTSlogMessage().

◆ indexAnalyze()

array TYPO3\CMS\IndexedSearch\Indexer::indexAnalyze (   $content)

Analyzes content to use for indexing.

Parameters
array $content: Standard content array: an array with the keys title, keywords, description and body, which all contain an array of words.
Returns
array Index Array (whatever that is...)

Definition at line 1331 of file Indexer.php.

References TYPO3\CMS\IndexedSearch\Indexer\analyzeBody(), and TYPO3\CMS\IndexedSearch\Indexer\analyzeHeaderinfo().

Referenced by TYPO3\CMS\IndexedSearch\Indexer\indexRegularDocument(), and TYPO3\CMS\IndexedSearch\Indexer\indexTypo3PageContent().

◆ indexExternalUrl()

TYPO3\CMS\IndexedSearch\Indexer::indexExternalUrl (   $externalUrl)

Indexes the HTML content of an external URL.

Parameters
string $externalUrl: URL, e.g. "http://typo3.org/"
See also
indexRegularDocument()

Definition at line 898 of file Indexer.php.

References TYPO3\CMS\IndexedSearch\Indexer\getUrlHeaders(), and TYPO3\CMS\IndexedSearch\Indexer\indexRegularDocument().

Referenced by TYPO3\CMS\IndexedSearch\Indexer\extractLinks().

◆ indexRegularDocument()

TYPO3\CMS\IndexedSearch\Indexer::indexRegularDocument (   $file,
  $force = false,
  $contentTmpFile = '',
  $altExtension = '' 
)

Indexing a regular document given as $file (relative to public web path, local file)

Parameters
string $file: Relative filename, relative to public web path. It can also be an absolute path as long as it is inside the lockRootPath (validated with \TYPO3\CMS\Core\Utility\GeneralUtility::isAbsPath()). Finally, if $contentTmpFile is set, this value can be anything, most likely a URL
bool $force: If set, indexing is forced (despite content hashes, mtime etc).
string $contentTmpFile: Temporary file with the content to read it from (instead of $file). Used when the $file is a URL.
string $altExtension: File extension for temporary file.

Definition at line 1114 of file Indexer.php.

References TYPO3\CMS\IndexedSearch\Indexer\$content_md5h, TYPO3\CMS\IndexedSearch\Indexer\$contentParts, TYPO3\CMS\Core\Utility\PathUtility\basename(), TYPO3\CMS\IndexedSearch\Indexer\checkExternalDocContentHash(), TYPO3\CMS\IndexedSearch\Indexer\checkMtimeTstamp(), TYPO3\CMS\IndexedSearch\Indexer\checkWordList(), TYPO3\CMS\IndexedSearch\Indexer\fileContentParts(), TYPO3\CMS\Core\Core\Environment\getPublicPath(), TYPO3\CMS\IndexedSearch\Indexer\indexAnalyze(), TYPO3\CMS\IndexedSearch\Utility\IndexedSearchUtility\isTableUsed(), TYPO3\CMS\IndexedSearch\Indexer\log_pull(), TYPO3\CMS\IndexedSearch\Indexer\log_push(), TYPO3\CMS\IndexedSearch\Indexer\log_setTSlogMessage(), TYPO3\CMS\IndexedSearch\Utility\IndexedSearchUtility\md5inthash(), TYPO3\CMS\IndexedSearch\Indexer\processWordsInArrays(), TYPO3\CMS\IndexedSearch\Indexer\readFileContent(), TYPO3\CMS\IndexedSearch\Indexer\setExtHashes(), TYPO3\CMS\IndexedSearch\Indexer\submitFile_section(), TYPO3\CMS\IndexedSearch\Indexer\submitFilePage(), TYPO3\CMS\IndexedSearch\Indexer\submitWords(), TYPO3\CMS\IndexedSearch\Indexer\updateParsetime(), and TYPO3\CMS\IndexedSearch\Indexer\updateTstamp().

Referenced by TYPO3\CMS\IndexedSearch\Indexer\extractLinks(), and TYPO3\CMS\IndexedSearch\Indexer\indexExternalUrl().

◆ indexTypo3PageContent()

◆ init()

◆ initializeExternalParsers()

TYPO3\CMS\IndexedSearch\Indexer::initializeExternalParsers ( )

Initialize external parsers

See also
init()

Definition at line 513 of file Indexer.php.

References $GLOBALS.

Referenced by TYPO3\CMS\IndexedSearch\Indexer\init().

◆ is_grlist_set()

bool TYPO3\CMS\IndexedSearch\Indexer::is_grlist_set (   $phash_x)

Checks if a grlist record has been set for the phash value input (looking at the "real" phash of the current content, not the linked-to phash of the common search result page)

Parameters
int $phash_x: Phash integer to test.
Returns
bool

Definition at line 1915 of file Indexer.php.

References TYPO3\CMS\IndexedSearch\Utility\IndexedSearchUtility\isTableUsed().

Referenced by TYPO3\CMS\IndexedSearch\Indexer\indexTypo3PageContent().

◆ isAllowedLocalFile()

static bool TYPO3\CMS\IndexedSearch\Indexer::isAllowedLocalFile (   $filePath)
static protected

Checks if the path points to a file inside the web site.

Parameters
string $filePath
Returns
bool

Definition at line 1093 of file Indexer.php.

References TYPO3\CMS\Core\Core\Environment\getPublicPath().

◆ isRelativeURL()

static bool TYPO3\CMS\IndexedSearch\Indexer::isRelativeURL (   $url)
static protected

Checks if URL is relative.

Parameters
string $url
Returns
bool

Definition at line 1081 of file Indexer.php.
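
What counts as "relative" is not spelled out for isRelativeURL(), so the following sketch fixes one interpretation: a URL is relative when it has no scheme, no host, and a path that does not start with a slash. isRelativeUrlSketch() is a hypothetical stand-in for the protected static method, and that interpretation is an assumption.

```php
<?php
// Hypothetical stand-in for the protected static isRelativeURL().
function isRelativeUrlSketch(string $url): bool
{
    $parts = parse_url($url);
    if ($parts === false) {
        // Seriously malformed input is not treated as relative.
        return false;
    }
    $hasScheme = ($parts['scheme'] ?? '') !== '';
    $hasHost = ($parts['host'] ?? '') !== '';
    $path = $parts['path'] ?? '';
    // Assumption: "relative" means no scheme, no host and no leading slash.
    return !$hasScheme && !$hasHost && ($path === '' || $path[0] !== '/');
}

var_dump(isRelativeUrlSketch('fileadmin/docs/manual.pdf'));
var_dump(isRelativeUrlSketch('https://example.org/manual.pdf'));
```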

◆ log_pull()

TYPO3\CMS\IndexedSearch\Indexer::log_pull ( )

◆ log_push()

TYPO3\CMS\IndexedSearch\Indexer::log_push (   $msg,
  $key 
)

Push function wrapper for TT logging

Parameters
string $msg: Title to set
string $key: Key (?)

Definition at line 2272 of file Indexer.php.

Referenced by TYPO3\CMS\IndexedSearch\Indexer\hook_indexContent(), TYPO3\CMS\IndexedSearch\Indexer\indexRegularDocument(), and TYPO3\CMS\IndexedSearch\Indexer\indexTypo3PageContent().

◆ log_setTSlogMessage()

TYPO3\CMS\IndexedSearch\Indexer::log_setTSlogMessage (   $msg,
  $errorNum = 0 
)

◆ metaphone()

mixed TYPO3\CMS\IndexedSearch\Indexer::metaphone (   $word,
  $returnRawMetaphoneValue = false 
)

Creating metaphone based hash from input word

Parameters
string $word: Word to convert
bool $returnRawMetaphoneValue: If set, returns the raw metaphone value (not hashed)
Returns
mixed Metaphone hash integer (or raw value, string)

Definition at line 1410 of file Indexer.php.

References TYPO3\CMS\IndexedSearch\Utility\IndexedSearchUtility\md5inthash().

Referenced by TYPO3\CMS\IndexedSearch\Indexer\analyzeBody(), TYPO3\CMS\IndexedSearch\Indexer\analyzeHeaderinfo(), and TYPO3\CMS\IndexedSearch\Domain\Repository\IndexSearchRepository\getSearchString().

◆ processWordsInArrays()

array TYPO3\CMS\IndexedSearch\Indexer::processWordsInArrays (   $contentArr)

Processes words in the array from the split*Content functions.

Parameters
array $contentArr: Array of content to index, see splitHTMLContent() and splitRegularContent()
Returns
array Content input array modified so each key is now a unique array of words

Definition at line 1293 of file Indexer.php.

Referenced by TYPO3\CMS\IndexedSearch\Indexer\indexRegularDocument(), and TYPO3\CMS\IndexedSearch\Indexer\indexTypo3PageContent().

◆ readFileContent()

array TYPO3\CMS\IndexedSearch\Indexer::readFileContent (   $fileExtension,
  $absoluteFileName,
  $sectionPointer 
)

Reads the content of an external file being indexed. The content from the external parser MUST be returned in utf-8!

Parameters
string $fileExtension: File extension, e.g. "pdf", "doc" etc.
string $absoluteFileName: Absolute filename of file (must exist and be validated OK before calling function)
string $sectionPointer: Pointer to section (zero for all file types other than PDF, which carry an indication of the pages into which the document should be split).
Returns
array Standard content array (title, description, keywords, body keys)

Definition at line 1221 of file Indexer.php.

Referenced by TYPO3\CMS\IndexedSearch\Indexer\indexRegularDocument().

◆ removeOldIndexedFiles()

TYPO3\CMS\IndexedSearch\Indexer::removeOldIndexedFiles (   $phash)

Removes records for the indexed page, $phash

Parameters
int $phash: phash value to flush

Definition at line 1766 of file Indexer.php.

References TYPO3\CMS\IndexedSearch\Utility\IndexedSearchUtility\isTableUsed().

Referenced by TYPO3\CMS\IndexedSearch\Indexer\submitFilePage().

◆ removeOldIndexedPages()

TYPO3\CMS\IndexedSearch\Indexer::removeOldIndexedPages (   $phash)

Removes records for the indexed page, $phash

Parameters
int $phash: phash value to flush

Definition at line 1569 of file Indexer.php.

References TYPO3\CMS\IndexedSearch\Utility\IndexedSearchUtility\isTableUsed().

Referenced by TYPO3\CMS\IndexedSearch\Indexer\submitPage().

◆ setExtHashes()

array TYPO3\CMS\IndexedSearch\Indexer::setExtHashes (   $file,
  $subinfo = [] 
)

Get search hash, external files

Parameters
string $file: File name / path which identifies it on the server
array $subinfo: Additional content identifying the (subpart of) content. For instance, PDF files are divided into groups of pages for indexing.
Returns
array Array with "phash_grouping" and "phash" inside.

Definition at line 2246 of file Indexer.php.

References TYPO3\CMS\IndexedSearch\Indexer\$hash, and TYPO3\CMS\IndexedSearch\Utility\IndexedSearchUtility\md5inthash().

Referenced by TYPO3\CMS\IndexedSearch\Indexer\indexRegularDocument().

◆ setT3Hashes()

TYPO3\CMS\IndexedSearch\Indexer::setT3Hashes ( )

◆ splitHTMLContent()

array TYPO3\CMS\IndexedSearch\Indexer::splitHTMLContent (   $content)

Splits HTML content and returns an associative array with the title, a list of metatags, and a list of words in the body.

Parameters
string $content: HTML content to index. To some degree expected to be made by TYPO3 (i.e. splitting the header by ":")
Returns
array Array of content, having keys "title", "body", "keywords" and "description" set.
See also
splitRegularContent()

Definition at line 612 of file Indexer.php.

References TYPO3\CMS\IndexedSearch\Indexer\$defaultContentArray, TYPO3\CMS\IndexedSearch\Indexer\addSpacesToKeywordList(), TYPO3\CMS\IndexedSearch\Indexer\embracingTags(), and TYPO3\CMS\IndexedSearch\Indexer\typoSearchTags().

Referenced by TYPO3\CMS\IndexedSearch\Indexer\indexTypo3PageContent().

◆ splitRegularContent()

array TYPO3\CMS\IndexedSearch\Indexer::splitRegularContent (   $content)

Splits non-HTML content (from external files for instance)

Parameters
string $content: Input content (non-HTML) to index.
Returns
array Array of content, having the key "body" set (plus "title", "description" and "keywords", but empty)
See also
splitHTMLContent()

Definition at line 1255 of file Indexer.php.

References TYPO3\CMS\IndexedSearch\Indexer\$defaultContentArray.
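
The "standard content array" returned by both split functions can be pictured as follows. This is a sketch only: splitRegularContentSketch() is a hypothetical stand-in, and the empty-string defaults are an assumption about $defaultContentArray based on the keys named in the Returns description.

```php
<?php
// Hypothetical stand-in showing the shape of the standard content array.
function splitRegularContentSketch(string $content): array
{
    // Assumed defaults: all four keys present, all empty.
    $contentArr = [
        'title' => '',
        'description' => '',
        'keywords' => '',
        'body' => '',
    ];
    // Non-HTML input has no header fields, so only the body is filled.
    $contentArr['body'] = $content;
    return $contentArr;
}

print_r(splitRegularContentSketch('Plain text extracted from a PDF page.'));
```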

◆ submit_grlist()

TYPO3\CMS\IndexedSearch\Indexer::submit_grlist (   $hash,
  $phash_x 
)

◆ submit_section()

TYPO3\CMS\IndexedSearch\Indexer::submit_section (   $hash,
  $hash_t3 
)

Stores section. $hash and $hash_t3 are the same for TYPO3 pages, but differ for external files.

Parameters
int $hash: phash of TYPO3 parent search result record
int $hash_t3: phash of the file indexation search record

Definition at line 1549 of file Indexer.php.

References $fields, TYPO3\CMS\IndexedSearch\Indexer\$hash, TYPO3\CMS\IndexedSearch\Indexer\getRootLineFields(), and TYPO3\CMS\IndexedSearch\Utility\IndexedSearchUtility\isTableUsed().

Referenced by TYPO3\CMS\IndexedSearch\Indexer\submitFile_section(), and TYPO3\CMS\IndexedSearch\Indexer\submitPage().

◆ submitFile_grlist()

TYPO3\CMS\IndexedSearch\Indexer::submitFile_grlist (   $hash)

Stores file gr_list for a file IF it does not exist already

Parameters
int $hash: phash value of file

Definition at line 1686 of file Indexer.php.

References TYPO3\CMS\IndexedSearch\Indexer\$hash, TYPO3\CMS\IndexedSearch\Utility\IndexedSearchUtility\isTableUsed(), TYPO3\CMS\IndexedSearch\Utility\IndexedSearchUtility\md5inthash(), and TYPO3\CMS\IndexedSearch\Indexer\submit_grlist().

◆ submitFile_section()

TYPO3\CMS\IndexedSearch\Indexer::submitFile_section (   $hash)

Stores file section for a file IF it does not exist

Parameters
int $hash: phash value of file

Definition at line 1732 of file Indexer.php.

References TYPO3\CMS\IndexedSearch\Indexer\$hash, TYPO3\CMS\IndexedSearch\Utility\IndexedSearchUtility\isTableUsed(), and TYPO3\CMS\IndexedSearch\Indexer\submit_section().

Referenced by TYPO3\CMS\IndexedSearch\Indexer\indexRegularDocument().

◆ submitFilePage()

TYPO3\CMS\IndexedSearch\Indexer::submitFilePage (   $hash,
  $file,
  $subinfo,
  $ext,
  $mtime,
  $ctime,
  $size,
  $content_md5h,
  $contentParts 
)

Updates the database with information about the file.

Parameters
array $hash: Array with phash and phash_grouping keys for file
string $file: File name
array $subinfo: Array of "cHashParams" for files: this is, for instance, the page index for a PDF file (for other document types it will be zero)
string $ext: File extension determining the type of media.
int $mtime: Modification time of file.
int $ctime: Creation time of file.
int $size: Size of file in bytes
int $content_md5h: Content hash value.
array $contentParts: Standard content array (using only title and body for a file)

Definition at line 1608 of file Indexer.php.

References TYPO3\CMS\IndexedSearch\Indexer\$content_md5h, TYPO3\CMS\IndexedSearch\Indexer\$contentParts, $fields, $GLOBALS, TYPO3\CMS\IndexedSearch\Indexer\$hash, TYPO3\CMS\IndexedSearch\Indexer\$metaphoneContent, TYPO3\CMS\Core\Utility\PathUtility\basename(), TYPO3\CMS\IndexedSearch\Indexer\bodyDescription(), TYPO3\CMS\IndexedSearch\Utility\IndexedSearchUtility\isTableUsed(), TYPO3\CMS\Core\Database\Connection\PARAM_LOB, and TYPO3\CMS\IndexedSearch\Indexer\removeOldIndexedFiles().

Referenced by TYPO3\CMS\IndexedSearch\Indexer\indexRegularDocument().

◆ submitPage()

◆ submitWords()

TYPO3\CMS\IndexedSearch\Indexer::submitWords (   $wordList,
  $phash 
)

◆ typoSearchTags()

bool TYPO3\CMS\IndexedSearch\Indexer::typoSearchTags (   &$body)

Removes content that shouldn't be indexed according to TYPO3SEARCH-tags.

Parameters
string $body  HTML content, passed by reference
Returns
bool Returns TRUE if a TYPO3SEARCH_ tag was found, otherwise FALSE.

Definition at line 734 of file Indexer.php.

Referenced by TYPO3\CMS\IndexedSearch\Indexer\splitHTMLContent().
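
The marker handling can be illustrated with a minimal sketch. It assumes the HTML comment markers `<!--TYPO3SEARCH_begin-->` and `<!--TYPO3SEARCH_end-->`; the function name keepMarkedContent() is hypothetical, and the logic is a simplified illustration rather than the code in Indexer.php.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of Indexer): keeps only the text enclosed by
 * TYPO3SEARCH_begin / TYPO3SEARCH_end markers. Returns true if at least one
 * marker was found; otherwise $body is left untouched and false is returned.
 */
function keepMarkedContent(string &$body): bool
{
    // Split on the marker prefix; every later part starts with "begin-->" or "end-->".
    $parts = preg_split('/<!--\s?TYPO3SEARCH_/', $body);
    if ($parts === false || count($parts) < 2) {
        return false;
    }

    $kept = '';
    // Text before the first marker is kept only if that marker is an "end" marker.
    if (strpos($parts[1], 'end-->') === 0) {
        $kept .= $parts[0];
    }
    foreach (array_slice($parts, 1) as $part) {
        if (strpos($part, 'begin-->') === 0) {
            $kept .= substr($part, strlen('begin-->'));
        }
        // Parts that follow an "end" marker are dropped.
    }
    $body = $kept;
    return true;
}

$html = 'menu<!--TYPO3SEARCH_begin-->Main text<!--TYPO3SEARCH_end-->footer';
$found = keepMarkedContent($html);
```

After the call, $html holds only the text between the two markers, and $found reports that markers were present. Content without any marker is left unchanged, so the caller can index it in full.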

◆ update_grlist()

TYPO3\CMS\IndexedSearch\Indexer::update_grlist (   $phash,
  $phash_x 
)

Checks whether a grlist entry for this hash exists and, if not, writes one.

Parameters
int $phash    phash of the search result that should be found
int $phash_x  The real phash of the current content. The two values differ when a page accessed with a user login turns out to contain exactly the same content as another, already indexed version of the page; this is the reason the grlist table exists.
See also
submit_grlist()

Definition at line 1939 of file Indexer.php.

References TYPO3\CMS\IndexedSearch\Utility\IndexedSearchUtility\isTableUsed(), TYPO3\CMS\IndexedSearch\Indexer\log_setTSlogMessage(), TYPO3\CMS\IndexedSearch\Utility\IndexedSearchUtility\md5inthash(), and TYPO3\CMS\IndexedSearch\Indexer\submit_grlist().

Referenced by TYPO3\CMS\IndexedSearch\Indexer\indexTypo3PageContent().

◆ updateParsetime()

TYPO3\CMS\IndexedSearch\Indexer::updateParsetime (   $phash,
  $parsetime 
)

Update parsetime for phash row.

Parameters
int $phash      phash value
int $parsetime  Parsetime value to set

Definition at line 2021 of file Indexer.php.

References TYPO3\CMS\IndexedSearch\Utility\IndexedSearchUtility\isTableUsed().

Referenced by TYPO3\CMS\IndexedSearch\Indexer\indexRegularDocument(), and TYPO3\CMS\IndexedSearch\Indexer\indexTypo3PageContent().

◆ updateRootline()

TYPO3\CMS\IndexedSearch\Indexer::updateRootline ( )

◆ updateSetId()

TYPO3\CMS\IndexedSearch\Indexer::updateSetId (   $phash)

Update SetID of the index_phash record.

Parameters
int $phash  phash value

Definition at line 1996 of file Indexer.php.

References TYPO3\CMS\IndexedSearch\Utility\IndexedSearchUtility\isTableUsed().

Referenced by TYPO3\CMS\IndexedSearch\Indexer\indexTypo3PageContent().

◆ updateTstamp()

TYPO3\CMS\IndexedSearch\Indexer::updateTstamp (   $phash,
  $mtime = 0 
)

Update tstamp for a phash row.

Parameters
int $phash  phash value
int $mtime  If set, update the mtime field to this value.

Definition at line 1966 of file Indexer.php.

References $GLOBALS, and TYPO3\CMS\IndexedSearch\Utility\IndexedSearchUtility\isTableUsed().

Referenced by TYPO3\CMS\IndexedSearch\Indexer\checkMtimeTstamp(), TYPO3\CMS\IndexedSearch\Indexer\indexRegularDocument(), and TYPO3\CMS\IndexedSearch\Indexer\indexTypo3PageContent().

Member Data Documentation

◆ $cHashParams

array TYPO3\CMS\IndexedSearch\Indexer::$cHashParams = array( )

cHash params array

Definition at line 176 of file Indexer.php.

Referenced by TYPO3\CMS\IndexedSearch\Indexer\setT3Hashes().

◆ $conf

array TYPO3\CMS\IndexedSearch\Indexer::$conf = array( )

Definition at line 130 of file Indexer.php.

Referenced by TYPO3\CMS\IndexedSearch\Indexer\extractLinks().

◆ $content_md5h

◆ $contentParts

array TYPO3\CMS\IndexedSearch\Indexer::$contentParts = array( )

◆ $crawlerActive

bool TYPO3\CMS\IndexedSearch\Indexer::$crawlerActive = false

Set when crawler is detected (internal)

Definition at line 107 of file Indexer.php.

◆ $csObj

TYPO3\CMS\Core\Charset\CharsetConverter TYPO3\CMS\IndexedSearch\Indexer::$csObj
protected

Charset class object

Deprecated:
since TYPO3 v9.3, will be removed in TYPO3 v10.0 (also the instantiation in the init() method).

Definition at line 205 of file Indexer.php.

◆ $defaultContentArray

array TYPO3\CMS\IndexedSearch\Indexer::$defaultContentArray
Initial value:
= array(
'title' => '',
'description' => '',
'keywords' => '',
'body' => ''
)

Default content array with empty title, description, keywords and body entries

Definition at line 113 of file Indexer.php.

Referenced by TYPO3\CMS\IndexedSearch\Indexer\splitHTMLContent(), and TYPO3\CMS\IndexedSearch\Indexer\splitRegularContent().

◆ $defaultGrList

string TYPO3\CMS\IndexedSearch\Indexer::$defaultGrList = '0,-1'

Fe-group list (pages might be indexed separately for each user group combination to support searching in access-limited pages)

Definition at line 76 of file Indexer.php.

◆ $deprecatedPublicProperties

array TYPO3\CMS\IndexedSearch\Indexer::$deprecatedPublicProperties
protected
Initial value:
= array(
'csObj' => 'Using $csObj within Indexing is discouraged, the property will be removed in TYPO3 v10.0 - if needed instantiate CharsetConverter yourself.',
)

List of all deprecated public properties

Definition at line 43 of file Indexer.php.
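
How a list like this can keep a formerly public property readable while warning callers is sketched below. The trait DeprecatedPropertyAccess and the class ExampleIndexer are hypothetical stand-ins written for this illustration; they are not the TYPO3 PublicPropertyDeprecationTrait, which may behave differently in detail.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for a property deprecation trait: reading a listed,
 * non-public property from outside still works but raises E_USER_DEPRECATED.
 */
trait DeprecatedPropertyAccess
{
    public function __get(string $name)
    {
        // __get() is only called for properties that are inaccessible from outside.
        if (isset($this->deprecatedPublicProperties[$name])) {
            trigger_error($this->deprecatedPublicProperties[$name], E_USER_DEPRECATED);
            return $this->$name;
        }
        throw new LogicException('Cannot access property ' . $name);
    }
}

class ExampleIndexer
{
    use DeprecatedPropertyAccess;

    // Property name => deprecation message (message text is illustrative).
    protected $deprecatedPublicProperties = [
        'csObj' => 'Using $csObj is discouraged, instantiate the converter yourself.',
    ];

    protected $csObj = 'converter';
}

// Collect deprecation messages instead of printing them.
$messages = [];
set_error_handler(function (int $level, string $message) use (&$messages): bool {
    $messages[] = $message;
    return true;
}, E_USER_DEPRECATED);

$value = (new ExampleIndexer())->csObj;
restore_error_handler();
```

The read still returns the property value, and the message registered for csObj ends up in $messages. Properties that are not listed stay inaccessible.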

◆ $enableMetaphoneSearch

bool TYPO3\CMS\IndexedSearch\Indexer::$enableMetaphoneSearch = false

Definition at line 190 of file Indexer.php.

Referenced by TYPO3\CMS\IndexedSearch\Indexer\init().

◆ $excludeSections

string TYPO3\CMS\IndexedSearch\Indexer::$excludeSections = 'script,style'

HTML code blocks to exclude from indexing

Definition at line 62 of file Indexer.php.

◆ $external_parsers

array TYPO3\CMS\IndexedSearch\Indexer::$external_parsers = array( )

External parser objects for the supported file extensions. Keys are file extension names, values are objects with certain methods.

Definition at line 68 of file Indexer.php.

◆ $externalFileCounter

int TYPO3\CMS\IndexedSearch\Indexer::$externalFileCounter = 0

Definition at line 126 of file Indexer.php.

◆ $file_phash_arr

array TYPO3\CMS\IndexedSearch\Indexer::$file_phash_arr = array( )

Hash array, contains phash and phash_grouping

Definition at line 148 of file Indexer.php.

◆ $flagBitMask

bool TYPO3\CMS\IndexedSearch\Indexer::$flagBitMask

Definition at line 221 of file Indexer.php.

Referenced by TYPO3\CMS\IndexedSearch\Indexer\submitWords().

◆ $forceIndexing

bool TYPO3\CMS\IndexedSearch\Indexer::$forceIndexing = false

If TRUE, indexing is forced regardless of hashes etc.

Definition at line 101 of file Indexer.php.

◆ $freqMax

float TYPO3\CMS\IndexedSearch\Indexer::$freqMax = 0.1

Definition at line 186 of file Indexer.php.

◆ $freqRange

int TYPO3\CMS\IndexedSearch\Indexer::$freqRange = 32000

Upper bound of the integer range used when mapping word frequencies (see freqMap())

Definition at line 182 of file Indexer.php.

Referenced by TYPO3\CMS\IndexedSearch\Indexer\freqMap().

◆ $hash

◆ $indexerConfig

array TYPO3\CMS\IndexedSearch\Indexer::$indexerConfig = array( )

Configuration set internally (see init functions for required keys and their meaning)

Definition at line 136 of file Indexer.php.

◆ $indexExternalUrl_content

string TYPO3\CMS\IndexedSearch\Indexer::$indexExternalUrl_content = ''

Content of the external URL being indexed (see indexExternalUrl())

Definition at line 170 of file Indexer.php.

◆ $internal_log

array TYPO3\CMS\IndexedSearch\Indexer::$internal_log = array( )

Internal log

Definition at line 164 of file Indexer.php.

◆ $lexerObj

TYPO3\CMS\IndexedSearch\Lexer TYPO3\CMS\IndexedSearch\Indexer::$lexerObj

Lexer object for word splitting

Definition at line 217 of file Indexer.php.

◆ $maxExternalFiles

int TYPO3\CMS\IndexedSearch\Indexer::$maxExternalFiles = 0

Max number of external files to index.

Definition at line 95 of file Indexer.php.

◆ $metaphoneContent

string TYPO3\CMS\IndexedSearch\Indexer::$metaphoneContent = ''

◆ $metaphoneObj

TYPO3\CMS\IndexedSearch\Utility\DoubleMetaPhoneUtility TYPO3\CMS\IndexedSearch\Indexer::$metaphoneObj

Metaphone object, if any

Definition at line 211 of file Indexer.php.

◆ $reasons

array TYPO3\CMS\IndexedSearch\Indexer::$reasons
Initial value:
= array(
-1 => 'mtime matched the document, so no changes detected and no content updated',
-2 => 'The minimum age was not exceeded',
1 => 'The configured max-age was exceeded for the document and thus it\'s indexed.',
2 => 'The minimum age was exceed and mtime was set and the mtime was different, so the page was indexed.',
3 => 'The minimum age was exceed, but mtime was not set, so the page was indexed.',
4 => 'Page has never been indexed (is not represented in the index_phash table).'
)

Definition at line 49 of file Indexer.php.
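
The reason codes relate to the age settings $tstamp_minAge and $tstamp_maxAge. The sketch below is one way to read that decision, reconstructed from the reason strings alone. The function reindexReason() and the row keys 'tstamp' and 'item_mtime' are assumptions for illustration; this is not the code of checkMtimeTstamp().

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical decision helper. A positive code means "index now",
 * a negative code means "skip".
 *
 * @param array|null $row Existing index row (keys 'tstamp' and 'item_mtime' are assumed), null if none
 * @param int $mtime      Modification time of the document, 0 if unknown
 * @param int $now        Current time
 * @param int $minAge     Minimum seconds between two indexing runs, 0 = no limit
 * @param int $maxAge     Maximum age in seconds of an index entry, 0 = no limit
 */
function reindexReason(?array $row, int $mtime, int $now, int $minAge, int $maxAge): int
{
    if ($row === null) {
        return 4; // never indexed before
    }
    $age = $now - $row['tstamp'];
    if ($maxAge > 0 && $age > $maxAge) {
        return 1; // max-age exceeded, index regardless of mtime
    }
    if ($minAge > 0 && $age <= $minAge) {
        return -2; // minimum age not exceeded yet
    }
    if ($mtime === 0) {
        return 3; // no mtime available, index to be safe
    }
    // mtime differs: index (2); mtime matches: nothing changed (-1).
    return $row['item_mtime'] !== $mtime ? 2 : -1;
}

$row = ['tstamp' => 1000, 'item_mtime' => 900];
$codes = [
    reindexReason(null, 900, 2000, 60, 3600), // no row yet
    reindexReason($row, 900, 1030, 60, 3600), // indexed 30 seconds ago
    reindexReason($row, 900, 2000, 60, 3600), // same mtime
    reindexReason($row, 950, 2000, 60, 3600), // changed mtime
    reindexReason($row, 0, 2000, 60, 3600),   // unknown mtime
    reindexReason($row, 900, 9000, 60, 3600), // entry older than max age
];
```

Each returned code is a key of $reasons, so the matching message can be looked up directly for logging.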

◆ $storeMetaphoneInfoAsWords

bool TYPO3\CMS\IndexedSearch\Indexer::$storeMetaphoneInfoAsWords

Definition at line 194 of file Indexer.php.

◆ $timeTracker

TimeTracker TYPO3\CMS\IndexedSearch\Indexer::$timeTracker
protected

Definition at line 225 of file Indexer.php.

◆ $tstamp_maxAge

int TYPO3\CMS\IndexedSearch\Indexer::$tstamp_maxAge = 0

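If set, the maximum age in seconds of an indexed document. Regardless of mtime, the document is re-indexed once this limit is exceeded.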

Definition at line 82 of file Indexer.php.

◆ $tstamp_minAge

int TYPO3\CMS\IndexedSearch\Indexer::$tstamp_minAge = 0

If set, the minimum age in seconds before a document can be indexed again. This is regardless of mtime.

Definition at line 89 of file Indexer.php.

◆ $wordcount

int TYPO3\CMS\IndexedSearch\Indexer::$wordcount = 0

Definition at line 122 of file Indexer.php.