TYPO3 CMS  TYPO3_6-2
TYPO3\CMS\IndexedSearch\Indexer Class Reference
Inheritance diagram for TYPO3\CMS\IndexedSearch\Indexer:
tx_indexedsearch_indexer

Public Member Functions

 hook_indexContent (&$pObj)
 
 backend_initIndexer ($id, $type, $sys_language_uid, $MP, $uidRL, $cHash_array=array(), $createCHash=FALSE)
 
 backend_setFreeIndexUid ($freeIndexUid, $freeIndexSetId=0)
 
 backend_indexAsTYPO3Page ($title, $keywords, $description, $content, $charset, $mtime, $crdate=0, $recordUid=0)
 
 init ()
 
 initializeExternalParsers ()
 
 indexTypo3PageContent ()
 
 splitHTMLContent ($content)
 
 getHTMLcharset ($content)
 
 convertHTMLToUtf8 ($content, $charset='')
 
 embracingTags ($string, $tagName, &$tagContent, &$stringAfter, &$paramList)
 
 typoSearchTags (&$body)
 
 extractLinks ($content)
 
 extractHyperLinks ($html)
 
 extractBaseHref ($html)
 
 indexExternalUrl ($externalUrl)
 
 getUrlHeaders ($url)
 
 indexRegularDocument ($file, $force=FALSE, $contentTmpFile='', $altExtension='')
 
 readFileContent ($fileExtension, $absoluteFileName, $sectionPointer)
 
 fileContentParts ($ext, $absFile)
 
 splitRegularContent ($content)
 
 charsetEntity2utf8 (&$contentArr, $charset)
 
 processWordsInArrays ($contentArr)
 
 bodyDescription ($contentArr)
 
 indexAnalyze ($content)
 
 analyzeHeaderinfo (&$retArr, $content, $key, $offset)
 
 analyzeBody (&$retArr, $content)
 
 metaphone ($word, $returnRawMetaphoneValue=FALSE)
 
 submitPage ()
 
 submit_grlist ($hash, $phash_x)
 
 submit_section ($hash, $hash_t3)
 
 removeOldIndexedPages ($phash)
 
 submitFilePage ($hash, $file, $subinfo, $ext, $mtime, $ctime, $size, $content_md5h, $contentParts)
 
 submitFile_grlist ($hash)
 
 submitFile_section ($hash)
 
 removeOldIndexedFiles ($phash)
 
 checkMtimeTstamp ($mtime, $phash)
 
 checkContentHash ()
 
 checkExternalDocContentHash ($hashGr, $content_md5h)
 
 is_grlist_set ($phash_x)
 
 update_grlist ($phash, $phash_x)
 
 updateTstamp ($phash, $mtime=0)
 
 updateSetId ($phash)
 
 updateParsetime ($phash, $parsetime)
 
 updateRootline ()
 
 getRootLineFields (array &$fieldArray)
 
 removeLoginpagesWithContentHash ()
 
 includeCrawlerClass ()
 
 checkWordList ($wordListArray)
 
 submitWords ($wordList, $phash)
 
 freqMap ($freq)
 
 setT3Hashes ()
 
 setExtHashes ($file, $subinfo=array())
 
 log_push ($msg, $key)
 
 log_pull ()
 
 log_setTSlogMessage ($msg, $errorNum=0)
 

Public Attributes

 $reasons
 
 $excludeSections = 'script,style'
 
 $external_parsers = array()
 
 $defaultGrList = '0,-1'
 
 $tstamp_maxAge = 0
 
 $tstamp_minAge = 0
 
 $maxExternalFiles = 0
 
 $forceIndexing = FALSE
 
 $crawlerActive = FALSE
 
 $defaultContentArray
 
 $wordcount = 0
 
 $externalFileCounter = 0
 
 $conf = array()
 
 $indexerConfig = array()
 
 $hash = array()
 
 $file_phash_arr = array()
 
 $contentParts = array()
 
 $content_md5h = ''
 
 $internal_log = array()
 
 $indexExternalUrl_content = ''
 
 $cHashParams = array()
 
 $freqRange = 32000
 
 $freqMax = 0.1
 
 $enableMetaphoneSearch = FALSE
 
 $storeMetaphoneInfoAsWords
 
 $metaphoneContent = ''
 
 $csObj
 
 $metaphoneObj
 
 $lexerObj
 
 $flagBitMask
 

Protected Member Functions

 createLocalPath ($sourcePath)
 
 createLocalPathFromT3vars ($sourcePath)
 
 createLocalPathUsingDomainURL ($sourcePath)
 
 createLocalPathUsingAbsRefPrefix ($sourcePath)
 
 createLocalPathFromAbsoluteURL ($sourcePath)
 
 createLocalPathFromRelativeURL ($sourcePath)
 
 addSpacesToKeywordList ($keywordList)
 

Static Protected Member Functions

static isRelativeURL ($url)
 
static isAllowedLocalFile ($filePath)
 

Detailed Description

This class is a search indexer for TYPO3. Indexing class for the TYPO3 frontend.

Author
Kasper Skårhøj kasperYYYY@typo3.com

Definition at line 29 of file Indexer.php.

Member Function Documentation

◆ addSpacesToKeywordList()

TYPO3\CMS\IndexedSearch\Indexer::addSpacesToKeywordList (   $keywordList)
protected

Makes sure that keywords are space-separated. This is important for displaying them properly as part of the full-text index.

Parameters
string $keywordList
Returns
string
See also
http://forge.typo3.org/issues/14959

Definition at line 2138 of file Indexer.php.

Referenced by TYPO3\CMS\IndexedSearch\Indexer\splitHTMLContent().
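
The normalisation described for addSpacesToKeywordList() can be pictured with a small standalone sketch. The function name keywordListWithSpaces() and the choice of ", " as the separator are assumptions made for illustration; the actual method body is not shown on this page.

```php
<?php
// Hypothetical stand-in for the protected method; not the TYPO3 implementation.
function keywordListWithSpaces($keywordList)
{
    // Split on commas and trim whitespace around each keyword.
    $keywords = array_map('trim', explode(',', $keywordList));
    // Drop empty entries such as those produced by "a,,b".
    $keywords = array_filter($keywords, 'strlen');
    // Re-join with a comma followed by a space so the words stay separable.
    return implode(', ', $keywords);
}

echo keywordListWithSpaces('search,index ,typo3'), "\n";
```

Without the added spaces, a list such as "search,index" could end up in the full-text index as a single unbroken token.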

◆ analyzeBody()

TYPO3\CMS\IndexedSearch\Indexer::analyzeBody (   &$retArr,
  $content 
)

Calculates relevant information for bodycontent

Parameters
array &$retArr: Index array, passed by reference
array $content: Standard content array
Returns
void
Todo:
Define visibility

Definition at line 1367 of file Indexer.php.

References TYPO3\CMS\IndexedSearch\Utility\IndexedSearchUtility\md5inthash(), and TYPO3\CMS\IndexedSearch\Indexer\metaphone().

Referenced by TYPO3\CMS\IndexedSearch\Indexer\indexAnalyze().

◆ analyzeHeaderinfo()

TYPO3\CMS\IndexedSearch\Indexer::analyzeHeaderinfo (   &$retArr,
  $content,
  $key,
  $offset 
)

Calculates relevant information for headercontent

Parameters
array &$retArr: Index array, passed by reference
array $content: Standard content array
string $key: Key from standard content array
integer $offset: Bit-wise priority to type
Returns
void
Todo:
Define visibility

Definition at line 1336 of file Indexer.php.

References TYPO3\CMS\IndexedSearch\Utility\IndexedSearchUtility\md5inthash(), and TYPO3\CMS\IndexedSearch\Indexer\metaphone().

Referenced by TYPO3\CMS\IndexedSearch\Indexer\indexAnalyze().

◆ backend_indexAsTYPO3Page()

TYPO3\CMS\IndexedSearch\Indexer::backend_indexAsTYPO3Page (   $title,
  $keywords,
  $description,
  $content,
  $charset,
  $mtime,
  $crdate = 0,
  $recordUid = 0 
)

Indexing records as the content of a TYPO3 page.

Parameters
string $title: Title equivalent
string $keywords: Keywords equivalent
string $description: Description equivalent
string $content: The main content to index
string $charset: The charset of the title, keywords, description and body content. MUST BE VALID, otherwise nothing is indexed!
integer $mtime: Last modification time, in seconds
integer $crdate: The creation date of the content, in seconds
integer $recordUid: The record UID that the content comes from (for registration with the indexed rows)
Returns
void
Todo:
Define visibility

Definition at line 400 of file Indexer.php.

References TYPO3\CMS\IndexedSearch\Indexer\indexTypo3PageContent().

◆ backend_initIndexer()

TYPO3\CMS\IndexedSearch\Indexer::backend_initIndexer (   $id,
  $type,
  $sys_language_uid,
  $MP,
  $uidRL,
  $cHash_array = array(),
  $createCHash = FALSE 
)

Initializing the "combined ID" of the page (phash) being indexed (or for which external media is attached)

Parameters
integer $id: The page uid, &id=
integer $type: The page type, &type=
integer $sys_language_uid: sys_language uid, typically &L=
string $MP: The MP variable (Mount Points), &MP=
array $uidRL: Rootline array of only UIDs.
array $cHash_array: Array of GET variables to register with this indexing
boolean $createCHash: If set, calculates a cHash value from the $cHash_array. You will probably not do that, since such cases are indexed through the frontend and the idea of this interface is to index non-cacheable pages from the backend!
Returns
void
Todo:
Define visibility

Definition at line 331 of file Indexer.php.

References TYPO3\CMS\Core\Utility\GeneralUtility\implodeArrayForUrl(), TYPO3\CMS\IndexedSearch\Indexer\init(), and TYPO3\CMS\Core\Utility\GeneralUtility\makeInstance().

◆ backend_setFreeIndexUid()

TYPO3\CMS\IndexedSearch\Indexer::backend_setFreeIndexUid (   $freeIndexUid,
  $freeIndexSetId = 0 
)

Sets the free-index uid. Can be called right after backend_initIndexer()

Parameters
integer $freeIndexUid: Free index UID
integer $freeIndexSetId: Set id - an integer identifying the "set" of indexing operations.
Returns
void
Todo:
Define visibility

Definition at line 381 of file Indexer.php.
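
The three backend_* methods are meant to be used together, and the call order can be sketched as below. RecordingIndexerStub is a hypothetical stand-in that only copies the documented signatures and records which method was called, so the example runs without a TYPO3 installation. All argument values are made up, and placing backend_indexAsTYPO3Page() last is an assumption based on the note that backend_setFreeIndexUid() "can be called right after backend_initIndexer()".

```php
<?php
// Hypothetical recording stub: mirrors three documented signatures only.
// It performs no indexing and is not part of TYPO3.
class RecordingIndexerStub
{
    public $calls = array();

    public function backend_initIndexer($id, $type, $sys_language_uid, $MP, $uidRL, $cHash_array = array(), $createCHash = false)
    {
        $this->calls[] = 'backend_initIndexer';
    }

    public function backend_setFreeIndexUid($freeIndexUid, $freeIndexSetId = 0)
    {
        $this->calls[] = 'backend_setFreeIndexUid';
    }

    public function backend_indexAsTYPO3Page($title, $keywords, $description, $content, $charset, $mtime, $crdate = 0, $recordUid = 0)
    {
        $this->calls[] = 'backend_indexAsTYPO3Page';
    }
}

$indexer = new RecordingIndexerStub();
// Page 42, type 0, default language, no mount point, rootline given as UIDs only.
$indexer->backend_initIndexer(42, 0, 0, '', array(1, 7, 42));
// Optional step, directly after initialisation.
$indexer->backend_setFreeIndexUid(5);
// Hand over the record content; the charset must be valid or nothing is indexed.
$indexer->backend_indexAsTYPO3Page('Title', 'key, words', 'Short description', 'Body text to index', 'utf-8', time());

print_r($indexer->calls);
```

In a real installation the object would be an instance of TYPO3\CMS\IndexedSearch\Indexer rather than the stub.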

◆ bodyDescription()

TYPO3\CMS\IndexedSearch\Indexer::bodyDescription (   $contentArr)

Extracts the sample description text from the content array.

Parameters
array $contentArr: Content array
Returns
string Description string
Todo:
Define visibility

Definition at line 1298 of file Indexer.php.

References TYPO3\CMS\Core\Utility\MathUtility\forceIntegerInRange().

Referenced by TYPO3\CMS\IndexedSearch\Indexer\submitFilePage(), and TYPO3\CMS\IndexedSearch\Indexer\submitPage().

◆ charsetEntity2utf8()

TYPO3\CMS\IndexedSearch\Indexer::charsetEntity2utf8 (   &$contentArr,
  $charset 
)

Convert character set and HTML entities in the value of input content array keys

Parameters
array &$contentArr: Standard content array
string $charset: Charset of the input content (converted to utf-8)
Returns
void
Todo:
Define visibility

Definition at line 1258 of file Indexer.php.

Referenced by TYPO3\CMS\IndexedSearch\Indexer\indexTypo3PageContent().

◆ checkContentHash()

TYPO3\CMS\IndexedSearch\Indexer::checkContentHash ( )

Check content hash in phash table

Returns
mixed Returns TRUE if the page needs to be indexed (that is, there was no result), otherwise the phash value (in an array) of the phash record to which the grlist_record should be related!
Todo:
Define visibility

Definition at line 1761 of file Indexer.php.

References $GLOBALS, and $result.

Referenced by TYPO3\CMS\IndexedSearch\Indexer\indexTypo3PageContent().

◆ checkExternalDocContentHash()

TYPO3\CMS\IndexedSearch\Indexer::checkExternalDocContentHash (   $hashGr,
  $content_md5h 
)

Check content hash for external documents. Returns TRUE if the document needs to be indexed (that is, there was no result).

Parameters
integer $hashGr: phash value to check (phash_grouping)
integer $content_md5h: Content hash to check
Returns
boolean Returns TRUE if the document needs to be indexed (that is, there was no result)
Todo:
Define visibility

Definition at line 1782 of file Indexer.php.

References TYPO3\CMS\IndexedSearch\Indexer\$content_md5h, $GLOBALS, and $result.

Referenced by TYPO3\CMS\IndexedSearch\Indexer\indexRegularDocument().

◆ checkMtimeTstamp()

TYPO3\CMS\IndexedSearch\Indexer::checkMtimeTstamp (   $mtime,
  $phash 
)

Check the mtime / tstamp of the currently indexed page/file (based on phash). Returns a positive integer if the page needs to be indexed.

Parameters
integer $mtime: mtime value to test against limits and indexed page (usually this is the mtime of the cached document)
integer $phash: "phash" used to select any already indexed page to see what its mtime is.
Returns
integer Result integer. Generally, <0 = no indexing, >0 = do indexing (see $this->reasons):
-2: Min age was NOT exceeded, so indexing cannot occur.
-1: mtime matched, so there is no need to reindex the page.
0: N/A
1: Max age exceeded; the page must be indexed again.
2: mtime of the indexed page doesn't match the mtime given for the current content, so the page must be indexed.
3: No mtime was set, so the page is indexed.
4: No indexed page was found, so the page is indexed.
Todo:
Define visibility

Definition at line 1707 of file Indexer.php.

References $GLOBALS, $result, TYPO3\CMS\IndexedSearch\Indexer\log_setTSlogMessage(), and TYPO3\CMS\IndexedSearch\Indexer\updateTstamp().

Referenced by TYPO3\CMS\IndexedSearch\Indexer\indexRegularDocument(), and TYPO3\CMS\IndexedSearch\Indexer\indexTypo3PageContent().
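
A caller has to turn the integer returned by checkMtimeTstamp() into an index / skip decision. The sketch below shows one way to do that, using only the return codes documented above. The constant MTIME_CHECK_REASONS and the function describeMtimeCheck() are hypothetical names, and the reason strings are paraphrased from this page rather than taken from $this->reasons.

```php
<?php
// Hypothetical lookup table built from the documented return codes.
const MTIME_CHECK_REASONS = array(
    -2 => 'Min age was not exceeded, so indexing cannot occur',
    -1 => 'mtime matched, so there is no need to reindex',
    1 => 'Max age exceeded, page must be indexed again',
    2 => 'mtime of indexed page does not match current content',
    3 => 'No mtime was set',
    4 => 'No indexed page found',
);

// Translate a result code into a decision and a human-readable reason.
function describeMtimeCheck($code)
{
    return array(
        // Documented rule: >0 means index, <0 means do not index.
        'index' => $code > 0,
        'reason' => isset(MTIME_CHECK_REASONS[$code]) ? MTIME_CHECK_REASONS[$code] : 'N/A',
    );
}

print_r(describeMtimeCheck(4));
print_r(describeMtimeCheck(-1));
```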

◆ checkWordList()

TYPO3\CMS\IndexedSearch\Indexer::checkWordList (   $wordListArray)

Adds new words to db

Parameters
array $wordListArray: Word list array (where each word has information about position etc.).
Returns
void
Todo:
Define visibility

Definition at line 1956 of file Indexer.php.

Referenced by TYPO3\CMS\IndexedSearch\Indexer\indexRegularDocument(), and TYPO3\CMS\IndexedSearch\Indexer\indexTypo3PageContent().

◆ convertHTMLToUtf8()

TYPO3\CMS\IndexedSearch\Indexer::convertHTMLToUtf8 (   $content,
  $charset = '' 
)

Converts an HTML document to utf-8

Parameters
string $content: HTML content, any charset
string $charset: Optional charset (otherwise extracted from HTML)
Returns
string Converted HTML
Todo:
Define visibility

Definition at line 662 of file Indexer.php.

References TYPO3\CMS\IndexedSearch\Indexer\getHTMLcharset().

◆ createLocalPath()

TYPO3\CMS\IndexedSearch\Indexer::createLocalPath (   $sourcePath)
protected

Checks if the file is local

Parameters
$sourcePath
Returns
string Absolute path to file if file is local, else empty string

Definition at line 939 of file Indexer.php.

Referenced by TYPO3\CMS\IndexedSearch\Indexer\extractHyperLinks().

◆ createLocalPathFromAbsoluteURL()

TYPO3\CMS\IndexedSearch\Indexer::createLocalPathFromAbsoluteURL (   $sourcePath)
protected

Attempts to create a local file path from the absolute URL without schema.

Parameters
string $sourcePath
Returns
string

Definition at line 1030 of file Indexer.php.

◆ createLocalPathFromRelativeURL()

TYPO3\CMS\IndexedSearch\Indexer::createLocalPathFromRelativeURL (   $sourcePath)
protected

Attempts to create a local file path from the relative URL.

Parameters
string $sourcePath
Returns
string

Definition at line 1048 of file Indexer.php.

◆ createLocalPathFromT3vars()

TYPO3\CMS\IndexedSearch\Indexer::createLocalPathFromT3vars (   $sourcePath)
protected

Attempts to create a local file path from T3VARs. This is useful for various download extensions that hide actual file name but still want the file to be indexed.

Parameters
string $sourcePath
Returns
string

Definition at line 965 of file Indexer.php.

References $GLOBALS, and TYPO3\CMS\Core\Utility\GeneralUtility\shortMD5().

◆ createLocalPathUsingAbsRefPrefix()

TYPO3\CMS\IndexedSearch\Indexer::createLocalPathUsingAbsRefPrefix (   $sourcePath)
protected

Attempts to create a local file path by matching absRefPrefix. This requires TSFE. If TSFE is missing, this function does nothing.

Parameters
string $sourcePath
Returns
string

Definition at line 1007 of file Indexer.php.

References $GLOBALS.

◆ createLocalPathUsingDomainURL()

TYPO3\CMS\IndexedSearch\Indexer::createLocalPathUsingDomainURL (   $sourcePath)
protected

Attempts to create a local file path by matching a current request URL.

Parameters
string $sourcePath
Returns
string

Definition at line 986 of file Indexer.php.

References TYPO3\CMS\Core\Utility\GeneralUtility\getIndpEnv().

◆ embracingTags()

TYPO3\CMS\IndexedSearch\Indexer::embracingTags (   $string,
  $tagName,
  &$tagContent,
  &$stringAfter,
  &$paramList 
)

Finds the first occurrence of embracing tags and returns the embraced content and the original string with the tag removed in the two passed variables. Returns FALSE if no match is found. Useful, for example, for finding the <title> of a document or removing <script> sections.

Parameters
string $string: String to search in
string $tagName: Tag name, e.g. "script"
string &$tagContent: Passed by reference: Content inside found tag
string &$stringAfter: Passed by reference: Content after found tag
string &$paramList: Passed by reference: Attributes of the found tag.
Returns
boolean Returns FALSE if tag was not found, otherwise TRUE.
Todo:
Define visibility

Definition at line 688 of file Indexer.php.

Referenced by TYPO3\CMS\IndexedSearch\Indexer\splitHTMLContent().
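
The by-reference contract of embracingTags() is easier to follow with a small example. The sketch below is an illustrative re-implementation based on a regular expression, not the TYPO3 method body. The name embracingTagsSketch() is hypothetical, and it assumes that $stringAfter receives everything following the closing tag, as the parameter list states.

```php
<?php
// Illustrative re-implementation; not the TYPO3 method body.
function embracingTagsSketch($string, $tagName, &$tagContent, &$stringAfter, &$paramList)
{
    $tag = preg_quote($tagName, '/');
    // Opening tag with optional attributes, lazily matched content, closing tag.
    $pattern = '/<' . $tag . '(\s[^>]*)?>(.*?)<\/' . $tag . '\s*>/is';
    if (!preg_match($pattern, $string, $match, PREG_OFFSET_CAPTURE)) {
        return false;
    }
    // Attributes of the opening tag, without surrounding whitespace.
    $paramList = trim($match[1][0]);
    // Content between the opening and closing tag.
    $tagContent = $match[2][0];
    // Everything that follows the closing tag.
    $stringAfter = (string) substr($string, $match[0][1] + strlen($match[0][0]));
    return true;
}

$found = embracingTagsSketch('<head><title lang="en">Hello</title></head>', 'title', $content, $after, $params);
var_dump($found, $content, $after, $params);
```

Because only the first match is reported, a caller that wants to strip every <script> section would call the function repeatedly until it returns FALSE.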

◆ extractBaseHref()

TYPO3\CMS\IndexedSearch\Indexer::extractBaseHref (   $html)

Extracts the "base href" from content string.

Parameters
string $html: Content to analyze
Returns
string The base href or an empty string if not found

Definition at line 852 of file Indexer.php.

References TYPO3\CMS\Core\Utility\GeneralUtility\makeInstance().

◆ extractHyperLinks()

TYPO3\CMS\IndexedSearch\Indexer::extractHyperLinks (   $html)

Extracts all links to external documents from the HTML content string

Parameters
string $html
Returns
array Array of hyperlinks (keys: tag, href, localPath (empty if not local))
See also
extractLinks()
Todo:
Define visibility

Definition at line 824 of file Indexer.php.

References TYPO3\CMS\IndexedSearch\Indexer\createLocalPath(), and TYPO3\CMS\Core\Utility\GeneralUtility\makeInstance().

Referenced by TYPO3\CMS\IndexedSearch\Indexer\extractLinks().

◆ extractLinks()

◆ fileContentParts()

TYPO3\CMS\IndexedSearch\Indexer::fileContentParts (   $ext,
  $absFile 
)

Creates an array with pointers to divisions of document.

Parameters
string $ext: File extension
string $absFile: Absolute filename (must exist and be validated OK before calling function)
Returns
array Array of pointers to sections that the document should be divided into
Todo:
Define visibility

Definition at line 1222 of file Indexer.php.

Referenced by TYPO3\CMS\IndexedSearch\Indexer\indexRegularDocument().

◆ freqMap()

TYPO3\CMS\IndexedSearch\Indexer::freqMap (   $freq)

Maps frequency from a real number in [0;1] to an integer in [0;$this->freqRange] with anything above $this->freqMax as 1 and back.

Parameters
double $freq: Frequency
Returns
integer Frequency in range.
Todo:
Define visibility

Definition at line 2019 of file Indexer.php.
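
One possible reading of the forward mapping in freqMap() is a linear scale that saturates once the frequency reaches $this->freqMax. The sketch below implements that reading only and does not cover the "and back" direction. The function name freqMapSketch() is hypothetical; the defaults 32000 and 0.1 are the initial values of $freqRange and $freqMax listed under Public Attributes.

```php
<?php
// Interpretation of the description only; not the TYPO3 method body.
function freqMapSketch($freq, $freqRange = 32000, $freqMax = 0.1)
{
    // Scale so that $freqMax corresponds to the top of the range, and clamp above it.
    $scaled = min(1.0, $freq / $freqMax);
    // Round to the nearest integer inside [0; $freqRange].
    return (int) round($scaled * $freqRange);
}

echo freqMapSketch(0.05), "\n";
echo freqMapSketch(0.5), "\n";
```

Under this reading, any word that makes up more than 10% of a document receives the same maximum value.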

◆ getHTMLcharset()

TYPO3\CMS\IndexedSearch\Indexer::getHTMLcharset (   $content)

Extract the charset value from HTML meta tag.

Parameters
string $content: HTML content
Returns
string The charset value if found.
Todo:
Define visibility

Definition at line 646 of file Indexer.php.

Referenced by TYPO3\CMS\IndexedSearch\Indexer\convertHTMLToUtf8().
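
Extracting a charset from a meta tag, as getHTMLcharset() is described to do, could be sketched as follows. The function name htmlCharsetSketch(), the regular expression and the lower-casing of the result are assumptions for illustration, not the TYPO3 implementation.

```php
<?php
// Illustrative sketch; the pattern is an assumption, not TYPO3 code.
function htmlCharsetSketch($content)
{
    // Matches both <meta charset="..."> and
    // <meta http-equiv="Content-Type" content="text/html; charset=...">.
    if (preg_match('/<meta[^>]+charset\s*=\s*["\']?([a-zA-Z0-9_-]+)/i', $content, $match)) {
        return strtolower($match[1]);
    }
    // No charset declaration found.
    return '';
}

echo htmlCharsetSketch('<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">'), "\n";
```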

◆ getRootLineFields()

TYPO3\CMS\IndexedSearch\Indexer::getRootLineFields ( array &  $fieldArray)

Adding values for root-line fields. rl0, rl1 and rl2 are standard. A hook might add more.

Parameters
array &$fieldArray: Field array, passed by reference
Returns
void
Todo:
Define visibility

Definition at line 1901 of file Indexer.php.

References $GLOBALS.

Referenced by TYPO3\CMS\IndexedSearch\Indexer\submit_section(), and TYPO3\CMS\IndexedSearch\Indexer\updateRootline().

◆ getUrlHeaders()

TYPO3\CMS\IndexedSearch\Indexer::getUrlHeaders (   $url)

Getting HTTP request headers of URL

Parameters
string $url: The URL
integer Timeout (seconds?); documented, but not part of the signature shown above
Returns
mixed If no answer, returns FALSE. Otherwise an array where HTTP headers are keys
Todo:
Define visibility

Definition at line 915 of file Indexer.php.

References TYPO3\CMS\Core\Utility\GeneralUtility\getUrl(), and TYPO3\CMS\Core\Utility\GeneralUtility\trimExplode().

Referenced by TYPO3\CMS\IndexedSearch\Indexer\indexExternalUrl().

◆ hook_indexContent()

TYPO3\CMS\IndexedSearch\Indexer::hook_indexContent (   &$pObj)

◆ includeCrawlerClass()

TYPO3\CMS\IndexedSearch\Indexer::includeCrawlerClass ( )

Includes the crawler class

Returns
void
Todo:
Define visibility

Definition at line 1940 of file Indexer.php.

Referenced by TYPO3\CMS\IndexedSearch\Indexer\extractLinks().

◆ indexAnalyze()

TYPO3\CMS\IndexedSearch\Indexer::indexAnalyze (   $content)

Analyzes content to use for indexing.

Parameters
array $content: Standard content array: an array with the keys title, keywords, description and body, which all contain an array of words.
Returns
array Index Array (whatever that is...)
Todo:
Define visibility

Definition at line 1316 of file Indexer.php.

References TYPO3\CMS\IndexedSearch\Indexer\analyzeBody(), and TYPO3\CMS\IndexedSearch\Indexer\analyzeHeaderinfo().

Referenced by TYPO3\CMS\IndexedSearch\Indexer\indexRegularDocument(), and TYPO3\CMS\IndexedSearch\Indexer\indexTypo3PageContent().

◆ indexExternalUrl()

TYPO3\CMS\IndexedSearch\Indexer::indexExternalUrl (   $externalUrl)

◆ indexRegularDocument()

TYPO3\CMS\IndexedSearch\Indexer::indexRegularDocument (   $file,
  $force = FALSE,
  $contentTmpFile = '',
  $altExtension = '' 
)

Indexing a regular document given as $file (relative to PATH_site, local file)

Parameters
string $file: Relative filename, relative to PATH_site. It can also be an absolute path as long as it is inside the lockRootPath (validated with ::isAbsPath()). Finally, if $contentTmpFile is set, this value can be anything, most likely a URL.
boolean $force: If set, indexing is forced (despite content hashes, mtime etc).
string $contentTmpFile: Temporary file with the content to read it from (instead of $file). Used when the $file is a URL.
string $altExtension: File extension for temporary file.
Returns
void
Todo:
Define visibility

Definition at line 1098 of file Indexer.php.

References TYPO3\CMS\IndexedSearch\Indexer\$content_md5h, TYPO3\CMS\IndexedSearch\Indexer\$contentParts, TYPO3\CMS\IndexedSearch\Indexer\checkExternalDocContentHash(), TYPO3\CMS\IndexedSearch\Indexer\checkMtimeTstamp(), TYPO3\CMS\IndexedSearch\Indexer\checkWordList(), TYPO3\CMS\IndexedSearch\Indexer\fileContentParts(), TYPO3\CMS\Core\Utility\GeneralUtility\getFileAbsFileName(), TYPO3\CMS\IndexedSearch\Indexer\indexAnalyze(), TYPO3\CMS\Core\Utility\GeneralUtility\isAbsPath(), TYPO3\CMS\Core\Utility\GeneralUtility\isAllowedAbsPath(), TYPO3\CMS\IndexedSearch\Indexer\log_pull(), TYPO3\CMS\IndexedSearch\Indexer\log_push(), TYPO3\CMS\IndexedSearch\Indexer\log_setTSlogMessage(), TYPO3\CMS\IndexedSearch\Utility\IndexedSearchUtility\md5inthash(), TYPO3\CMS\Core\Utility\GeneralUtility\milliseconds(), TYPO3\CMS\IndexedSearch\Indexer\processWordsInArrays(), TYPO3\CMS\IndexedSearch\Indexer\readFileContent(), TYPO3\CMS\IndexedSearch\Indexer\setExtHashes(), TYPO3\CMS\IndexedSearch\Indexer\submitFile_section(), TYPO3\CMS\IndexedSearch\Indexer\submitFilePage(), TYPO3\CMS\IndexedSearch\Indexer\submitWords(), TYPO3\CMS\IndexedSearch\Indexer\updateParsetime(), and TYPO3\CMS\IndexedSearch\Indexer\updateTstamp().

Referenced by TYPO3\CMS\IndexedSearch\Indexer\extractLinks(), and TYPO3\CMS\IndexedSearch\Indexer\indexExternalUrl().

◆ indexTypo3PageContent()

◆ init()

◆ initializeExternalParsers()

TYPO3\CMS\IndexedSearch\Indexer::initializeExternalParsers ( )

Initialize external parsers

Returns
void private
See also
init()
Todo:
Define visibility

Definition at line 490 of file Indexer.php.

References $TYPO3_CONF_VARS, and TYPO3\CMS\Core\Utility\GeneralUtility\getUserObj().

Referenced by TYPO3\CMS\IndexedSearch\Indexer\init().

◆ is_grlist_set()

TYPO3\CMS\IndexedSearch\Indexer::is_grlist_set (   $phash_x)

Checks if a grlist record has been set for the phash value input (looking at the "real" phash of the current content, not the linked-to phash of the common search result page)

Parameters
integer $phash_x: Phash integer to test.
Returns
boolean
Todo:
Define visibility

Definition at line 1798 of file Indexer.php.

References $GLOBALS, and $result.

Referenced by TYPO3\CMS\IndexedSearch\Indexer\indexTypo3PageContent().

◆ isAllowedLocalFile()

static TYPO3\CMS\IndexedSearch\Indexer::isAllowedLocalFile (   $filePath)
staticprotected

Checks if the path points to the file inside the web site

Parameters
string $filePath
Returns
boolean

Definition at line 1076 of file Indexer.php.

References TYPO3\CMS\Core\Utility\GeneralUtility\resolveBackPath().

◆ isRelativeURL()

static TYPO3\CMS\IndexedSearch\Indexer::isRelativeURL (   $url)
staticprotected

Checks if URL is relative.

Parameters
string $url
Returns
boolean

Definition at line 1065 of file Indexer.php.
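
A relative-URL check of the kind isRelativeURL() describes can be sketched with PHP's parse_url(). The function name isRelativeUrlSketch() is hypothetical, and treating "no scheme and no leading slash" as the definition of relative is an assumption; the page does not spell out the exact rule.

```php
<?php
// Illustrative sketch; the rule used here is an assumption.
function isRelativeUrlSketch($url)
{
    $parts = parse_url($url);
    if ($parts === false) {
        // Seriously malformed URLs are not treated as relative.
        return false;
    }
    $hasScheme = isset($parts['scheme']) && $parts['scheme'] !== '';
    $path = isset($parts['path']) ? $parts['path'] : '';
    // Relative: no scheme, and a path that does not start at the root.
    return !$hasScheme && $path !== '' && $path[0] !== '/';
}

var_dump(isRelativeUrlSketch('fileadmin/docs/manual.pdf'));
var_dump(isRelativeUrlSketch('http://example.com/manual.pdf'));
```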

◆ log_pull()

TYPO3\CMS\IndexedSearch\Indexer::log_pull ( )

Pull function wrapper for TT logging

Returns
void
Todo:
Define visibility

Definition at line 2104 of file Indexer.php.

Referenced by TYPO3\CMS\IndexedSearch\Indexer\hook_indexContent(), TYPO3\CMS\IndexedSearch\Indexer\indexRegularDocument(), and TYPO3\CMS\IndexedSearch\Indexer\indexTypo3PageContent().

◆ log_push()

TYPO3\CMS\IndexedSearch\Indexer::log_push (   $msg,
  $key 
)

Push function wrapper for TT logging

Parameters
string $msg: Title to set
string $key: Key (?)
Returns
void
Todo:
Define visibility

Definition at line 2092 of file Indexer.php.

Referenced by TYPO3\CMS\IndexedSearch\Indexer\hook_indexContent(), TYPO3\CMS\IndexedSearch\Indexer\indexRegularDocument(), and TYPO3\CMS\IndexedSearch\Indexer\indexTypo3PageContent().

◆ log_setTSlogMessage()

TYPO3\CMS\IndexedSearch\Indexer::log_setTSlogMessage (   $msg,
  $errorNum = 0 
)

◆ metaphone()

TYPO3\CMS\IndexedSearch\Indexer::metaphone (   $word,
  $returnRawMetaphoneValue = FALSE 
)

Creating metaphone based hash from input word

Parameters
string $word: Word to convert
boolean $returnRawMetaphoneValue: If set, returns the raw metaphone value (not hashed)
Returns
mixed Metaphone hash integer (or raw value, string)
Todo:
Define visibility

Definition at line 1398 of file Indexer.php.

References $result, and TYPO3\CMS\IndexedSearch\Utility\IndexedSearchUtility\md5inthash().

Referenced by TYPO3\CMS\IndexedSearch\Indexer\analyzeBody(), and TYPO3\CMS\IndexedSearch\Indexer\analyzeHeaderinfo().
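
The two return modes of metaphone() (hashed integer or raw string) can be illustrated with PHP's built-in metaphone() function. This is a sketch under assumptions: metaphoneHashSketch() and md5IntHashSketch() are hypothetical names, and reducing the MD5 digest to its first seven hex digits is only a guess at how md5inthash() might turn a string into an integer. The real method may also delegate to $metaphoneObj, which is not modelled here.

```php
<?php
// Hypothetical helper: reduce a string to a non-negative integer hash.
// Assumption: the first 7 hex digits of the MD5 digest are used.
function md5IntHashSketch($value)
{
    return (int) hexdec(substr(md5($value), 0, 7));
}

// Illustrative sketch of the two return modes; not the TYPO3 method body.
function metaphoneHashSketch($word, $returnRawMetaphoneValue = false)
{
    // PHP's built-in phonetic key, e.g. similar-sounding words share a key.
    $raw = metaphone($word);
    if ($returnRawMetaphoneValue) {
        return $raw;
    }
    // Hash the phonetic key so it can be stored as an integer; 0 for empty keys.
    return $raw !== '' ? md5IntHashSketch($raw) : 0;
}

var_dump(metaphoneHashSketch('Thompson', true));
var_dump(metaphoneHashSketch('Thompson'));
```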

◆ processWordsInArrays()

TYPO3\CMS\IndexedSearch\Indexer::processWordsInArrays (   $contentArr)

Processing words in the array from split*Content -functions

Parameters
array $contentArr: Array of content to index, see splitHTMLContent() and splitRegularContent()
Returns
array Content input array modified so each key is now a unique array of words
Todo:
Define visibility

Definition at line 1278 of file Indexer.php.

Referenced by TYPO3\CMS\IndexedSearch\Indexer\indexRegularDocument(), and TYPO3\CMS\IndexedSearch\Indexer\indexTypo3PageContent().

◆ readFileContent()

TYPO3\CMS\IndexedSearch\Indexer::readFileContent (   $fileExtension,
  $absoluteFileName,
  $sectionPointer 
)

Reads the content of an external file being indexed. The content from the external parser MUST be returned in utf-8!

Parameters
string $fileExtension: File extension, e.g. "pdf", "doc" etc.
string $absoluteFileName: Absolute filename of file (must exist and be validated OK before calling function)
string $sectionPointer: Pointer to section (zero for all other than PDF, which will have an indication of pages into which the document should be split.)
Returns
array Standard content array (title, description, keywords, body keys)
Todo:
Define visibility

Definition at line 1205 of file Indexer.php.

Referenced by TYPO3\CMS\IndexedSearch\Indexer\indexRegularDocument().

◆ removeLoginpagesWithContentHash()

TYPO3\CMS\IndexedSearch\Indexer::removeLoginpagesWithContentHash ( )

Removes any indexed pages with userlogins which have the same contentHash. NOT USED anywhere inside this class!

Returns
void
Todo:
Define visibility

Definition at line 1919 of file Indexer.php.

References $GLOBALS, and TYPO3\CMS\IndexedSearch\Indexer\log_setTSlogMessage().

◆ removeOldIndexedFiles()

TYPO3\CMS\IndexedSearch\Indexer::removeOldIndexedFiles (   $phash)

Removes records for the indexed page, $phash

Parameters
integer $phash: phash value to flush
Returns
void
Todo:
Define visibility

Definition at line 1683 of file Indexer.php.

References $GLOBALS.

Referenced by TYPO3\CMS\IndexedSearch\Indexer\submitFilePage().

◆ removeOldIndexedPages()

TYPO3\CMS\IndexedSearch\Indexer::removeOldIndexedPages (   $phash)

Removes records for the indexed page, $phash

Parameters
integer $phash: phash value to flush
Returns
void
Todo:
Define visibility

Definition at line 1546 of file Indexer.php.

References $GLOBALS.

Referenced by TYPO3\CMS\IndexedSearch\Indexer\submitPage().

◆ setExtHashes()

TYPO3\CMS\IndexedSearch\Indexer::setExtHashes (   $file,
  $subinfo = array() 
)

Get search hash, external files

Parameters
string $file: File name / path which identifies it on the server
array $subinfo: Additional content identifying the (subpart of) content. For instance, PDF files are divided into groups of pages for indexing.
Returns
array Array with "phash_grouping" and "phash" inside.
Todo:
Define visibility

Definition at line 2065 of file Indexer.php.

Referenced by TYPO3\CMS\IndexedSearch\Indexer\indexRegularDocument().

◆ setT3Hashes()

TYPO3\CMS\IndexedSearch\Indexer::setT3Hashes ( )

Get search hash, T3 pages

Returns
void
Todo:
Define visibility

Definition at line 2041 of file Indexer.php.

Referenced by TYPO3\CMS\IndexedSearch\Indexer\init().

◆ splitHTMLContent()

TYPO3\CMS\IndexedSearch\Indexer::splitHTMLContent (   $content)

Splits HTML content and returns an associative array, with title, a list of metatags, and a list of words in the body.

Parameters
string $content: HTML content to index. To some degree expected to be made by TYPO3 (i.e. splitting the header by ":")
Returns
array Array of content, having keys "title", "body", "keywords" and "description" set.
See also
splitRegularContent()
Todo:
Define visibility

Definition at line 594 of file Indexer.php.

References TYPO3\CMS\IndexedSearch\Indexer\$defaultContentArray, TYPO3\CMS\IndexedSearch\Indexer\addSpacesToKeywordList(), TYPO3\CMS\IndexedSearch\Indexer\embracingTags(), TYPO3\CMS\Core\Utility\GeneralUtility\get_tag_attributes(), and TYPO3\CMS\IndexedSearch\Indexer\typoSearchTags().

Referenced by TYPO3\CMS\IndexedSearch\Indexer\indexTypo3PageContent().

◆ splitRegularContent()

TYPO3\CMS\IndexedSearch\Indexer::splitRegularContent (   $content)

Splits non-HTML content (from external files for instance)

Parameters
string $content: Input content (non-HTML) to index.
Returns
array Array of content, having the key "body" set (plus "title", "description" and "keywords", but empty)
See also
splitHTMLContent()
Todo:
Define visibility

Definition at line 1239 of file Indexer.php.

References TYPO3\CMS\IndexedSearch\Indexer\$defaultContentArray.

◆ submit_grlist()

TYPO3\CMS\IndexedSearch\Indexer::submit_grlist (   $hash,
  $phash_x 
)

Stores gr_list in the database.

Parameters
integer $hash: Search result record phash
integer $phash_x: Actual phash of current content
Returns
void
See also
update_grlist()
Todo:
Define visibility

Definition at line 1505 of file Indexer.php.

References $GLOBALS, and TYPO3\CMS\IndexedSearch\Indexer\$hash.

Referenced by TYPO3\CMS\IndexedSearch\Indexer\submitFile_grlist(), TYPO3\CMS\IndexedSearch\Indexer\submitPage(), and TYPO3\CMS\IndexedSearch\Indexer\update_grlist().

◆ submit_section()

TYPO3\CMS\IndexedSearch\Indexer::submit_section (   $hash,
  $hash_t3 
)

Stores section. $hash and $hash_t3 are the same for TYPO3 pages, but different for external files.

Parameters
integer $hash: phash of TYPO3 parent search result record
integer $hash_t3: phash of the file indexation search record
Returns
void
Todo:
Define visibility

Definition at line 1527 of file Indexer.php.

References $GLOBALS, TYPO3\CMS\IndexedSearch\Indexer\$hash, and TYPO3\CMS\IndexedSearch\Indexer\getRootLineFields().

Referenced by TYPO3\CMS\IndexedSearch\Indexer\submitFile_section(), and TYPO3\CMS\IndexedSearch\Indexer\submitPage().

◆ submitFile_grlist()

TYPO3\CMS\IndexedSearch\Indexer::submitFile_grlist (   $hash)

Stores file gr_list for a file IF it does not exist already

Parameters
integer $hash: phash value of file
Returns
void
Todo:
Define visibility

Definition at line 1649 of file Indexer.php.

References $GLOBALS, TYPO3\CMS\IndexedSearch\Indexer\$hash, and TYPO3\CMS\IndexedSearch\Indexer\submit_grlist().

◆ submitFile_section()

TYPO3\CMS\IndexedSearch\Indexer::submitFile_section (   $hash)

Stores file section for a file IF it does not exist

Parameters
integer $hash: phash value of file
Returns
void
Todo:
Define visibility

Definition at line 1666 of file Indexer.php.

References $GLOBALS, TYPO3\CMS\IndexedSearch\Indexer\$hash, and TYPO3\CMS\IndexedSearch\Indexer\submit_section().

Referenced by TYPO3\CMS\IndexedSearch\Indexer\indexRegularDocument().

◆ submitFilePage()

TYPO3\CMS\IndexedSearch\Indexer::submitFilePage (   $hash,
  $file,
  $subinfo,
  $ext,
  $mtime,
  $ctime,
  $size,
  $content_md5h,
  $contentParts 
)

Updates db with information about the file

Parameters
array $hash: Array with phash and phash_grouping keys for file
string $file: File name
array $subinfo: Array of "cHashParams" for files: this is for instance the page index for a PDF file (for other document types it will be zero)
string $ext: File extension determining the type of media.
integer $mtime: Modification time of file.
integer $ctime: Creation time of file.
integer $size: Size of file in bytes
integer $content_md5h: Content HASH value.
array $contentParts: Standard content array (using only title and body for a file)
Returns
void
Todo:
Define visibility

Definition at line 1580 of file Indexer.php.

References TYPO3\CMS\IndexedSearch\Indexer\$content_md5h, TYPO3\CMS\IndexedSearch\Indexer\$contentParts, $GLOBALS, TYPO3\CMS\IndexedSearch\Indexer\$hash, TYPO3\CMS\IndexedSearch\Indexer\bodyDescription(), and TYPO3\CMS\IndexedSearch\Indexer\removeOldIndexedFiles().

Referenced by TYPO3\CMS\IndexedSearch\Indexer\indexRegularDocument().

◆ submitPage()

TYPO3\CMS\IndexedSearch\Indexer::submitPage ( )

◆ submitWords()

TYPO3\CMS\IndexedSearch\Indexer::submitWords (   $wordList,
  $phash 
)

Submits RELATIONS between words and phash

Parameters
array $wordList: Word list array
integer $phash: phash value
Returns
void
Todo:
Define visibility

Definition at line 1994 of file Indexer.php.

Referenced by TYPO3\CMS\IndexedSearch\Indexer\indexRegularDocument(), and TYPO3\CMS\IndexedSearch\Indexer\indexTypo3PageContent().

◆ typoSearchTags()

TYPO3\CMS\IndexedSearch\Indexer::typoSearchTags (   &$body)

Removes content that shouldn't be indexed according to TYPO3SEARCH tags.

Parameters
string &$body: HTML content, passed by reference
Returns
boolean Returns TRUE if a TYPO3SEARCH_ tag was found, otherwise FALSE.
Todo:
Define visibility

Definition at line 717 of file Indexer.php.

Referenced by TYPO3\CMS\IndexedSearch\Indexer\splitHTMLContent().
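
The marker handling described above can be illustrated with a minimal standalone sketch. It is not the code from Indexer.php: the function name keepMarkedSections() is hypothetical, and the sketch assumes the markers are the HTML comments <!--TYPO3SEARCH_begin--> and <!--TYPO3SEARCH_end-->. Like typoSearchTags(), it changes the string passed by reference and reports whether a marker was found.

```php
<?php
/**
 * Illustrative stand-in, not the TYPO3 implementation.
 * Keeps the text after each "begin" marker and drops the text after each
 * "end" marker. Text in front of an "end" marker that has no preceding
 * "begin" marker is kept as well.
 *
 * @param string $body HTML content, passed by reference
 * @return bool TRUE if a TYPO3SEARCH_ marker was found, otherwise FALSE
 */
function keepMarkedSections(&$body)
{
    // Split at the start of every marker comment. The marker name
    // ("begin" or "end") is then the first thing in each following piece.
    $pieces = preg_split('/<!--\s?TYPO3SEARCH_/', $body);
    if (count($pieces) < 2) {
        return false; // no marker present, $body is left untouched
    }

    $kept = '';
    $pending = ''; // text that is not yet known to be inside or outside
    foreach ($pieces as $piece) {
        $parts = explode('-->', $piece, 2);
        $marker = trim($parts[0]);
        if ($marker === 'begin') {
            // Everything after a "begin" marker is indexable.
            $kept .= isset($parts[1]) ? $parts[1] : '';
            $pending = '';
        } elseif ($marker === 'end') {
            // Text after an "end" marker is dropped; pending text is kept.
            $kept .= $pending;
            $pending = '';
        } else {
            // Leading text before the first marker.
            $pending = $piece;
        }
    }

    $body = $kept;
    return true;
}

$html = 'Header<!--TYPO3SEARCH_begin-->Indexed text<!--TYPO3SEARCH_end-->Footer';
$found = keepMarkedSections($html);
// $found is true and $html now holds only 'Indexed text'.
```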

◆ update_grlist()

TYPO3\CMS\IndexedSearch\Indexer::update_grlist (   $phash,
  $phash_x 
)

Checks whether a grlist entry for this hash exists and, if not, writes one.

Parameters
integer phash of the search result that should be found.
integer The real phash of the current content. The two values differ when a page viewed with a user login turns out to contain exactly the same content as another, already indexed version of the page; this is the whole reason for the grlist table.
Returns
void
See also
submit_grlist()
Todo:
Define visibility

Definition at line 1816 of file Indexer.php.

References $GLOBALS, TYPO3\CMS\IndexedSearch\Indexer\log_setTSlogMessage(), and TYPO3\CMS\IndexedSearch\Indexer\submit_grlist().

Referenced by TYPO3\CMS\IndexedSearch\Indexer\indexTypo3PageContent().
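
The check-then-write behaviour can be sketched without a database. The example below is an assumption-laden illustration, not the method itself: the function ensureGrlistEntry(), the in-memory $rows array standing in for the grlist table, and the row keys phash, phash_x and gr_list are all hypothetical.

```php
<?php
/**
 * Hypothetical in-memory illustration of "write a grlist entry only if
 * none exists yet". $rows stands in for the grlist table.
 *
 * @param array $rows Existing entries, passed by reference
 * @param int $phash phash of the search result that should be found
 * @param int $phashX Real phash of the current content
 * @param string $grList Group list of the current user, e.g. '0,-1'
 * @return bool TRUE if a new entry was written, FALSE if one existed
 */
function ensureGrlistEntry(array &$rows, $phash, $phashX, $grList)
{
    foreach ($rows as $row) {
        if ($row['phash'] === $phash && $row['gr_list'] === $grList) {
            return false; // entry already present, nothing to write
        }
    }
    $rows[] = array(
        'phash' => $phash,
        'phash_x' => $phashX,
        'gr_list' => $grList,
    );
    return true;
}

$rows = array();
$first = ensureGrlistEntry($rows, 111, 222, '0,-1');  // writes an entry
$second = ensureGrlistEntry($rows, 111, 222, '0,-1'); // finds it, writes nothing
```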

◆ updateParsetime()

TYPO3\CMS\IndexedSearch\Indexer::updateParsetime (   $phash,
  $parsetime 
)

Update parsetime for phash row.

Parameters
integer phash value.
integer Parsetime value to set.
Returns
void
Todo:
Define visibility

Definition at line 1870 of file Indexer.php.

References $GLOBALS.

Referenced by TYPO3\CMS\IndexedSearch\Indexer\indexRegularDocument(), and TYPO3\CMS\IndexedSearch\Indexer\indexTypo3PageContent().

◆ updateRootline()

TYPO3\CMS\IndexedSearch\Indexer::updateRootline ( )

Update section rootline for the page

Returns
void
Todo:
Define visibility

Definition at line 1885 of file Indexer.php.

References $GLOBALS, and TYPO3\CMS\IndexedSearch\Indexer\getRootLineFields().

Referenced by TYPO3\CMS\IndexedSearch\Indexer\indexTypo3PageContent().

◆ updateSetId()

TYPO3\CMS\IndexedSearch\Indexer::updateSetId (   $phash)

Update SetID of the index_phash record.

Parameters
integer phash value
Returns
void
Todo:
Define visibility

Definition at line 1853 of file Indexer.php.

References $GLOBALS.

Referenced by TYPO3\CMS\IndexedSearch\Indexer\indexTypo3PageContent().

◆ updateTstamp()

TYPO3\CMS\IndexedSearch\Indexer::updateTstamp (   $phash,
  $mtime = 0 
)

Update tstamp for a phash row.

Parameters
integer phash value
integer If set, update the mtime field to this value.
Returns
void
Todo:
Define visibility

Definition at line 1834 of file Indexer.php.

References $GLOBALS.

Referenced by TYPO3\CMS\IndexedSearch\Indexer\checkMtimeTstamp(), TYPO3\CMS\IndexedSearch\Indexer\indexRegularDocument(), and TYPO3\CMS\IndexedSearch\Indexer\indexTypo3PageContent().
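
The optional $mtime parameter can be pictured as follows. This is a sketch under stated assumptions rather than the code in Indexer.php: buildTstampUpdate() is a hypothetical helper, and the column names tstamp and item_mtime are assumptions about the phash row. It shows only how the set of fields to update depends on whether $mtime is set.

```php
<?php
/**
 * Hypothetical helper: builds the field list for a timestamp update.
 * The column names 'tstamp' and 'item_mtime' are assumptions.
 *
 * @param int $now Current time, used for the tstamp field
 * @param int $mtime If set (non-zero), the mtime field is updated as well
 * @return array Field name => value pairs to write
 */
function buildTstampUpdate($now, $mtime = 0)
{
    $fields = array('tstamp' => (int)$now);
    if ($mtime) {
        // Only touch the mtime field when a value was passed.
        $fields['item_mtime'] = (int)$mtime;
    }
    return $fields;
}

$onlyTstamp = buildTstampUpdate(1400000000);
$withMtime = buildTstampUpdate(1400000000, 1390000000);
```

With the default $mtime = 0, only the tstamp field ends up in the update.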

Member Data Documentation

◆ $cHashParams

TYPO3\CMS\IndexedSearch\Indexer::$cHashParams = array()
Todo:
Define visibility

Definition at line 164 of file Indexer.php.

◆ $conf

TYPO3\CMS\IndexedSearch\Indexer::$conf = array()
Todo:
Define visibility

Definition at line 118 of file Indexer.php.

◆ $content_md5h

TYPO3\CMS\IndexedSearch\Indexer::$content_md5h = ''

◆ $contentParts

TYPO3\CMS\IndexedSearch\Indexer::$contentParts = array()

◆ $crawlerActive

TYPO3\CMS\IndexedSearch\Indexer::$crawlerActive = FALSE
Todo:
Define visibility

Definition at line 91 of file Indexer.php.

◆ $csObj

TYPO3\CMS\IndexedSearch\Indexer::$csObj

Definition at line 199 of file Indexer.php.

◆ $defaultContentArray

TYPO3\CMS\IndexedSearch\Indexer::$defaultContentArray
Initial value:
= array(
'title' => '',
'description' => '',
'keywords' => '',
'body' => ''
)
Todo:
Define visibility

Definition at line 98 of file Indexer.php.

Referenced by TYPO3\CMS\IndexedSearch\Indexer\splitHTMLContent(), and TYPO3\CMS\IndexedSearch\Indexer\splitRegularContent().
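
A default array of this shape is typically used as a template, so that every content array carries the same four keys. The following sketch shows one way to do that with plain PHP array functions. The helper withDefaults() is hypothetical and is not claimed to be how splitHTMLContent() or splitRegularContent() use the property.

```php
<?php
// Same four keys as the documented initial value.
$defaultContentArray = array(
    'title' => '',
    'description' => '',
    'keywords' => '',
    'body' => ''
);

/**
 * Hypothetical helper: overlays parsed values on the defaults and
 * discards any key that the default array does not define.
 */
function withDefaults(array $defaults, array $parsed)
{
    return array_merge($defaults, array_intersect_key($parsed, $defaults));
}

$contentArr = withDefaults($defaultContentArray, array(
    'title' => 'Annual report',
    'body' => 'Full text of the document',
    'author' => 'ignored, not a known key',
));
// $contentArr has exactly the keys title, description, keywords and body.
```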

◆ $defaultGrList

TYPO3\CMS\IndexedSearch\Indexer::$defaultGrList = '0,-1'
Todo:
Define visibility

Definition at line 61 of file Indexer.php.

◆ $enableMetaphoneSearch

TYPO3\CMS\IndexedSearch\Indexer::$enableMetaphoneSearch = FALSE
Todo:
Define visibility

Definition at line 180 of file Indexer.php.

◆ $excludeSections

TYPO3\CMS\IndexedSearch\Indexer::$excludeSections = 'script,style'
Todo:
Define visibility

Definition at line 48 of file Indexer.php.

◆ $external_parsers

TYPO3\CMS\IndexedSearch\Indexer::$external_parsers = array()
Todo:
Define visibility

Definition at line 54 of file Indexer.php.

◆ $externalFileCounter

TYPO3\CMS\IndexedSearch\Indexer::$externalFileCounter = 0
Todo:
Define visibility

Definition at line 113 of file Indexer.php.

◆ $file_phash_arr

TYPO3\CMS\IndexedSearch\Indexer::$file_phash_arr = array()
Todo:
Define visibility

Definition at line 136 of file Indexer.php.

◆ $flagBitMask

TYPO3\CMS\IndexedSearch\Indexer::$flagBitMask
Todo:
Define visibility

Definition at line 220 of file Indexer.php.

◆ $forceIndexing

TYPO3\CMS\IndexedSearch\Indexer::$forceIndexing = FALSE
Todo:
Define visibility

Definition at line 85 of file Indexer.php.

◆ $freqMax

TYPO3\CMS\IndexedSearch\Indexer::$freqMax = 0.1
Todo:
Define visibility

Definition at line 175 of file Indexer.php.

◆ $freqRange

TYPO3\CMS\IndexedSearch\Indexer::$freqRange = 32000
Todo:
Define visibility

Definition at line 170 of file Indexer.php.

◆ $hash

◆ $indexerConfig

TYPO3\CMS\IndexedSearch\Indexer::$indexerConfig = array()
Todo:
Define visibility

Definition at line 124 of file Indexer.php.

Referenced by TYPO3\CMS\IndexedSearch\Indexer\hook_indexContent().

◆ $indexExternalUrl_content

TYPO3\CMS\IndexedSearch\Indexer::$indexExternalUrl_content = ''
Todo:
Define visibility

Definition at line 159 of file Indexer.php.

◆ $internal_log

TYPO3\CMS\IndexedSearch\Indexer::$internal_log = array()
Todo:
Define visibility

Definition at line 153 of file Indexer.php.

◆ $lexerObj

TYPO3\CMS\IndexedSearch\Indexer::$lexerObj

Definition at line 215 of file Indexer.php.

◆ $maxExternalFiles

TYPO3\CMS\IndexedSearch\Indexer::$maxExternalFiles = 0
Todo:
Define visibility

Definition at line 79 of file Indexer.php.

◆ $metaphoneContent

TYPO3\CMS\IndexedSearch\Indexer::$metaphoneContent = ''
Todo:
Define visibility

Definition at line 190 of file Indexer.php.

◆ $metaphoneObj

TYPO3\CMS\IndexedSearch\Indexer::$metaphoneObj

Definition at line 207 of file Indexer.php.

◆ $reasons

TYPO3\CMS\IndexedSearch\Indexer::$reasons
Initial value:
= array(
-1 => 'mtime matched the document, so no changes detected and no content updated',
-2 => 'The minimum age was not exceeded',
1 => 'The configured max-age was exceeded for the document and thus it\'s indexed.',
2 => 'The minimum age was exceed and mtime was set and the mtime was different, so the page was indexed.',
3 => 'The minimum age was exceed, but mtime was not set, so the page was indexed.',
4 => 'Page has never been indexed (is not represented in the index_phash table).'
)
Todo:
Define visibility

Definition at line 35 of file Indexer.php.
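
Judging from the message texts alone, negative keys describe cases in which the document was left alone and positive keys cases in which it was indexed. The sketch below turns that reading into a small lookup. Both the helper describeReason() and the sign-based interpretation are assumptions drawn from the strings above, not documented behaviour.

```php
<?php
// Copy of the documented initial value of $reasons.
$reasons = array(
    -1 => 'mtime matched the document, so no changes detected and no content updated',
    -2 => 'The minimum age was not exceeded',
    1 => 'The configured max-age was exceeded for the document and thus it\'s indexed.',
    2 => 'The minimum age was exceed and mtime was set and the mtime was different, so the page was indexed.',
    3 => 'The minimum age was exceed, but mtime was not set, so the page was indexed.',
    4 => 'Page has never been indexed (is not represented in the index_phash table).'
);

/**
 * Hypothetical helper: maps a reason code to its text and to a flag that
 * says whether the code reads as "document was indexed".
 */
function describeReason(array $reasons, $code)
{
    $text = isset($reasons[$code]) ? $reasons[$code] : 'Unknown reason code';
    return array(
        'indexed' => $code > 0, // assumption: positive codes mean "indexed"
        'reason' => $text,
    );
}

$skipped = describeReason($reasons, -1);
$fresh = describeReason($reasons, 4);
```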

◆ $storeMetaphoneInfoAsWords

TYPO3\CMS\IndexedSearch\Indexer::$storeMetaphoneInfoAsWords
Todo:
Define visibility

Definition at line 185 of file Indexer.php.

◆ $tstamp_maxAge

TYPO3\CMS\IndexedSearch\Indexer::$tstamp_maxAge = 0
Todo:
Define visibility

Definition at line 67 of file Indexer.php.

◆ $tstamp_minAge

TYPO3\CMS\IndexedSearch\Indexer::$tstamp_minAge = 0
Todo:
Define visibility

Definition at line 73 of file Indexer.php.

◆ $wordcount

TYPO3\CMS\IndexedSearch\Indexer::$wordcount = 0
Todo:
Define visibility

Definition at line 108 of file Indexer.php.