TYPO3 CMS 8
Indexer Class Reference

Public Member Functions

 __construct ()
 
 hook_indexContent (&$pObj)
 
 backend_initIndexer ($id, $type, $sys_language_uid, $MP, $uidRL, $cHash_array=[], $createCHash=false)
 
 backend_setFreeIndexUid ($freeIndexUid, $freeIndexSetId=0)
 
 backend_indexAsTYPO3Page ($title, $keywords, $description, $content, $charset, $mtime, $crdate=0, $recordUid=0)
 
 init ()
 
 initializeExternalParsers ()
 
 indexTypo3PageContent ()
 
 splitHTMLContent ($content)
 
 getHTMLcharset ($content)
 
 convertHTMLToUtf8 ($content, $charset= '')
 
 embracingTags ($string, $tagName, &$tagContent, &$stringAfter, &$paramList)
 
 typoSearchTags (&$body)
 
 extractLinks ($content)
 
 extractHyperLinks ($html)
 
 extractBaseHref ($html)
 
 indexExternalUrl ($externalUrl)
 
 getUrlHeaders ($url)
 
 indexRegularDocument ($file, $force=false, $contentTmpFile= '', $altExtension= '')
 
 readFileContent ($fileExtension, $absoluteFileName, $sectionPointer)
 
 fileContentParts ($ext, $absFile)
 
 splitRegularContent ($content)
 
 charsetEntity2utf8 (&$contentArr, $charset)
 
 processWordsInArrays ($contentArr)
 
 bodyDescription ($contentArr)
 
 indexAnalyze ($content)
 
 analyzeHeaderinfo (&$retArr, $content, $key, $offset)
 
 analyzeBody (&$retArr, $content)
 
 metaphone ($word, $returnRawMetaphoneValue=false)
 
 submitPage ()
 
 submit_grlist ($hash, $phash_x)
 
 submit_section ($hash, $hash_t3)
 
 removeOldIndexedPages ($phash)
 
 submitFilePage ($hash, $file, $subinfo, $ext, $mtime, $ctime, $size, $content_md5h, $contentParts)
 
 submitFile_grlist ($hash)
 
 submitFile_section ($hash)
 
 removeOldIndexedFiles ($phash)
 
 checkMtimeTstamp ($mtime, $phash)
 
 checkContentHash ()
 
 checkExternalDocContentHash ($hashGr, $content_md5h)
 
 is_grlist_set ($phash_x)
 
 update_grlist ($phash, $phash_x)
 
 updateTstamp ($phash, $mtime=0)
 
 updateSetId ($phash)
 
 updateParsetime ($phash, $parsetime)
 
 updateRootline ()
 
 getRootLineFields (array &$fieldArray)
 
 includeCrawlerClass ()
 
 checkWordList ($wordListArray)
 
 submitWords ($wordList, $phash)
 
 freqMap ($freq)
 
 setT3Hashes ()
 
 setExtHashes ($file, $subinfo=[])
 
 log_push ($msg, $key)
 
 log_pull ()
 
 log_setTSlogMessage ($msg, $errorNum=0)
 

Public Attributes

 $reasons
 
 $excludeSections = 'script,style'
 
 $external_parsers = []
 
 $defaultGrList = '0,-1'
 
 $tstamp_maxAge = 0
 
 $tstamp_minAge = 0
 
 $maxExternalFiles = 0
 
 $forceIndexing = false
 
 $crawlerActive = false
 
 $defaultContentArray
 
 $wordcount = 0
 
 $externalFileCounter = 0
 
 $conf = []
 
 $indexerConfig = []
 
 $hash = []
 
 $file_phash_arr = []
 
 $contentParts = []
 
 $content_md5h = ''
 
 $internal_log = []
 
 $indexExternalUrl_content = ''
 
 $cHashParams = []
 
 $freqRange = 32000
 
 $freqMax = 0.1
 
 $enableMetaphoneSearch = false
 
 $storeMetaphoneInfoAsWords
 
 $metaphoneContent = ''
 
 $csObj
 
 $metaphoneObj
 
 $lexerObj
 
 $flagBitMask
 

Protected Member Functions

 createLocalPath ($sourcePath)
 
 createLocalPathFromT3vars ($sourcePath)
 
 createLocalPathUsingDomainURL ($sourcePath)
 
 createLocalPathUsingAbsRefPrefix ($sourcePath)
 
 createLocalPathFromAbsoluteURL ($sourcePath)
 
 createLocalPathFromRelativeURL ($sourcePath)
 
 addSpacesToKeywordList ($keywordList)
 

Static Protected Member Functions

static isRelativeURL ($url)
 
static isAllowedLocalFile ($filePath)
 

Protected Attributes

 $timeTracker
 

Detailed Description

Indexing class for TYPO3 frontend

Definition at line 28 of file indexed_search/Classes/Indexer.php.

Constructor & Destructor Documentation

__construct ( )

Indexer constructor.

Definition at line 241 of file indexed_search/Classes/Indexer.php.

References GeneralUtility\makeInstance().

Member Function Documentation

addSpacesToKeywordList (   $keywordList)
protected

Makes sure that keywords are space-separated. This is important for their proper display as part of the fulltext index.

Parameters
string $keywordList
Returns
string
See also
http://forge.typo3.org/issues/14959

Definition at line 2339 of file indexed_search/Classes/Indexer.php.

References GeneralUtility\trimExplode().

Referenced by Indexer\splitHTMLContent().
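
As an illustration of the normalization described above, here is a minimal stand-alone sketch. The function name spaceSeparateKeywords() is hypothetical, and the exact output format of the real method (for example any leading or trailing padding) is not stated on this page, so treat this as one plausible reading rather than the actual implementation.

```php
<?php
// Hypothetical helper, not the TYPO3 method itself.
function spaceSeparateKeywords(string $keywordList): string
{
    // Split on commas, trim each keyword and drop empty entries.
    $keywords = array_filter(
        array_map('trim', explode(',', $keywordList)),
        static function (string $keyword): bool {
            return $keyword !== '';
        }
    );
    // Re-join with a comma followed by a space so that neighbouring
    // keywords are never glued together in the fulltext index.
    return implode(', ', $keywords);
}

echo spaceSeparateKeywords('typo3,cms , search'), PHP_EOL; // typo3, cms, search
```

Without the inserted space, a list such as "typo3,cms" could end up as a single token when the keywords are shown as part of the indexed text.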

analyzeBody (   &$retArr,
  $content 
)

Calculates relevant information for body content

Parameters
array $retArr: Index array, passed by reference
array $content: Standard content array
Returns
void

Definition at line 1388 of file indexed_search/Classes/Indexer.php.

References IndexedSearchUtility\md5inthash(), and Indexer\metaphone().

Referenced by Indexer\indexAnalyze().

analyzeHeaderinfo (   &$retArr,
  $content,
  $key,
  $offset 
)

Calculates relevant information for header content

Parameters
array $retArr: Index array, passed by reference
array $content: Standard content array
string $key: Key from standard content array
int $offset: Bit-wise priority to type
Returns
void

Definition at line 1357 of file indexed_search/Classes/Indexer.php.

References IndexedSearchUtility\md5inthash(), and Indexer\metaphone().

Referenced by Indexer\indexAnalyze().

backend_indexAsTYPO3Page (   $title,
  $keywords,
  $description,
  $content,
  $charset,
  $mtime,
  $crdate = 0,
  $recordUid = 0 
)

Indexing records as the content of a TYPO3 page.

Parameters
string $title: Title equivalent
string $keywords: Keywords equivalent
string $description: Description equivalent
string $content: The main content to index
string $charset: The charset of the title, keyword, description and body-content. MUST BE VALID, otherwise nothing is indexed!
int $mtime: Last modification time, in seconds
int $crdate: The creation date of the content, in seconds
int $recordUid: The record UID that the content comes from (for registration with the indexed rows)
Returns
void

Definition at line 423 of file indexed_search/Classes/Indexer.php.

References Indexer\indexTypo3PageContent().

backend_initIndexer (   $id,
  $type,
  $sys_language_uid,
  $MP,
  $uidRL,
  $cHash_array = [],
  $createCHash = false 
)

Initializing the "combined ID" of the page (phash) being indexed (or for which external media is attached)

Parameters
int $id: The page uid, &id=
int $type: The page type, &type=
int $sys_language_uid: sys_language uid, typically &L=
string $MP: The MP variable (Mount Points), &MP=
array $uidRL: Rootline array of only UIDs.
array $cHash_array: Array of GET variables to register with this indexing
bool $createCHash: If set, calculates a cHash value from the $cHash_array. You will probably not do that, since such cases are indexed through the frontend and the idea of this interface is to index non-cacheable pages from the backend!
Returns
void

Definition at line 354 of file indexed_search/Classes/Indexer.php.

References GeneralUtility\implodeArrayForUrl(), Indexer\init(), and GeneralUtility\makeInstance().

backend_setFreeIndexUid (   $freeIndexUid,
  $freeIndexSetId = 0 
)

Sets the free-index uid. Can be called right after backend_initIndexer()

Parameters
int $freeIndexUid: Free index UID
int $freeIndexSetId: Set id - an integer identifying the "set" of indexing operations.
Returns
void

Definition at line 404 of file indexed_search/Classes/Indexer.php.

bodyDescription (   $contentArr)

Extracts the sample description text from the content array.

Parameters
array $contentArr: Content array
Returns
string Description string

Definition at line 1320 of file indexed_search/Classes/Indexer.php.

References MathUtility\forceIntegerInRange().

Referenced by Indexer\submitFilePage(), and Indexer\submitPage().

charsetEntity2utf8 (   &$contentArr,
  $charset 
)

Convert character set and HTML entities in the value of input content array keys

Parameters
array $contentArr: Standard content array
string $charset: Charset of the input content (converted to utf-8)
Returns
void

Definition at line 1280 of file indexed_search/Classes/Indexer.php.

Referenced by Indexer\indexTypo3PageContent().

checkContentHash ( )

Check content hash in phash table

Returns
mixed Returns TRUE if the page needs to be indexed (that is, there was no result), otherwise the phash value (in an array) of the phash record to which the grlist_record should be related!

Definition at line 1863 of file indexed_search/Classes/Indexer.php.

References IndexedSearchUtility\isTableUsed(), and GeneralUtility\makeInstance().

Referenced by Indexer\indexTypo3PageContent().

checkExternalDocContentHash (   $hashGr,
  $content_md5h 
)

Check content hash for external documents. Returns TRUE if the document needs to be indexed (that is, there was no result).

Parameters
int $hashGr: phash value to check (phash_grouping)
int $content_md5h: Content hash to check
Returns
bool Returns TRUE if the document needs to be indexed (that is, there was no result)

Definition at line 1897 of file indexed_search/Classes/Indexer.php.

References Indexer\$content_md5h, IndexedSearchUtility\isTableUsed(), and GeneralUtility\makeInstance().

Referenced by Indexer\indexRegularDocument().

checkMtimeTstamp (   $mtime,
  $phash 
)

Check the mtime / tstamp of the currently indexed page/file (based on phash). Returns a positive integer if the page needs to be indexed.

Parameters
int $mtime: mtime value to test against limits and indexed page (usually this is the mtime of the cached document)
int $phash: "phash" used to select any already indexed page to see what its mtime is.
Returns
int Result integer. Generally: <0 = no indexing, >0 = do indexing (see $this->reasons):
-2) Min age was NOT exceeded and so indexing cannot occur.
-1) mtime matched so no need to reindex page.
0) N/A
1) Max age exceeded, page must be indexed again.
2) mtime of indexed page doesn't match mtime given for current content and we must index page.
3) No mtime was set, so we will index...
4) No indexed page found, so of course we will index.

Definition at line 1800 of file indexed_search/Classes/Indexer.php.

References $GLOBALS, IndexedSearchUtility\isTableUsed(), Indexer\log_setTSlogMessage(), GeneralUtility\makeInstance(), and Indexer\updateTstamp().

Referenced by Indexer\indexRegularDocument(), and Indexer\indexTypo3PageContent().
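
The result codes listed above can be read as a small decision procedure. The following stand-alone sketch is an assumption about how such a check could be ordered; decideReindex(), its parameters and the array keys 'tstamp' and 'item_mtime' are hypothetical names chosen for illustration, and the real method additionally reads the indexed row from the database.

```php
<?php
/**
 * Hypothetical decision helper mirroring the documented result codes.
 * $indexed is null when no indexed row exists; otherwise it is an array
 * with the assumed keys 'tstamp' (last indexing time) and 'item_mtime'.
 * The precedence of the checks is an assumption.
 */
function decideReindex(int $mtime, ?array $indexed, int $now, int $minAge = 0, int $maxAge = 0): int
{
    if ($indexed === null) {
        return 4; // no indexed page found
    }
    if ($maxAge > 0 && $now > $indexed['tstamp'] + $maxAge) {
        return 1; // max age exceeded, index again
    }
    if ($minAge > 0 && $now <= $indexed['tstamp'] + $minAge) {
        return -2; // min age not exceeded, indexing cannot occur
    }
    if ($mtime === 0) {
        return 3; // no mtime was set, so index
    }
    // Compare the stored mtime with the mtime of the current content.
    return (int)$indexed['item_mtime'] !== $mtime ? 2 : -1;
}

$row = ['tstamp' => 900, 'item_mtime' => 100];
echo decideReindex(200, $row, 1000), PHP_EOL; // 2
```

A caller would index only when the result is greater than zero, matching the "<0 = no indexing, >0 = do indexing" convention.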

checkWordList (   $wordListArray)

Adds new words to db

Parameters
array $wordListArray: Word List array (where each word has information about position etc).
Returns
void

Definition at line 2119 of file indexed_search/Classes/Indexer.php.

References IndexedSearchUtility\isTableUsed(), Indexer\log_setTSlogMessage(), and GeneralUtility\makeInstance().

Referenced by Indexer\indexRegularDocument(), and Indexer\indexTypo3PageContent().

convertHTMLToUtf8 (   $content,
  $charset = '' 
)

Converts an HTML document to utf-8

Parameters
string $content: HTML content, any charset
string $charset: Optional charset (otherwise extracted from HTML)
Returns
string Converted HTML

Definition at line 682 of file indexed_search/Classes/Indexer.php.

References Indexer\getHTMLcharset().

createLocalPath (   $sourcePath)
protected

Checks if the file is local

Parameters
string $sourcePath
Returns
string Absolute path to file if file is local, else empty string

Definition at line 954 of file indexed_search/Classes/Indexer.php.

Referenced by Indexer\extractHyperLinks().

createLocalPathFromAbsoluteURL (   $sourcePath)
protected

Attempts to create a local file path from the absolute URL without scheme.

Parameters
string $sourcePath
Returns
string

Definition at line 1049 of file indexed_search/Classes/Indexer.php.

createLocalPathFromRelativeURL (   $sourcePath)
protected

Attempts to create a local file path from the relative URL.

Parameters
string $sourcePath
Returns
string

Definition at line 1068 of file indexed_search/Classes/Indexer.php.

createLocalPathFromT3vars (   $sourcePath)
protected

Attempts to create a local file path from T3VARs. This is useful for various download extensions that hide the actual file name but still want the file to be indexed.

Parameters
string $sourcePath
Returns
string

Definition at line 981 of file indexed_search/Classes/Indexer.php.

References $GLOBALS, and GeneralUtility\shortMD5().

createLocalPathUsingAbsRefPrefix (   $sourcePath)
protected

Attempts to create a local file path by matching absRefPrefix. This requires TSFE. If TSFE is missing, this function does nothing.

Parameters
string $sourcePath
Returns
string

Definition at line 1025 of file indexed_search/Classes/Indexer.php.

References $GLOBALS.

createLocalPathUsingDomainURL (   $sourcePath)
protected

Attempts to create a local file path by matching a current request URL.

Parameters
string $sourcePath
Returns
string

Definition at line 1003 of file indexed_search/Classes/Indexer.php.

References GeneralUtility\getIndpEnv().

embracingTags (   $string,
  $tagName,
  &$tagContent,
  &$stringAfter,
  &$paramList 
)

Finds the first occurrence of embracing tags and returns the embraced content and the original string with the tag removed in the two passed variables. Returns FALSE if no match is found. This is useful, for example, for finding the <title> of a document or removing <script> sections.

Parameters
string $string: String to search in
string $tagName: Tag name, e.g. "script"
string $tagContent: Passed by reference: Content inside found tag
string $stringAfter: Passed by reference: Content after found tag
string $paramList: Passed by reference: Attributes of the found tag.
Returns
bool Returns FALSE if tag was not found, otherwise TRUE.

Definition at line 707 of file indexed_search/Classes/Indexer.php.

Referenced by Indexer\splitHTMLContent().
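
To make the by-reference contract concrete, here is a minimal stand-alone sketch that follows the parameter table above ($stringAfter receives the content after the found tag). The function name findEmbracedContent() and the regular expression are assumptions for illustration; the real method may locate the tags differently.

```php
<?php
// Hypothetical stand-alone sketch, not the TYPO3 method itself.
function findEmbracedContent(string $string, string $tagName, &$tagContent, &$stringAfter, &$paramList): bool
{
    $tag = preg_quote($tagName, '/');
    // Opening tag with optional attributes, lazily matched content, closing tag.
    $pattern = '/<' . $tag . '\b([^>]*)>(.*?)<\/' . $tag . '\s*>/is';
    if (!preg_match($pattern, $string, $match, PREG_OFFSET_CAPTURE)) {
        return false; // tag not found, the referenced variables stay untouched
    }
    $paramList = trim($match[1][0]);   // attributes of the found tag
    $tagContent = $match[2][0];        // content inside the found tag
    // Everything following the closing tag.
    $stringAfter = (string)substr($string, $match[0][1] + strlen($match[0][0]));
    return true;
}

$html = '<head><title lang="en">Hello</title><meta name="x"></head>';
if (findEmbracedContent($html, 'title', $content, $after, $params)) {
    echo $content, ' | ', $params, ' | ', $after, PHP_EOL;
}
```

Calling the function repeatedly on $stringAfter would walk through successive occurrences of the same tag, which is one way to strip several <script> sections in a loop.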

extractBaseHref (   $html)

Extracts the "base href" from content string.

Parameters
string $html: Content to analyze
Returns
string The base href or an empty string if not found

Definition at line 871 of file indexed_search/Classes/Indexer.php.

References GeneralUtility\makeInstance().

extractHyperLinks (   $html)

Extracts all links to external documents from the HTML content string

Parameters
string $html
Returns
array Array of hyperlinks (keys: tag, href, localPath (empty if not local))
See also
extractLinks()

Definition at line 842 of file indexed_search/Classes/Indexer.php.

References Indexer\createLocalPath(), and GeneralUtility\makeInstance().

Referenced by Indexer\extractLinks().

extractLinks (   $content)

Extract links (hrefs) from HTML content and if indexable media is found, it is indexed.

Parameters
string $content: HTML content
Returns
void

Definition at line 764 of file indexed_search/Classes/Indexer.php.

References Indexer\$conf, Indexer\extractHyperLinks(), GeneralUtility\getFileAbsFileName(), Indexer\indexExternalUrl(), Indexer\indexRegularDocument(), GeneralUtility\isAllowedAbsPath(), Indexer\log_setTSlogMessage(), and GeneralUtility\makeInstance().

Referenced by Indexer\indexTypo3PageContent().

fileContentParts (   $ext,
  $absFile 
)

Creates an array with pointers to divisions of document.

Parameters
string $ext: File extension
string $absFile: Absolute filename (must exist and be validated OK before calling function)
Returns
array Array of pointers to sections that the document should be divided into

Definition at line 1244 of file indexed_search/Classes/Indexer.php.

Referenced by Indexer\indexRegularDocument().

freqMap (   $freq)

Maps frequency from a real number in [0;1] to an integer in [0;$this->freqRange], with anything above $this->freqMax as 1, and back.

Parameters
float $freq: Frequency
Returns
int Frequency in range.

Definition at line 2230 of file indexed_search/Classes/Indexer.php.

References Indexer\$freqRange.

Referenced by Indexer\submitWords().
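
The forward direction of this mapping can be sketched as a linear scale with a cap. This is one reading of the description, not the actual implementation: mapFrequency() is a hypothetical name, the defaults are taken from $freqRange = 32000 and $freqMax = 0.1 as listed on this page, and the "and back" direction is left out.

```php
<?php
// Hypothetical sketch: scale a relative frequency so that $max maps to $range,
// and clamp everything above $max to the top of the range.
function mapFrequency(float $freq, int $range = 32000, float $max = 0.1): int
{
    return (int)min($range, round($freq / $max * $range));
}

echo mapFrequency(0.05), PHP_EOL; // 16000
echo mapFrequency(0.5), PHP_EOL;  // 32000 (capped)
```

Capping at $max keeps very common words from dominating the stored value while still using the full integer range for rarer words.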

getHTMLcharset (   $content)

Extract the charset value from HTML meta tag.

Parameters
string $content: HTML content
Returns
string The charset value if found.

Definition at line 666 of file indexed_search/Classes/Indexer.php.

Referenced by Indexer\convertHTMLToUtf8().
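
A minimal sketch of charset detection from a meta tag follows. The function name detectMetaCharset() and the regular expression are assumptions; the sketch accepts both the http-equiv form and the shorter HTML5 form and returns an empty string when nothing is found, which is only one plausible reading of "if found".

```php
<?php
// Hypothetical stand-alone sketch, not the TYPO3 method itself.
function detectMetaCharset(string $html): string
{
    // Look for charset=... inside any <meta ...> tag.
    if (preg_match('/<meta\b[^>]*charset\s*=\s*["\']?([A-Za-z0-9_-]+)/i', $html, $match)) {
        return $match[1];
    }
    return '';
}

echo detectMetaCharset('<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">'), PHP_EOL;
// ISO-8859-1
```

A caller such as a UTF-8 conversion step could fall back to a default charset when the returned string is empty.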

getRootLineFields (   array &$fieldArray)

Adding values for root-line fields. rl0, rl1 and rl2 are standard. A hook might add more.

Parameters
array $fieldArray: Field array, passed by reference
Returns
void

Definition at line 2084 of file indexed_search/Classes/Indexer.php.

References $GLOBALS.

Referenced by Indexer\submit_section(), and Indexer\updateRootline().

getUrlHeaders (   $url)

Gets the HTTP headers returned for a URL

Parameters
string $url: The URL
Returns
mixed If no answer, returns FALSE. Otherwise an array where HTTP headers are keys

Definition at line 929 of file indexed_search/Classes/Indexer.php.

References GeneralUtility\trimExplode().

Referenced by Indexer\indexExternalUrl().

hook_indexContent (   &$pObj)

Parent Object (TSFE) Initialization

Parameters
TypoScriptFrontendController $pObj: Parent Object, passed by reference
Returns
void

Definition at line 252 of file indexed_search/Classes/Indexer.php.

References $GLOBALS, Indexer\$indexerConfig, Indexer\indexTypo3PageContent(), Indexer\init(), Indexer\log_pull(), Indexer\log_push(), and Indexer\log_setTSlogMessage().

includeCrawlerClass ( )

Includes the crawler class

Returns
void
Deprecated:
since TYPO3 v8, will be removed in TYPO3 v9, autoloader is taking care of that functionality

Definition at line 2102 of file indexed_search/Classes/Indexer.php.

References GeneralUtility\logDeprecatedFunction().

indexAnalyze (   $content)

Analyzes content to use for indexing.

Parameters
array $content: Standard content array: an array with the keys title, keywords, description and body, which all contain an array of words.
Returns
array Index Array (whatever that is...)

Definition at line 1338 of file indexed_search/Classes/Indexer.php.

References Indexer\analyzeBody(), and Indexer\analyzeHeaderinfo().

Referenced by Indexer\indexRegularDocument(), and Indexer\indexTypo3PageContent().

indexExternalUrl (   $externalUrl)

Indexes the HTML content of an external URL

Parameters
string $externalUrl: URL, e.g. "http://typo3.org/"
Returns
void
See also
indexRegularDocument()

Definition at line 903 of file indexed_search/Classes/Indexer.php.

References Indexer\getUrlHeaders(), Indexer\indexRegularDocument(), GeneralUtility\tempnam(), and GeneralUtility\writeFile().

Referenced by Indexer\extractLinks().

indexRegularDocument (   $file,
  $force = false,
  $contentTmpFile = '',
  $altExtension = '' 
)

Indexing a regular document given as $file (relative to PATH_site, local file)

Parameters
string $file: Relative filename, relative to PATH_site. It can also be an absolute path as long as it is inside the lockRootPath (validated with ::isAbsPath()). Finally, if $contentTmpFile is set, this value can be anything, most likely a URL.
bool $force: If set, indexing is forced (despite content hashes, mtime etc).
string $contentTmpFile: Temporary file with the content to read it from (instead of $file). Used when the $file is a URL.
string $altExtension: File extension for temporary file.
Returns
void

Definition at line 1120 of file indexed_search/Classes/Indexer.php.

References Indexer\$content_md5h, Indexer\$contentParts, Indexer\checkExternalDocContentHash(), Indexer\checkMtimeTstamp(), Indexer\checkWordList(), Indexer\fileContentParts(), GeneralUtility\getFileAbsFileName(), Indexer\indexAnalyze(), GeneralUtility\isAbsPath(), GeneralUtility\isAllowedAbsPath(), IndexedSearchUtility\isTableUsed(), Indexer\log_pull(), Indexer\log_push(), Indexer\log_setTSlogMessage(), IndexedSearchUtility\md5inthash(), GeneralUtility\milliseconds(), Indexer\processWordsInArrays(), Indexer\readFileContent(), Indexer\setExtHashes(), Indexer\submitFile_section(), Indexer\submitFilePage(), Indexer\submitWords(), Indexer\updateParsetime(), and Indexer\updateTstamp().

Referenced by Indexer\extractLinks(), and Indexer\indexExternalUrl().

initializeExternalParsers ( )

Initialize external parsers

Returns
void private
See also
init()

Definition at line 512 of file indexed_search/Classes/Indexer.php.

References $GLOBALS, and GeneralUtility\getUserObj().

Referenced by Indexer\init().

is_grlist_set (   $phash_x)

Checks if a grlist record has been set for the phash value input (looking at the "real" phash of the current content, not the linked-to phash of the common search result page)

Parameters
int $phash_x: Phash integer to test.
Returns
bool

Definition at line 1923 of file indexed_search/Classes/Indexer.php.

References IndexedSearchUtility\isTableUsed(), and GeneralUtility\makeInstance().

Referenced by Indexer\indexTypo3PageContent().

static isAllowedLocalFile (   $filePath)
static protected

Checks if the path points to the file inside the web site

Parameters
string $filePath
Returns
bool

Definition at line 1098 of file indexed_search/Classes/Indexer.php.

References GeneralUtility\resolveBackPath().

static isRelativeURL (   $url)
static protected

Checks if URL is relative.

Parameters
string $url
Returns
bool

Definition at line 1086 of file indexed_search/Classes/Indexer.php.

log_pull ( )

Pull function wrapper for TT logging

Returns
void

Definition at line 2313 of file indexed_search/Classes/Indexer.php.

Referenced by Indexer\hook_indexContent(), Indexer\indexRegularDocument(), and Indexer\indexTypo3PageContent().

log_push (   $msg,
  $key 
)

Push function wrapper for TT logging

Parameters
string $msg: Title to set
string $key: Key (?)
Returns
void

Definition at line 2303 of file indexed_search/Classes/Indexer.php.

Referenced by Indexer\hook_indexContent(), Indexer\indexRegularDocument(), and Indexer\indexTypo3PageContent().

log_setTSlogMessage (   $msg,
  $errorNum = 0 
)

Set log message function wrapper for TT logging

Parameters
string $msg: Message to set
int $errorNum: Error number
Returns
void

Definition at line 2325 of file indexed_search/Classes/Indexer.php.

Referenced by Indexer\checkMtimeTstamp(), Indexer\checkWordList(), Indexer\extractLinks(), Indexer\hook_indexContent(), Indexer\indexRegularDocument(), Indexer\indexTypo3PageContent(), and Indexer\update_grlist().

metaphone (   $word,
  $returnRawMetaphoneValue = false 
)

Creating metaphone based hash from input word

Parameters
string $word: Word to convert
bool $returnRawMetaphoneValue: If set, returns the raw metaphone value (not hashed)
Returns
mixed Metaphone hash integer (or raw value, string)

Definition at line 1419 of file indexed_search/Classes/Indexer.php.

References IndexedSearchUtility\md5inthash().

Referenced by Indexer\analyzeBody(), and Indexer\analyzeHeaderinfo().

processWordsInArrays (   $contentArr)

Processing words in the array from split*Content -functions

Parameters
array $contentArr: Array of content to index, see splitHTMLContent() and splitRegularContent()
Returns
array Content input array modified so each key is now a unique array of words

Definition at line 1300 of file indexed_search/Classes/Indexer.php.

Referenced by Indexer\indexRegularDocument(), and Indexer\indexTypo3PageContent().

readFileContent (   $fileExtension,
  $absoluteFileName,
  $sectionPointer 
)

Reads the content of an external file being indexed. The content from the external parser MUST be returned in utf-8!

Parameters
string $fileExtension: File extension, e.g. "pdf", "doc" etc.
string $absoluteFileName: Absolute filename of file (must exist and be validated OK before calling function)
string $sectionPointer: Pointer to section (zero for all file types other than PDF, which will have an indication of the pages into which the document should be split.)
Returns
array Standard content array (title, description, keywords, body keys)

Definition at line 1227 of file indexed_search/Classes/Indexer.php.

Referenced by Indexer\indexRegularDocument().

removeOldIndexedFiles (   $phash)

Removes records for the indexed page, $phash

Parameters
int $phash: phash value to flush
Returns
void

Definition at line 1774 of file indexed_search/Classes/Indexer.php.

References IndexedSearchUtility\isTableUsed(), and GeneralUtility\makeInstance().

Referenced by Indexer\submitFilePage().

removeOldIndexedPages (   $phash)

Removes records for the indexed page, $phash

Parameters
int $phash: phash value to flush
Returns
void

Definition at line 1577 of file indexed_search/Classes/Indexer.php.

References IndexedSearchUtility\isTableUsed(), and GeneralUtility\makeInstance().

Referenced by Indexer\submitPage().

setExtHashes (   $file,
  $subinfo = [] 
)

Get search hash, external files

Parameters
string $file: File name / path which identifies it on the server
array $subinfo: Additional content identifying the (subpart of) content. For instance, PDF files are divided into groups of pages for indexing.
Returns
array Array with "phash_grouping" and "phash" inside.

Definition at line 2276 of file indexed_search/Classes/Indexer.php.

References Indexer\$hash, and IndexedSearchUtility\md5inthash().

Referenced by Indexer\indexRegularDocument().
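
The relationship between "phash_grouping" and "phash" can be sketched as follows: the grouping hash identifies the file alone, while the full hash also covers the sub-part (for instance a page range of a PDF). The helper names intHash() and externalFileHashes(), the use of serialize(), and the MD5-derived integer are assumptions for illustration; the exact algorithm of IndexedSearchUtility\md5inthash() is not given on this page.

```php
<?php
// Assumption: an integer hash built from the first 7 hex digits of an MD5 sum.
function intHash(string $value): int
{
    return (int)hexdec(substr(md5($value), 0, 7));
}

// Hypothetical sketch of deriving both hashes for an external file.
function externalFileHashes(string $file, array $subinfo = []): array
{
    $identity = ['file' => $file];
    // Same for every sub-part of the same file.
    $hashes = ['phash_grouping' => intHash(serialize($identity))];
    // Additionally depends on the sub-part being indexed.
    $identity['subinfo'] = $subinfo;
    $hashes['phash'] = intHash(serialize($identity));
    return $hashes;
}

$first = externalFileHashes('fileadmin/manual.pdf', ['key' => '1-3']);
$second = externalFileHashes('fileadmin/manual.pdf', ['key' => '4-6']);
var_dump($first['phash_grouping'] === $second['phash_grouping']); // bool(true)
```

Sharing the grouping hash lets all page groups of one document be treated as a single search result, while each group keeps its own phash row.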

setT3Hashes ( )

Get search hash, T3 pages

Returns
void

Definition at line 2252 of file indexed_search/Classes/Indexer.php.

References Indexer\$cHashParams, and IndexedSearchUtility\md5inthash().

Referenced by Indexer\init().

splitHTMLContent (   $content)

Splits HTML content and returns an associative array, with title, a list of metatags, and a list of words in the body.

Parameters
string $content: HTML content to index. To some degree expected to be made by TYPO3 (i.e. splitting the header by ":")
Returns
array Array of content, having keys "title", "body", "keywords" and "description" set.
See also
splitRegularContent()

Definition at line 615 of file indexed_search/Classes/Indexer.php.

References Indexer\$defaultContentArray, Indexer\addSpacesToKeywordList(), Indexer\embracingTags(), GeneralUtility\get_tag_attributes(), and Indexer\typoSearchTags().

Referenced by Indexer\indexTypo3PageContent().

splitRegularContent (   $content)

Splits non-HTML content (from external files for instance)

Parameters
string $content: Input content (non-HTML) to index.
Returns
array Array of content, having the key "body" set (plus "title", "description" and "keywords", but empty)
See also
splitHTMLContent()

Definition at line 1261 of file indexed_search/Classes/Indexer.php.

References Indexer\$defaultContentArray.

submit_grlist (   $hash,
  $phash_x 
)

Stores gr_list in the database.

Parameters
int $hash: Search result record phash
int $phash_x: Actual phash of current content
Returns
void
See also
update_grlist()

Definition at line 1532 of file indexed_search/Classes/Indexer.php.

References $fields, Indexer\$hash, IndexedSearchUtility\isTableUsed(), GeneralUtility\makeInstance(), and IndexedSearchUtility\md5inthash().

Referenced by Indexer\submitFile_grlist(), Indexer\submitPage(), and Indexer\update_grlist().

submit_section (   $hash,
  $hash_t3 
)

Stores section. $hash and $hash_t3 are the same for TYPO3 pages, but different when it is external files.

Parameters
int $hash: phash of TYPO3 parent search result record
int $hash_t3: phash of the file indexation search record
Returns
void

Definition at line 1556 of file indexed_search/Classes/Indexer.php.

References $fields, Indexer\$hash, Indexer\getRootLineFields(), IndexedSearchUtility\isTableUsed(), and GeneralUtility\makeInstance().

Referenced by Indexer\submitFile_section(), and Indexer\submitPage().

submitFile_grlist (   $hash)

Stores file gr_list for a file IF it does not exist already

Parameters
int $hash: phash value of file
Returns
void

Definition at line 1692 of file indexed_search/Classes/Indexer.php.

References Indexer\$hash, IndexedSearchUtility\isTableUsed(), GeneralUtility\makeInstance(), IndexedSearchUtility\md5inthash(), and Indexer\submit_grlist().

submitFile_section (   $hash)

Stores file section for a file IF it does not exist

Parameters
int $hash: phash value of file
Returns
void

Definition at line 1739 of file indexed_search/Classes/Indexer.php.

References Indexer\$hash, IndexedSearchUtility\isTableUsed(), GeneralUtility\makeInstance(), and Indexer\submit_section().

Referenced by Indexer\indexRegularDocument().

submitFilePage (   $hash,
  $file,
  $subinfo,
  $ext,
  $mtime,
  $ctime,
  $size,
  $content_md5h,
  $contentParts 
)

Updates db with information about the file

Parameters
array $hash: Array with phash and phash_grouping keys for file
string $file: File name
array $subinfo: Array of "cHashParams" for files: this is for instance the page index for a PDF file (for other document types it will be a zero)
string $ext: File extension determining the type of media.
int $mtime: Modification time of file.
int $ctime: Creation time of file.
int $size: Size of file in bytes
int $content_md5h: Content HASH value.
array $contentParts: Standard content array (using only title and body for a file)
Returns
void

Definition at line 1617 of file indexed_search/Classes/Indexer.php.

References Indexer\$content_md5h, Indexer\$contentParts, $fields, $GLOBALS, Indexer\$hash, Indexer\$metaphoneContent, Indexer\bodyDescription(), IndexedSearchUtility\isTableUsed(), GeneralUtility\makeInstance(), and Indexer\removeOldIndexedFiles().

Referenced by Indexer\indexRegularDocument().

submitWords (   $wordList,
  $phash 
)

Submits RELATIONS between words and phash

Parameters
array $wordList: Word list array
int $phash: phash value
Returns
void

Definition at line 2182 of file indexed_search/Classes/Indexer.php.

References $fields, Indexer\$flagBitMask, Indexer\freqMap(), IndexedSearchUtility\isTableUsed(), and GeneralUtility\makeInstance().

Referenced by Indexer\indexRegularDocument(), and Indexer\indexTypo3PageContent().

typoSearchTags (   &$body)

Removes content that shouldn't be indexed according to TYPO3SEARCH-tags.

Parameters
string $body: HTML Content, passed by reference
Returns
bool Returns TRUE if a TYPO3SEARCH_ tag was found, otherwise FALSE.

Definition at line 736 of file indexed_search/Classes/Indexer.php.

Referenced by Indexer\splitHTMLContent().
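
TYPO3 marks indexable regions with the HTML comments <!--TYPO3SEARCH_begin--> and <!--TYPO3SEARCH_end-->. The following simplified sketch keeps only the content between paired markers; keepMarkedSections() is a hypothetical name, and unpaired markers, which the real method may handle, are ignored here.

```php
<?php
// Hypothetical simplified sketch: handles only properly paired markers.
function keepMarkedSections(string &$body): bool
{
    $pattern = '/<!--\s?TYPO3SEARCH_begin\s?-->(.*?)<!--\s?TYPO3SEARCH_end\s?-->/s';
    if (!preg_match_all($pattern, $body, $matches)) {
        return false; // no marker pair found, $body is left unchanged
    }
    // Concatenate the marked regions and drop everything outside them.
    $body = implode('', $matches[1]);
    return true;
}

$body = 'menu<!--TYPO3SEARCH_begin-->Main text<!--TYPO3SEARCH_end-->footer';
keepMarkedSections($body);
echo $body, PHP_EOL; // Main text
```

Wrapping only the main content area in these markers keeps navigation and footer text out of the index.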

update_grlist (   $phash,
  $phash_x 
)

Checks if a grlist entry for this hash exists and, if not, writes one.

Parameters
int $phash: phash of the search result that should be found
int $phash_x: The real phash of the current content. The two values are different when a page with user login turns out to contain the exact same content as another already indexed version of the page; this is in fact the whole reason for the grlist table.
Returns
void
See also
submit_grlist()

Definition at line 1948 of file indexed_search/Classes/Indexer.php.

References IndexedSearchUtility\isTableUsed(), Indexer\log_setTSlogMessage(), GeneralUtility\makeInstance(), IndexedSearchUtility\md5inthash(), and Indexer\submit_grlist().

Referenced by Indexer\indexTypo3PageContent().

updateParsetime (   $phash,
  $parsetime 
)

Update parsetime for phash row.

Parameters
int $phash: phash value.
int $parsetime: Parsetime value to set.
Returns
void

Definition at line 2033 of file indexed_search/Classes/Indexer.php.

References IndexedSearchUtility\isTableUsed(), and GeneralUtility\makeInstance().

Referenced by Indexer\indexRegularDocument(), and Indexer\indexTypo3PageContent().

updateRootline ( )

Update section rootline for the page

Returns
void

Definition at line 2057 of file indexed_search/Classes/Indexer.php.

References Indexer\getRootLineFields(), IndexedSearchUtility\isTableUsed(), and GeneralUtility\makeInstance().

Referenced by Indexer\indexTypo3PageContent().

updateSetId (   $phash)

Update SetID of the index_phash record.

Parameters
int $phash: phash value
Returns
void

Definition at line 2007 of file indexed_search/Classes/Indexer.php.

References IndexedSearchUtility\isTableUsed(), and GeneralUtility\makeInstance().

Referenced by Indexer\indexTypo3PageContent().

updateTstamp (   $phash,
  $mtime = 0 
)

Update tstamp for a phash row.

Parameters
int $phash: phash value
int $mtime: If set, update the mtime field to this value.
Returns
void

Definition at line 1976 of file indexed_search/Classes/Indexer.php.

References $GLOBALS, IndexedSearchUtility\isTableUsed(), and GeneralUtility\makeInstance().

Referenced by Indexer\checkMtimeTstamp(), Indexer\indexRegularDocument(), and Indexer\indexTypo3PageContent().

Member Data Documentation

$cHashParams = []

Definition at line 178 of file indexed_search/Classes/Indexer.php.

Referenced by Indexer\setT3Hashes().

$conf = []

Definition at line 126 of file indexed_search/Classes/Indexer.php.

Referenced by Indexer\extractLinks().

$contentParts = []
$crawlerActive = false

Definition at line 99 of file indexed_search/Classes/Indexer.php.

$csObj

Definition at line 212 of file indexed_search/Classes/Indexer.php.

$defaultContentArray
Initial value:
= [
'title' => ''

Definition at line 106 of file indexed_search/Classes/Indexer.php.

Referenced by Indexer\splitHTMLContent(), and Indexer\splitRegularContent().

$defaultGrList = '0,-1'

Definition at line 63 of file indexed_search/Classes/Indexer.php.

$enableMetaphoneSearch = false

Definition at line 195 of file indexed_search/Classes/Indexer.php.

Referenced by Indexer\init().

$excludeSections = 'script,style'

Definition at line 47 of file indexed_search/Classes/Indexer.php.

$external_parsers = []

Definition at line 54 of file indexed_search/Classes/Indexer.php.

$externalFileCounter = 0

Definition at line 121 of file indexed_search/Classes/Indexer.php.

$file_phash_arr = []

Definition at line 147 of file indexed_search/Classes/Indexer.php.

$flagBitMask

Definition at line 231 of file indexed_search/Classes/Indexer.php.

Referenced by Indexer\submitWords().

$forceIndexing = false

Definition at line 92 of file indexed_search/Classes/Indexer.php.

$freqMax = 0.1

Definition at line 190 of file indexed_search/Classes/Indexer.php.

$freqRange = 32000

Definition at line 185 of file indexed_search/Classes/Indexer.php.

Referenced by Indexer\freqMap().

$indexerConfig = []

Definition at line 133 of file indexed_search/Classes/Indexer.php.

Referenced by Indexer\hook_indexContent().

$indexExternalUrl_content = ''

Definition at line 173 of file indexed_search/Classes/Indexer.php.

$internal_log = []

Definition at line 166 of file indexed_search/Classes/Indexer.php.

$lexerObj

Definition at line 226 of file indexed_search/Classes/Indexer.php.

$maxExternalFiles = 0

Definition at line 85 of file indexed_search/Classes/Indexer.php.

$metaphoneContent = ''
$metaphoneObj

Definition at line 219 of file indexed_search/Classes/Indexer.php.

$reasons
Initial value:
= [
-1 => 'mtime matched the document, so no changes detected and no content updated'

Definition at line 33 of file indexed_search/Classes/Indexer.php.

$storeMetaphoneInfoAsWords

Definition at line 200 of file indexed_search/Classes/Indexer.php.

$timeTracker
protected

Definition at line 236 of file indexed_search/Classes/Indexer.php.

$tstamp_maxAge = 0

Definition at line 70 of file indexed_search/Classes/Indexer.php.

$tstamp_minAge = 0

Definition at line 78 of file indexed_search/Classes/Indexer.php.

$wordcount = 0

Definition at line 116 of file indexed_search/Classes/Indexer.php.