TYPO3 CMS  TYPO3_6-2
TYPO3\CMS\IndexedSearch\Hook\CrawlerHook Class Reference
Inheritance diagram for TYPO3\CMS\IndexedSearch\Hook\CrawlerHook:
tx_indexedsearch_crawler

Public Member Functions

 crawler_init (&$pObj)
 
 crawler_execute ($params, &$pObj)
 
 crawler_execute_type1 ($cfgRec, &$session_data, $params, &$pObj)
 
 crawler_execute_type2 ($cfgRec, &$session_data, $params, &$pObj)
 
 crawler_execute_type3 ($cfgRec, &$session_data, $params, &$pObj)
 
 crawler_execute_type4 ($cfgRec, &$session_data, $params, &$pObj)
 
 cleanUpOldRunningConfigurations ()
 
 checkUrl ($url, $urlLog, $baseUrl)
 
 indexExtUrl ($url, $pageId, $rl, $cfgUid, $setId)
 
 indexSingleRecord ($r, $cfgRec, $rl=NULL)
 
 loadIndexerClass ()
 
 getUidRootLineForClosestTemplate ($id)
 
 generateNextIndexingTime ($cfgRec)
 
 checkDeniedSuburls ($url, $url_deny)
 
 addQueueEntryForHook ($cfgRec, $title)
 
 deleteFromIndex ($id)
 
 processCmdmap_preProcess ($command, $table, $id, $value, $pObj)
 
 processDatamap_afterDatabaseOperations ($status, $table, $id, $fieldArray, $pObj)
 

Public Attributes

 $secondsPerExternalUrl = 3
 
 $instanceCounter = 0
 
 $callBack = '&TYPO3\\CMS\\IndexedSearch\\Hook\\CrawlerHook'
 

Detailed Description

Crawler hook for indexed search. Works with the "crawler" extension.

Author
Kasper Skårhøj kasperYYYY@typo3.com

Definition at line 25 of file CrawlerHook.php.

Member Function Documentation

◆ addQueueEntryForHook()

TYPO3\CMS\IndexedSearch\Hook\CrawlerHook::addQueueEntryForHook (   $cfgRec,
  $title 
)

Adds an entry to the queue for the hook.

Parameters
array $cfgRec: Configuration record
string $title: Title/URL
Returns
void
Todo:
Define visibility

Definition at line 645 of file CrawlerHook.php.

◆ checkDeniedSuburls()

TYPO3\CMS\IndexedSearch\Hook\CrawlerHook::checkDeniedSuburls (   $url,
  $url_deny 
)

Checks whether $url begins with any of the URLs in the $url_deny list and, if so, returns TRUE.

Parameters
string $url: URL to test
string $url_deny: URLs separated by line breaks. If any of them is the first part of $url, the function returns TRUE (to deny descent).
Returns
boolean TRUE if there is a matching URL (hence, do not index!)
Todo:
Define visibility

Definition at line 625 of file CrawlerHook.php.

References TYPO3\CMS\Core\Utility\GeneralUtility\isFirstPartOfStr(), and TYPO3\CMS\Core\Utility\GeneralUtility\trimExplode().

Referenced by TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\crawler_execute_type3().
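
The prefix test described for checkDeniedSuburls() can be illustrated with a minimal stand-alone sketch. It does not reproduce the TYPO3 source: the function name sketchCheckDeniedSuburls() is hypothetical, and plain PHP string functions stand in for the GeneralUtility helpers listed under References.

```php
<?php
/**
 * Stand-alone sketch of the documented deny-list check.
 * sketchCheckDeniedSuburls() is a hypothetical name, not part of TYPO3.
 */
function sketchCheckDeniedSuburls($url, $urlDeny)
{
    // Split the deny list on line breaks (any newline style).
    $lines = preg_split('/\R/', (string)$urlDeny);
    foreach ($lines as $line) {
        $prefix = trim($line);
        // Skip blank lines so an empty entry never matches every URL.
        if ($prefix === '') {
            continue;
        }
        // Deny when the entry is the first part of the URL.
        if (strncmp($url, $prefix, strlen($prefix)) === 0) {
            return true;
        }
    }
    return false;
}

$deny = "http://example.org/private/\nhttp://example.org/tmp/";
$denied = sketchCheckDeniedSuburls('http://example.org/private/a.html', $deny);
$allowed = sketchCheckDeniedSuburls('http://example.org/public/a.html', $deny);
```

Here $denied is true because the first deny entry is a prefix of the URL, while $allowed is false because no entry matches.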

◆ checkUrl()

TYPO3\CMS\IndexedSearch\Hook\CrawlerHook::checkUrl (   $url,
  $urlLog,
  $baseUrl 
)

Checks whether an input URL is allowed to be indexed. This depends on whether it is already present in the URL log.

Parameters
string $url: URL string to check
array $urlLog: Array of already indexed URLs (the input URL is looked up here and must not exist already)
string $baseUrl: Base URL of the indexing process (the input URL must be "inside" the base URL!)
Returns
string Returns the URL if OK, otherwise FALSE
Todo:
Define visibility

Definition at line 459 of file CrawlerHook.php.

References TYPO3\CMS\Core\Utility\GeneralUtility\isFirstPartOfStr().

Referenced by TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\crawler_execute_type3().
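
As a rough illustration of the two conditions documented for checkUrl(), the sketch below accepts a URL only if it starts with the base URL and is not yet in the log. The name sketchCheckUrl() is hypothetical, and the real method may apply further normalization that is not described on this page.

```php
<?php
/**
 * Stand-alone sketch of the two documented checks.
 * sketchCheckUrl() is a hypothetical name, not part of TYPO3.
 */
function sketchCheckUrl($url, array $urlLog, $baseUrl)
{
    // The URL must lie "inside" the base URL, i.e. start with it.
    $insideBase = strncmp($url, $baseUrl, strlen($baseUrl)) === 0;
    // The URL must not have been indexed already.
    $alreadyLogged = in_array($url, $urlLog, true);

    return ($insideBase && !$alreadyLogged) ? $url : false;
}

$base = 'http://example.org/docs/';
$log = array('http://example.org/docs/seen.html');
$fresh = sketchCheckUrl('http://example.org/docs/new.html', $log, $base);
$seen = sketchCheckUrl('http://example.org/docs/seen.html', $log, $base);
```

Returning the URL itself rather than TRUE lets a caller use the result directly as the next URL to index, which matches the documented return value.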

◆ cleanUpOldRunningConfigurations()

TYPO3\CMS\IndexedSearch\Hook\CrawlerHook::cleanUpOldRunningConfigurations ( )

Looks up all old index configurations which are finished and need to be reset and done.

Returns
void
Todo:
Define visibility

Definition at line 418 of file CrawlerHook.php.

References $GLOBALS, and TYPO3\CMS\Backend\Utility\BackendUtility\deleteClause().

Referenced by TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\crawler_init().

◆ crawler_execute()

TYPO3\CMS\IndexedSearch\Hook\CrawlerHook::crawler_execute (   $params,
  &$pObj 
)

Callback function for execution of a log element.

Parameters
array $params: Parameters from the log element. Must contain $params['indexConfigUid']
object $pObj: Parent object (tx_crawler lib)
Returns
array Result array
Todo:
Define visibility

Definition at line 165 of file CrawlerHook.php.

References $GLOBALS, TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\crawler_execute_type1(), TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\crawler_execute_type2(), TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\crawler_execute_type3(), TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\crawler_execute_type4(), and TYPO3\CMS\Core\Utility\GeneralUtility\getUserObj().
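
Since crawler_execute() references the four crawler_execute_type*() methods, it presumably selects one of them per indexing configuration. The sketch below shows one way such a selection could look. The function name sketchSelectHandler() and the 'type' key of the configuration record are assumptions for illustration and are not confirmed by this page.

```php
<?php
/**
 * Hypothetical handler selection; the 'type' key is an assumption.
 * Returns the name of the handler method, or null for unknown types.
 */
function sketchSelectHandler(array $cfgRec)
{
    $handlers = array(
        1 => 'crawler_execute_type1', // records from a table
        2 => 'crawler_execute_type2', // files from fileadmin
        3 => 'crawler_execute_type3', // external URLs
        4 => 'crawler_execute_type4', // page tree
    );
    $type = isset($cfgRec['type']) ? (int)$cfgRec['type'] : 0;

    return isset($handlers[$type]) ? $handlers[$type] : null;
}

$handler = sketchSelectHandler(array('type' => 3));
```

A lookup table keeps the mapping in one place; a switch statement would serve equally well.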

◆ crawler_execute_type1()

TYPO3\CMS\IndexedSearch\Hook\CrawlerHook::crawler_execute_type1 (   $cfgRec,
  &$session_data,
  $params,
  &$pObj 
)

Indexing records from a table

Parameters
array $cfgRec: Indexing configuration record
array $session_data: Session data for the indexing session, which is spread over multiple instances of the script. Passed by reference, so changes to it are saved for the next call.
array $params: Parameters from the log queue
object $pObj: Parent object (from the "crawler" extension)
Returns
void
Todo:
Define visibility

Definition at line 225 of file CrawlerHook.php.

References $GLOBALS, TYPO3\CMS\Backend\Utility\BackendUtility\BEenableFields(), TYPO3\CMS\Backend\Utility\BackendUtility\deleteClause(), TYPO3\CMS\Core\Utility\MathUtility\forceIntegerInRange(), TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\getUidRootLineForClosestTemplate(), and TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\indexSingleRecord().

Referenced by TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\crawler_execute().
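
The session data argument of the crawler_execute_type*() methods is passed by reference so that progress survives from one call to the next. The following sketch shows that pattern in isolation. The function name sketchIndexBatch() and the array keys 'items' and 'offset' are hypothetical and do not reflect the actual session structure.

```php
<?php
/**
 * Hypothetical batch handler illustrating by-reference session data.
 * Keys 'items' and 'offset' are assumptions for this example only.
 */
function sketchIndexBatch(array $cfgRec, array &$sessionData, $batchSize)
{
    // Resume where the previous call stopped (0 on the first call).
    $start = isset($sessionData['offset']) ? $sessionData['offset'] : 0;
    $batch = array_slice($cfgRec['items'], $start, $batchSize);
    // $sessionData is a reference, so the caller sees this update.
    $sessionData['offset'] = $start + count($batch);

    return $batch;
}

$cfgRec = array('items' => array('a', 'b', 'c', 'd', 'e'));
$session = array();
$first = sketchIndexBatch($cfgRec, $session, 2);
$second = sketchIndexBatch($cfgRec, $session, 2);
```

After the two calls, $first holds 'a' and 'b', $second holds 'c' and 'd', and $session['offset'] is 4. The caller can store $session and hand it to the next script instance.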

◆ crawler_execute_type2()

TYPO3\CMS\IndexedSearch\Hook\CrawlerHook::crawler_execute_type2 (   $cfgRec,
  &$session_data,
  $params,
  &$pObj 
)

Indexing files from fileadmin

Parameters
array $cfgRec: Indexing configuration record
array $session_data: Session data for the indexing session, which is spread over multiple instances of the script. Passed by reference, so changes to it are saved for the next call.
array $params: Parameters from the log queue
object $pObj: Parent object (from the "crawler" extension)
Returns
void
Todo:
Define visibility

Definition at line 270 of file CrawlerHook.php.

References $GLOBALS, TYPO3\CMS\Core\Utility\GeneralUtility\get_dirs(), TYPO3\CMS\Core\Utility\GeneralUtility\getAllFilesAndFoldersInPath(), TYPO3\CMS\Core\Utility\GeneralUtility\getFileAbsFileName(), TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\getUidRootLineForClosestTemplate(), TYPO3\CMS\Core\Utility\GeneralUtility\isAbsPath(), TYPO3\CMS\Core\Utility\GeneralUtility\isAllowedAbsPath(), TYPO3\CMS\Core\Utility\GeneralUtility\makeInstance(), TYPO3\CMS\Core\Utility\GeneralUtility\removePrefixPathFromList(), and TYPO3\CMS\Core\Utility\GeneralUtility\trimExplode().

Referenced by TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\crawler_execute().

◆ crawler_execute_type3()

TYPO3\CMS\IndexedSearch\Hook\CrawlerHook::crawler_execute_type3 (   $cfgRec,
  &$session_data,
  $params,
  &$pObj 
)

Indexing External URLs

Parameters
array $cfgRec: Indexing configuration record
array $session_data: Session data for the indexing session, which is spread over multiple instances of the script. Passed by reference, so changes to it are saved for the next call.
array $params: Parameters from the log queue
object $pObj: Parent object (from the "crawler" extension)
Returns
void
Todo:
Define visibility

Definition at line 332 of file CrawlerHook.php.

References $GLOBALS, TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\checkDeniedSuburls(), TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\checkUrl(), TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\getUidRootLineForClosestTemplate(), and TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\indexExtUrl().

Referenced by TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\crawler_execute().

◆ crawler_execute_type4()

TYPO3\CMS\IndexedSearch\Hook\CrawlerHook::crawler_execute_type4 (   $cfgRec,
  &$session_data,
  $params,
  &$pObj 
)

Page tree indexing type

Parameters
array $cfgRec: Indexing configuration record
array $session_data: Session data for the indexing session, which is spread over multiple instances of the script. Passed by reference, so changes to it are saved for the next call.
array $params: Parameters from the log queue
object $pObj: Parent object (from the "crawler" extension)
Returns
void
Todo:
Define visibility

Definition at line 373 of file CrawlerHook.php.

References $GLOBALS, and TYPO3\CMS\Backend\Utility\BackendUtility\deleteClause().

Referenced by TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\crawler_execute().

◆ crawler_init()

TYPO3\CMS\IndexedSearch\Hook\CrawlerHook::crawler_init ( &$pObj)

Initialization of the crawler hook. This function is called for each instance of the crawler; it must check whether something is scheduled to happen and, if so, put one or more entries in the crawler's log to start processing. In practice, it selects the indexing configurations and evaluates whether any of them needs to run.

Parameters
object $pObj: Parent object (tx_crawler lib)
Returns
void
Todo:
Define visibility

Definition at line 57 of file CrawlerHook.php.

References $GLOBALS, TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\cleanUpOldRunningConfigurations(), TYPO3\CMS\Backend\Utility\BackendUtility\deleteClause(), TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\generateNextIndexingTime(), TYPO3\CMS\Core\Utility\GeneralUtility\getUserObj(), and TYPO3\CMS\Core\Utility\GeneralUtility\md5int().

◆ deleteFromIndex()

TYPO3\CMS\IndexedSearch\Hook\CrawlerHook::deleteFromIndex (   $id)

Deletes all data stored by indexed search for a given page

Parameters
integer $id: UID of the page for which to delete all pHash entries
Returns
void
Todo:
Define visibility

Definition at line 662 of file CrawlerHook.php.

References $GLOBALS.

Referenced by TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\processCmdmap_preProcess(), and TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\processDatamap_afterDatabaseOperations().

◆ generateNextIndexingTime()

TYPO3\CMS\IndexedSearch\Hook\CrawlerHook::generateNextIndexingTime (   $cfgRec)

Generates the Unix timestamp for the next visit.

Parameters
array $cfgRec: Index configuration record
Returns
integer The next timestamp
Todo:
Define visibility

Definition at line 598 of file CrawlerHook.php.

References $GLOBALS, and TYPO3\CMS\Core\Utility\MathUtility\forceIntegerInRange().

Referenced by TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\crawler_init().

◆ getUidRootLineForClosestTemplate()

TYPO3\CMS\IndexedSearch\Hook\CrawlerHook::getUidRootLineForClosestTemplate (   $id)

Gets the rootline for the closest TypoScript template root. Uses the same algorithm as Web > Template, Object browser.

Parameters
integer $id: The page ID from which to traverse the rootline back
Returns
array Array containing the uid values of the rootline.
Todo:
Define visibility

Definition at line 572 of file CrawlerHook.php.

References $TYPO3_CONF_VARS, and TYPO3\CMS\Core\Utility\GeneralUtility\makeInstance().

Referenced by TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\crawler_execute_type1(), TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\crawler_execute_type2(), TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\crawler_execute_type3(), and TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\indexSingleRecord().

◆ indexExtUrl()

TYPO3\CMS\IndexedSearch\Hook\CrawlerHook::indexExtUrl (   $url,
  $pageId,
  $rl,
  $cfgUid,
  $setId 
)

Indexing External URL

Parameters
string $url: URL, http://....
integer $pageId: Page ID to relate indexing to
array $rl: Rootline array to relate indexing to
integer $cfgUid: Configuration UID
integer $setId: Set ID value
Returns
array URLs found on this page
Todo:
Define visibility

Definition at line 482 of file CrawlerHook.php.

References TYPO3\CMS\Core\Utility\GeneralUtility\makeInstance(), and TYPO3\CMS\Core\Utility\GeneralUtility\resolveBackPath().

Referenced by TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\crawler_execute_type3().

◆ indexSingleRecord()

TYPO3\CMS\IndexedSearch\Hook\CrawlerHook::indexSingleRecord (   $r,
  $cfgRec,
  $rl = NULL 
)

◆ loadIndexerClass()

TYPO3\CMS\IndexedSearch\Hook\CrawlerHook::loadIndexerClass ( )

Include indexer class.

Returns
void
Todo:
Define visibility
Deprecated:
Since 6.2; will be removed two versions later. Rely on autoloading of the indexer class instead.

Definition at line 560 of file CrawlerHook.php.

References TYPO3\CMS\Core\Utility\GeneralUtility\logDeprecatedFunction().

◆ processCmdmap_preProcess()

TYPO3\CMS\IndexedSearch\Hook\CrawlerHook::processCmdmap_preProcess (   $command,
  $table,
  $id,
  $value,
  $pObj 
)

TCEmain hook function for on-the-fly indexing of database records

Parameters
string $command: TCEmain command
string $table: Table name
string $id: Record ID. For a new record, this is a string pointing to an index inside ::substNEWwithIDs
mixed $value: Target value (ignored)
object $pObj: Reference to the calling TCEmain object
Returns
void
Todo:
Define visibility

Definition at line 694 of file CrawlerHook.php.

References TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\deleteFromIndex().

◆ processDatamap_afterDatabaseOperations()

TYPO3\CMS\IndexedSearch\Hook\CrawlerHook::processDatamap_afterDatabaseOperations (   $status,
  $table,
  $id,
  $fieldArray,
  $pObj 
)

TCEmain hook function for on-the-fly indexing of database records

Parameters
string $status: Status, "new" or "update"
string $table: Table name
string $id: Record ID. For a new record, this is a string pointing to an index inside ::substNEWwithIDs
array $fieldArray: Field array of updated fields in the operation
object $pObj: Reference to the calling TCEmain object
Returns
void
Todo:
Define visibility

Definition at line 712 of file CrawlerHook.php.

References $GLOBALS, TYPO3\CMS\Backend\Utility\BackendUtility\deleteClause(), TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\deleteFromIndex(), and TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\indexSingleRecord().
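
For processCmdmap_preProcess() and processDatamap_afterDatabaseOperations() to be called, the class has to be registered with TCEmain. The fragment below sketches how such a registration might look in an extension's ext_localconf.php. The registry keys are assumed to be the TCEmain hook keys used in TYPO3 6.2, and the entry key 'tx_indexedsearch' is an illustrative label. Neither is confirmed by this page.

```php
<?php
// Assumed hook registration; verify the keys against the TYPO3 version in use.
$hookClass = 'TYPO3\\CMS\\IndexedSearch\\Hook\\CrawlerHook';
$tceMainKey = 't3lib/class.t3lib_tcemain.php';

// Command map processing: enables processCmdmap_preProcess().
$GLOBALS['TYPO3_CONF_VARS']['SC_OPTIONS'][$tceMainKey]['processCmdmapClass']['tx_indexedsearch'] = $hookClass;

// Data map processing: enables processDatamap_afterDatabaseOperations().
$GLOBALS['TYPO3_CONF_VARS']['SC_OPTIONS'][$tceMainKey]['processDatamapClass']['tx_indexedsearch'] = $hookClass;
```

Using a named entry key rather than appending with [] makes it possible for another extension to override or unset the registration.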

Member Data Documentation

◆ $callBack

TYPO3\CMS\IndexedSearch\Hook\CrawlerHook::$callBack = '&TYPO3\\CMS\\IndexedSearch\\Hook\\CrawlerHook'
Todo:
Define visibility

Definition at line 45 of file CrawlerHook.php.

◆ $instanceCounter

TYPO3\CMS\IndexedSearch\Hook\CrawlerHook::$instanceCounter = 0
Todo:
Define visibility

Definition at line 38 of file CrawlerHook.php.

◆ $secondsPerExternalUrl

TYPO3\CMS\IndexedSearch\Hook\CrawlerHook::$secondsPerExternalUrl = 3
Todo:
Define visibility

Definition at line 31 of file CrawlerHook.php.