TYPO3 CMS  TYPO3_7-6
TYPO3\CMS\IndexedSearch\Hook\CrawlerHook Class Reference

Public Member Functions

 __construct ()
 
 crawler_init (&$pObj)
 
 crawler_execute ($params, &$pObj)
 
 crawler_execute_type1 ($cfgRec, &$session_data, $params, &$pObj)
 
 crawler_execute_type2 ($cfgRec, &$session_data, $params, &$pObj)
 
 crawler_execute_type3 ($cfgRec, &$session_data, $params, &$pObj)
 
 crawler_execute_type4 ($cfgRec, &$session_data, $params, &$pObj)
 
 cleanUpOldRunningConfigurations ()
 
 checkUrl ($url, $urlLog, $baseUrl)
 
 indexExtUrl ($url, $pageId, $rl, $cfgUid, $setId)
 
 indexSingleRecord ($r, $cfgRec, $rl=null)
 
 getUidRootLineForClosestTemplate ($id)
 
 generateNextIndexingTime ($cfgRec)
 
 checkDeniedSuburls ($url, $url_deny)
 
 addQueueEntryForHook ($cfgRec, $title)
 
 deleteFromIndex ($id)
 
 processCmdmap_preProcess ($command, $table, $id, $value, $pObj)
 
 processDatamap_afterDatabaseOperations ($status, $table, $id, $fieldArray, $pObj)
 

Public Attributes

 $secondsPerExternalUrl = 3
 
 $instanceCounter = 0
 
 $callBack = self::class
 

Detailed Description

Crawler hook for indexed search. Works with the "crawler" extension.

Definition at line 24 of file CrawlerHook.php.

Constructor & Destructor Documentation

◆ __construct()

TYPO3\CMS\IndexedSearch\Hook\CrawlerHook::__construct ( )

The constructor

Definition at line 48 of file CrawlerHook.php.

References $GLOBALS, and TYPO3\CMS\Core\Utility\GeneralUtility\makeInstance().

Member Function Documentation

◆ addQueueEntryForHook()

TYPO3\CMS\IndexedSearch\Hook\CrawlerHook::addQueueEntryForHook (   $cfgRec,
  $title 
)

Adds an entry to the queue for the hook.

Parameters
array $cfgRec: Configuration record
string $title: Title/URL
Returns
void

Definition at line 640 of file CrawlerHook.php.

◆ checkDeniedSuburls()

TYPO3\CMS\IndexedSearch\Hook\CrawlerHook::checkDeniedSuburls (   $url,
  $url_deny 
)

Checks whether $url begins with any of the URLs in the $url_deny "list" and, if so, returns TRUE.

Parameters
string $url: URL to test
string $url_deny: String in which URLs are separated by line breaks. If any of these strings is the first part of $url, the function returns TRUE (to indicate that descent is denied)
Returns
bool TRUE if there is a matching URL (hence, do not index!)

Definition at line 620 of file CrawlerHook.php.

References TYPO3\CMS\Core\Utility\GeneralUtility\isFirstPartOfStr(), and TYPO3\CMS\Core\Utility\GeneralUtility\trimExplode().

Referenced by TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\crawler_execute_type3().
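The prefix test described above can be sketched in plain PHP. This is a minimal illustration, not the TYPO3 implementation: the function name isDeniedSubUrl and the example URLs are hypothetical, and strpos() stands in for GeneralUtility::isFirstPartOfStr().

```php
<?php
// Illustrative stand-in for the check described above (hypothetical name).
function isDeniedSubUrl(string $url, string $urlDeny): bool
{
    // Split the deny list on line breaks, trim each entry, drop empty ones.
    $prefixes = array_filter(array_map('trim', preg_split('/\R/', $urlDeny)), 'strlen');
    foreach ($prefixes as $prefix) {
        // A match at position 0 means $prefix is the first part of $url.
        if (strpos($url, $prefix) === 0) {
            return true;
        }
    }
    return false;
}

// Example deny list with two entries separated by a line break.
$deny = "http://example.org/private/\nhttp://example.org/tmp/";
$blocked = isDeniedSubUrl('http://example.org/private/a.html', $deny);
$allowed = !isDeniedSubUrl('http://example.org/public/a.html', $deny);
```

An empty deny list yields no prefixes, so every URL passes.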

◆ checkUrl()

TYPO3\CMS\IndexedSearch\Hook\CrawlerHook::checkUrl (   $url,
  $urlLog,
  $baseUrl 
)

Checks whether an input URL is allowed to be indexed. This depends on whether it is already present in the URL log.

Parameters
string $url: URL string to check
array $urlLog: Array of already indexed URLs (the input URL is looked up here and must not exist already)
string $baseUrl: Base URL of the indexing process (the input URL must be "inside" the base URL!)
Returns
string Returns the URL if OK, otherwise FALSE

Definition at line 467 of file CrawlerHook.php.

References TYPO3\CMS\Core\Utility\GeneralUtility\isFirstPartOfStr().

Referenced by TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\crawler_execute_type3().
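The two conditions named in the parameter list (inside the base URL, not yet in the log) can be sketched as follows. The function name checkUrlSketch is hypothetical, and any additional normalisation the real method may perform is not covered here.

```php
<?php
/**
 * Hypothetical stand-in for the check described above.
 * Returns the URL when it may be indexed, otherwise false.
 */
function checkUrlSketch(string $url, array $urlLog, string $baseUrl)
{
    // The URL must be "inside" the base URL, i.e. start with it.
    $insideBase = strpos($url, $baseUrl) === 0;
    // The URL must not have been indexed already.
    $alreadySeen = in_array($url, $urlLog, true);

    return ($insideBase && !$alreadySeen) ? $url : false;
}

$log = ['http://example.org/docs/a.html'];
$fresh = checkUrlSketch('http://example.org/docs/b.html', $log, 'http://example.org/docs/');
$seen = checkUrlSketch('http://example.org/docs/a.html', $log, 'http://example.org/docs/');
$outside = checkUrlSketch('http://other.example/x.html', $log, 'http://example.org/docs/');
```

Callers should compare the result with === false rather than rely on truthiness.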

◆ cleanUpOldRunningConfigurations()

TYPO3\CMS\IndexedSearch\Hook\CrawlerHook::cleanUpOldRunningConfigurations ( )

Looks up all old index configurations that are finished and need to be reset and marked as done.

Returns
void

Definition at line 426 of file CrawlerHook.php.

References $GLOBALS, and TYPO3\CMS\Backend\Utility\BackendUtility\deleteClause().

Referenced by TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\crawler_init().

◆ crawler_execute()

TYPO3\CMS\IndexedSearch\Hook\CrawlerHook::crawler_execute (   $params,
  & $pObj 
)

Callback function for the execution of a log element

Parameters
array $params: Params from log element. Must contain $params['indexConfigUid']
object $pObj: Parent object (tx_crawler lib)
Returns
array Result array

Definition at line 173 of file CrawlerHook.php.

References $GLOBALS, $params, TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\crawler_execute_type1(), TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\crawler_execute_type2(), TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\crawler_execute_type3(), TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\crawler_execute_type4(), and TYPO3\CMS\Core\Utility\GeneralUtility\getUserObj().
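The references above show that crawler_execute() hands off to one of four type-specific methods. A minimal sketch of such a dispatch follows; it assumes the configuration record carries its type in a 'type' field, which this page does not state. The function name describeIndexingType is hypothetical, and the labels are taken from the method descriptions on this page.

```php
<?php
// Hypothetical dispatcher; the 'type' key is an assumption, not documented here.
function describeIndexingType(array $cfgRec): string
{
    switch ((int)($cfgRec['type'] ?? 0)) {
        case 1:
            return 'records from a table';   // crawler_execute_type1()
        case 2:
            return 'files from fileadmin';   // crawler_execute_type2()
        case 3:
            return 'external URLs';          // crawler_execute_type3()
        case 4:
            return 'page tree';              // crawler_execute_type4()
        default:
            return 'unknown';                // no handler for this type
    }
}

$label = describeIndexingType(['type' => 3]);
```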

◆ crawler_execute_type1()

TYPO3\CMS\IndexedSearch\Hook\CrawlerHook::crawler_execute_type1 (   $cfgRec,
  & $session_data,
  $params,
  & $pObj 
)

Indexing records from a table

Parameters
array $cfgRec: Indexing Configuration Record
array $session_data: Session data for the indexing session, spread over multiple instances of the script. Passed by reference, so changes to it are saved for the next call!
array $params: Parameters from the log queue.
object $pObj: Parent object (from "crawler" extension!)
Returns
void

Definition at line 233 of file CrawlerHook.php.

References $GLOBALS, TYPO3\CMS\Backend\Utility\BackendUtility\BEenableFields(), TYPO3\CMS\Backend\Utility\BackendUtility\deleteClause(), TYPO3\CMS\Core\Utility\MathUtility\forceIntegerInRange(), TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\getUidRootLineForClosestTemplate(), and TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\indexSingleRecord().

Referenced by TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\crawler_execute().
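The $session_data parameter is passed by reference so that progress survives from one call to the next. The following sketch shows only that PHP mechanism; the function processBatch and the 'offset' key are hypothetical and do not reflect the actual layout of the session data.

```php
<?php
// Hypothetical worker: records progress in the caller's array via a reference.
function processBatch(array &$sessionData, int $batchSize): void
{
    // Because of the "&", this write is visible to the caller afterwards.
    $sessionData['offset'] = ($sessionData['offset'] ?? 0) + $batchSize;
}

$session = [];
processBatch($session, 10); // first call starts at offset 0
processBatch($session, 10); // second call resumes where the first stopped
$offsetAfterTwoCalls = $session['offset'];
```

Without the "&" in the signature, each call would work on a copy and the caller's array would stay empty.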

◆ crawler_execute_type2()

TYPO3\CMS\IndexedSearch\Hook\CrawlerHook::crawler_execute_type2 (   $cfgRec,
  & $session_data,
  $params,
  & $pObj 
)

Indexing files from fileadmin

Parameters
array $cfgRec: Indexing Configuration Record
array $session_data: Session data for the indexing session, spread over multiple instances of the script. Passed by reference, so changes to it are saved for the next call!
array $params: Parameters from the log queue.
object $pObj: Parent object (from "crawler" extension!)
Returns
void

Definition at line 278 of file CrawlerHook.php.

References $GLOBALS, $params, TYPO3\CMS\Core\Utility\GeneralUtility\get_dirs(), TYPO3\CMS\Core\Utility\GeneralUtility\getAllFilesAndFoldersInPath(), TYPO3\CMS\Core\Utility\GeneralUtility\getFileAbsFileName(), TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\getUidRootLineForClosestTemplate(), TYPO3\CMS\Core\Utility\GeneralUtility\isAbsPath(), TYPO3\CMS\Core\Utility\GeneralUtility\isAllowedAbsPath(), TYPO3\CMS\Core\Utility\GeneralUtility\makeInstance(), TYPO3\CMS\Core\Utility\GeneralUtility\removePrefixPathFromList(), and TYPO3\CMS\Core\Utility\GeneralUtility\trimExplode().

Referenced by TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\crawler_execute().

◆ crawler_execute_type3()

TYPO3\CMS\IndexedSearch\Hook\CrawlerHook::crawler_execute_type3 (   $cfgRec,
  & $session_data,
  $params,
  & $pObj 
)

Indexing External URLs

Parameters
array $cfgRec: Indexing Configuration Record
array $session_data: Session data for the indexing session, spread over multiple instances of the script. Passed by reference, so changes to it are saved for the next call!
array $params: Parameters from the log queue.
object $pObj: Parent object (from "crawler" extension!)
Returns
void

Definition at line 340 of file CrawlerHook.php.

References $GLOBALS, $params, TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\checkDeniedSuburls(), TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\checkUrl(), TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\getUidRootLineForClosestTemplate(), and TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\indexExtUrl().

Referenced by TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\crawler_execute().

◆ crawler_execute_type4()

TYPO3\CMS\IndexedSearch\Hook\CrawlerHook::crawler_execute_type4 (   $cfgRec,
  & $session_data,
  $params,
  & $pObj 
)

Page tree indexing type

Parameters
array $cfgRec: Indexing Configuration Record
array $session_data: Session data for the indexing session, spread over multiple instances of the script. Passed by reference, so changes to it are saved for the next call!
array $params: Parameters from the log queue.
object $pObj: Parent object (from "crawler" extension!)
Returns
void

Definition at line 381 of file CrawlerHook.php.

References $GLOBALS, $params, TYPO3\CMS\Backend\Utility\BackendUtility\deleteClause(), and TYPO3\CMS\Backend\Utility\BackendUtility\getRecord().

Referenced by TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\crawler_execute().

◆ crawler_init()

TYPO3\CMS\IndexedSearch\Hook\CrawlerHook::crawler_init ( & $pObj )

Initialization of the crawler hook. This function is called for each instance of the crawler, and it must check whether something is scheduled to happen and, if so, put one or more entries in the crawler's log to start processing. In practice, it selects the indexing configurations and evaluates whether any of them needs to run.

Parameters
object $pObj: Parent object (tx_crawler lib)
Returns
void

Definition at line 65 of file CrawlerHook.php.

References $GLOBALS, $params, TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\cleanUpOldRunningConfigurations(), TYPO3\CMS\Backend\Utility\BackendUtility\deleteClause(), TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\generateNextIndexingTime(), TYPO3\CMS\Core\Utility\GeneralUtility\getUserObj(), and TYPO3\CMS\Core\Utility\GeneralUtility\md5int().

◆ deleteFromIndex()

TYPO3\CMS\IndexedSearch\Hook\CrawlerHook::deleteFromIndex (   $id)

Deletes all data stored by indexed search for a given page

Parameters
int $id: Uid of the page for which all pHash entries are deleted
Returns
void

Definition at line 657 of file CrawlerHook.php.

References $GLOBALS.

Referenced by TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\processCmdmap_preProcess(), and TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\processDatamap_afterDatabaseOperations().

◆ generateNextIndexingTime()

TYPO3\CMS\IndexedSearch\Hook\CrawlerHook::generateNextIndexingTime (   $cfgRec)

Generates the Unix timestamp for the next visit.

Parameters
array $cfgRec: Index configuration record
Returns
int The next timestamp

Definition at line 594 of file CrawlerHook.php.

References $GLOBALS, and TYPO3\CMS\Core\Utility\MathUtility\forceIntegerInRange().

Referenced by TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\crawler_init().
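This page does not describe how the next timestamp is derived. One common way to schedule a recurring run is to step forward from a fixed anchor time in whole multiples of the frequency, sketched below. The function nextRunTime and its parameters are hypothetical and are not taken from the configuration record.

```php
<?php
// Hypothetical scheduler: first slot at or after $now, stepping from $anchor
// in whole multiples of $frequency (all values in seconds).
function nextRunTime(int $now, int $anchor, int $frequency): int
{
    // Guard against a zero or negative frequency.
    $frequency = max(1, $frequency);
    // Number of whole periods needed to reach or pass $now.
    $periods = (int)ceil(($now - $anchor) / $frequency);

    return $anchor + $periods * $frequency;
}

// With an anchor at 0 and a 300-second frequency, time 1000 falls in the slot ending at 1200.
$next = nextRunTime(1000, 0, 300);
```

A time that already lies exactly on a slot boundary is returned unchanged.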

◆ getUidRootLineForClosestTemplate()

TYPO3\CMS\IndexedSearch\Hook\CrawlerHook::getUidRootLineForClosestTemplate (   $id)

Gets the rootline for the closest TypoScript template root. The algorithm is the same as the one used in Web > Template, Object browser.

Parameters
int $id: The page id to traverse the rootline back from
Returns
array Array where the rootline's uid values are found.

Definition at line 569 of file CrawlerHook.php.

References TYPO3\CMS\Core\Utility\GeneralUtility\makeInstance().

Referenced by TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\crawler_execute_type1(), TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\crawler_execute_type2(), TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\crawler_execute_type3(), and TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\indexSingleRecord().

◆ indexExtUrl()

TYPO3\CMS\IndexedSearch\Hook\CrawlerHook::indexExtUrl (   $url,
  $pageId,
  $rl,
  $cfgUid,
  $setId 
)

Indexing External URL

Parameters
string $url: URL, http://....
int $pageId: Page id to relate indexing to.
array $rl: Rootline array to relate indexing to
int $cfgUid: Configuration UID
int $setId: Set ID value
Returns
array URLs found on this page

Definition at line 490 of file CrawlerHook.php.

References TYPO3\CMS\Core\Utility\GeneralUtility\makeInstance(), and TYPO3\CMS\Core\Utility\GeneralUtility\resolveBackPath().

Referenced by TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\crawler_execute_type3().

◆ indexSingleRecord()

TYPO3\CMS\IndexedSearch\Hook\CrawlerHook::indexSingleRecord (   $r,
  $cfgRec,
  $rl = null 
)

◆ processCmdmap_preProcess()

TYPO3\CMS\IndexedSearch\Hook\CrawlerHook::processCmdmap_preProcess (   $command,
  $table,
  $id,
  $value,
  $pObj 
)

TCEmain hook function for on-the-fly indexing of database records

Parameters
string $command: TCEmain command
string $table: Table name
string $id: Record ID. If it is a new record, this is a string pointing to an index inside ::substNEWwithIDs
mixed $value: Target value (ignored)
FormEngine $pObj: TCEmain calling object
Returns
void

Definition at line 696 of file CrawlerHook.php.

References TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\deleteFromIndex().

◆ processDatamap_afterDatabaseOperations()

TYPO3\CMS\IndexedSearch\Hook\CrawlerHook::processDatamap_afterDatabaseOperations (   $status,
  $table,
  $id,
  $fieldArray,
  $pObj 
)

TCEmain hook function for on-the-fly indexing of database records

Parameters
string $status: Status "new" or "update"
string $table: Table name
string $id: Record ID. If it is a new record, this is a string pointing to an index inside ::substNEWwithIDs
array $fieldArray: Field array of updated fields in the operation
FormEngine $pObj: TCEmain calling object
Returns
void

Definition at line 714 of file CrawlerHook.php.

References $GLOBALS, TYPO3\CMS\Backend\Utility\BackendUtility\deleteClause(), TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\deleteFromIndex(), TYPO3\CMS\Backend\Utility\BackendUtility\getRecord(), and TYPO3\CMS\IndexedSearch\Hook\CrawlerHook\indexSingleRecord().

Member Data Documentation

◆ $callBack

TYPO3\CMS\IndexedSearch\Hook\CrawlerHook::$callBack = self::class

Definition at line 43 of file CrawlerHook.php.

◆ $instanceCounter

TYPO3\CMS\IndexedSearch\Hook\CrawlerHook::$instanceCounter = 0

Definition at line 38 of file CrawlerHook.php.

◆ $secondsPerExternalUrl

TYPO3\CMS\IndexedSearch\Hook\CrawlerHook::$secondsPerExternalUrl = 3

Definition at line 31 of file CrawlerHook.php.