TYPO3 CMS 10.4
TYPO3\CMS\IndexedSearch\FileContentParser Class Reference

Public Member Functions

 __construct ()
 
bool initParser ($extension)
 
bool softInit ($extension)
 
string searchTypeMediaTitle ($extension)
 
bool isMultiplePageExtension ($extension)
 
array readFileContent ($ext, $absFile, $cPKey)
 
array fileContentParts ($ext, $absFile)
 
array splitPdfInfo ($pdfInfoArray)
 
string removeEndJunk ($string)
 
string getIcon ($extension)
 

Public Attributes

int $pdf_mode = -20
 
array $app = array( )
 
array $ext2itemtype_map = array( )
 
array $supportedExtensions = array( )
 
TYPO3\CMS\IndexedSearch\Indexer $pObj
 

Protected Member Functions

string sL ($reference)
 
 setLocaleForServerFileSystem ($resetLocale=false)
 

Protected Attributes

TYPO3\CMS\Core\Localization\LanguageService|TYPO3\CMS\Frontend\Controller\TypoScriptFrontendController $langObject
 
string $lastLocale
 

Detailed Description

External standard parsers for indexed_search MUST return UTF-8 content.

Definition at line 29 of file FileContentParser.php.

Constructor & Destructor Documentation

◆ __construct()

TYPO3\CMS\IndexedSearch\FileContentParser::__construct ( )

Constructs the external parsers object.

Definition at line 66 of file FileContentParser.php.

References $GLOBALS.

Member Function Documentation

◆ fileContentParts()

array TYPO3\CMS\IndexedSearch\FileContentParser::fileContentParts ($ext, $absFile)

Creates an array with pointers to the divisions of a document.

This currently applies only to PDF files. For all other types, the method returns an array with a single element, the value "0" (zero).

Parameters
string $ext: File extension
string $absFile: Absolute filename (must exist and be validated before calling this function)
Returns
array Array of pointers to the sections into which the document should be divided

Definition at line 742 of file FileContentParser.php.

References TYPO3\CMS\Core\Utility\CommandUtility\exec(), TYPO3\CMS\Core\Utility\MathUtility\forceIntegerInRange(), TYPO3\CMS\IndexedSearch\FileContentParser\setLocaleForServerFileSystem(), and TYPO3\CMS\IndexedSearch\FileContentParser\splitPdfInfo().
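The following Python sketch illustrates the documented contract of fileContentParts(). It is not the PHP implementation. It assumes that the PDF page count is already known and passed in as page_count (the real class obtains it through the pdfinfo tool and splitPdfInfo()). Pages are split according to the semantics documented for $pdf_mode, so the sketch reproduces the example given there. The function name file_content_parts, the "low-high" pointer format, and returning "0" when pdf_mode is 0 are assumptions of this sketch. The actual implementation may distribute pages differently.

```python
import math
from typing import List


def file_content_parts(ext: str, page_count: int, pdf_mode: int) -> List[str]:
    """Return section pointers for a document (illustrative sketch).

    page_count is assumed to be known already (e.g. read from pdfinfo);
    pdf_mode follows the semantics documented for $pdf_mode.
    """
    # Non-PDF files, PDFs without a usable page count and pdf_mode 0
    # (index the whole file as one unit) get the single pointer "0".
    if ext != "pdf" or page_count <= 0 or pdf_mode == 0:
        return ["0"]
    if pdf_mode > 0:
        # Positive mode: a fixed number of pages per section.
        pages_per_part = pdf_mode
    else:
        # Negative mode: abs(pdf_mode) groups; round the group size up
        # so that every page is covered.
        pages_per_part = math.ceil(page_count / abs(pdf_mode))
    parts = []
    for low in range(1, page_count + 1, pages_per_part):
        high = min(low + pages_per_part - 1, page_count)
        parts.append(f"{low}-{high}")
    return parts
```

For example, file_content_parts("pdf", 10, 5) gives ["1-5", "6-10"]. Any non-PDF extension gives ["0"].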

◆ getIcon()

string TYPO3\CMS\IndexedSearch\FileContentParser::getIcon ($extension)

Returns the icon for a file extension.

Parameters
string $extension: File extension, lowercase.
Returns
string Relative file reference, resolvable by GeneralUtility::getFileAbsFileName()

Definition at line 819 of file FileContentParser.php.

◆ initParser()

bool TYPO3\CMS\IndexedSearch\FileContentParser::initParser ($extension)

Initializes the external parser for parsing content.

Parameters
string $extension: File extension
Returns
bool TRUE if the extension is supported and enabled, otherwise FALSE.

Definition at line 78 of file FileContentParser.php.

References TYPO3\CMS\Core\Utility\MathUtility\forceIntegerInRange(), TYPO3\CMS\Core\Core\Environment\isWindows(), TYPO3\CMS\IndexedSearch\FileContentParser\sL(), and TYPO3\CMS\Core\Utility\GeneralUtility\trimExplode().
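Below is a minimal Python sketch of the kind of check initParser() performs. An extension counts as supported only if it is known and the external tools it needs can be found. The REQUIRED_TOOLS mapping and the init_parser() function are hypothetical, and the real method's configuration handling is not reproduced here. shutil.which() resolves Windows executable suffixes through PATHEXT, which stands in for the Environment::isWindows() check that the real method references.

```python
import shutil
from typing import Optional

# Hypothetical mapping of extensions to the external command-line tools
# they need; this is not the actual indexed_search configuration.
REQUIRED_TOOLS = {
    "pdf": ["pdftotext", "pdfinfo"],
    "txt": [],
    "csv": [],
}


def init_parser(extension: str, tool_dir: Optional[str] = None) -> bool:
    """Return True if the extension is known and all of its tools are found.

    tool_dir limits the search to one directory; None searches PATH.
    """
    tools = REQUIRED_TOOLS.get(extension)
    if tools is None:
        return False  # unknown extension: not supported
    # shutil.which() honours PATHEXT on Windows, so "pdftotext" also
    # matches "pdftotext.exe" there.
    return all(shutil.which(tool, path=tool_dir) is not None for tool in tools)
```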

◆ isMultiplePageExtension()

bool TYPO3\CMS\IndexedSearch\FileContentParser::isMultiplePageExtension ($extension)

Returns TRUE if the input extension (item_type) is potentially a multi-page extension.

Parameters
string $extension: Extension / item_type string
Returns
bool TRUE if multi-page

Definition at line 419 of file FileContentParser.php.

◆ readFileContent()

array TYPO3\CMS\IndexedSearch\FileContentParser::readFileContent ($ext, $absFile, $cPKey)

Reads the content of an external file being indexed.

Parameters
string $ext: File extension, e.g. "pdf", "doc", etc.
string $absFile: Absolute filename of the file (must exist and be validated before calling this function)
string $cPKey: Pointer to the section. This is zero for all types other than PDF. For PDF, it indicates the group of pages into which the document is split (see fileContentParts()).
Returns
array Standard content array with the keys title, description, keywords and body

Definition at line 453 of file FileContentParser.php.

References TYPO3\CMS\Core\Utility\PathUtility\basename(), TYPO3\CMS\Core\Utility\CommandUtility\exec(), TYPO3\CMS\Core\Utility\GeneralUtility\getUrl(), TYPO3\CMS\IndexedSearch\FileContentParser\removeEndJunk(), TYPO3\CMS\IndexedSearch\FileContentParser\setLocaleForServerFileSystem(), TYPO3\CMS\IndexedSearch\FileContentParser\sL(), and TYPO3\CMS\IndexedSearch\FileContentParser\splitPdfInfo().
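The Python sketch below illustrates the shape of the "standard content array" for a plain-text file. The function name, the use of the file name as the title, and the empty description and keywords are assumptions made for illustration. The sketch honours the class-level requirement that parsers return UTF-8 content by decoding the file as UTF-8.

```python
import os
from typing import Dict


def read_text_file_content(abs_file: str) -> Dict[str, str]:
    """Build a standard content array for a plain-text file (sketch)."""
    # Decode as UTF-8 because indexed_search parsers must return UTF-8;
    # undecodable bytes are replaced instead of raising an error.
    with open(abs_file, encoding="utf-8", errors="replace") as handle:
        body = handle.read()
    return {
        "title": os.path.basename(abs_file),  # assumption: file name as title
        "description": "",  # plain text carries no description metadata
        "keywords": "",  # ...and no keywords
        "body": body,
    }
```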

◆ removeEndJunk()

string TYPO3\CMS\IndexedSearch\FileContentParser::removeEndJunk ($string)

Removes stray char(12) (form feed) characters and line breaks that tend to occur at the end of strings extracted from external files.

Parameters
string $string: String to clean up
Returns
string The cleaned-up string

Definition at line 803 of file FileContentParser.php.

Referenced by TYPO3\CMS\IndexedSearch\FileContentParser\readFileContent().
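Here is a one-line Python equivalent of the documented behaviour, offered as a sketch. It strips form feeds (chr(12), which pdftotext emits at page breaks) and line breaks from the end of the string only. The description above does not say whether the PHP method also trims other whitespace, so the sketch does not.

```python
def remove_end_junk(text: str) -> str:
    """Remove trailing form feeds (chr(12)) and line breaks."""
    # Only the end of the string is touched; interior characters are kept.
    return text.rstrip("\f\r\n")
```

In Python, chr(12) and "\f" are the same character, so form feeds between pages are preserved and only the trailing run is removed.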

◆ searchTypeMediaTitle()

string TYPO3\CMS\IndexedSearch\FileContentParser::searchTypeMediaTitle ($extension)

Returns the title of the entry in the media type selector box.

Parameters
string $extension: File extension
Returns
string Label of the entry in the media type search selector box (frontend plugin)

Definition at line 289 of file FileContentParser.php.

References TYPO3\CMS\IndexedSearch\FileContentParser\sL(), and TYPO3\CMS\Core\Utility\GeneralUtility\trimExplode().

◆ setLocaleForServerFileSystem()

TYPO3\CMS\IndexedSearch\FileContentParser::setLocaleForServerFileSystem ($resetLocale = false)
protected

Sets the locale for LC_CTYPE to $TYPO3_CONF_VARS['SYS']['systemLocale'] if $TYPO3_CONF_VARS['SYS']['UTF8filesystem'] is set.

Calls must alternate. First call the method with $resetLocale = FALSE to set the locale, then with $resetLocale = TRUE to restore it, and so on.

The locale in effect before the override is stored in $lastLocale.

Parameters
bool $resetLocale: TRUE resets the locale to $lastLocale.
Exceptions

Definition at line 711 of file FileContentParser.php.

References $GLOBALS.

Referenced by TYPO3\CMS\IndexedSearch\FileContentParser\fileContentParts(), and TYPO3\CMS\IndexedSearch\FileContentParser\readFileContent().
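The alternating set/reset contract can be sketched in Python with the standard locale module. LocaleSwitcher and its attributes are hypothetical stand-ins for the PHP method, $lastLocale, and the two $TYPO3_CONF_VARS settings. This sketch raises RuntimeError on out-of-order calls to make the alternation rule explicit. That is not documented behaviour of the PHP method.

```python
import locale
from typing import Optional


class LocaleSwitcher:
    """Hypothetical stand-in for setLocaleForServerFileSystem().

    system_locale and utf8_filesystem mirror
    $TYPO3_CONF_VARS['SYS']['systemLocale'] and ['UTF8filesystem'].
    """

    def __init__(self, system_locale: str, utf8_filesystem: bool) -> None:
        self.system_locale = system_locale
        self.utf8_filesystem = utf8_filesystem
        self.last_locale: Optional[str] = None  # backup, like $lastLocale

    def set_locale_for_server_file_system(self, reset_locale: bool = False) -> None:
        if not self.utf8_filesystem:
            return  # the locale is only switched for UTF-8 file systems
        if reset_locale:
            if self.last_locale is None:
                raise RuntimeError("reset without a preceding set call")
            locale.setlocale(locale.LC_CTYPE, self.last_locale)
            self.last_locale = None
        else:
            if self.last_locale is not None:
                raise RuntimeError("set called twice without a reset")
            # Calling setlocale() without a value only queries LC_CTYPE.
            self.last_locale = locale.setlocale(locale.LC_CTYPE)
            locale.setlocale(locale.LC_CTYPE, self.system_locale)
```

Because the locale is process-wide state, every set call should be paired with a reset once the file system work (for example, an external tool call) is done.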

◆ sL()

string TYPO3\CMS\IndexedSearch\FileContentParser::sL ($reference)
protected

Wraps the "splitLabel" function of the language object.

Parameters
string $reference: Reference/key of the label
Returns
string The label for the reference/key

Definition at line 435 of file FileContentParser.php.

Referenced by TYPO3\CMS\IndexedSearch\FileContentParser\initParser(), TYPO3\CMS\IndexedSearch\FileContentParser\readFileContent(), and TYPO3\CMS\IndexedSearch\FileContentParser\searchTypeMediaTitle().

◆ softInit()

bool TYPO3\CMS\IndexedSearch\FileContentParser::softInit ($extension)

Initializes the external parser for backend modules. It does not evaluate whether the parser is configured correctly. Instead, it reports the POSSIBLY supported extensions (for showing icons etc.) in the backend and the frontend plugin.

Parameters
string $extension: File extension to initialize for.
Returns
bool TRUE if the extension is supported and enabled, otherwise FALSE.

Definition at line 247 of file FileContentParser.php.

◆ splitPdfInfo()

array TYPO3\CMS\IndexedSearch\FileContentParser::splitPdfInfo ($pdfInfoArray)

Parses PDF information into a usable format.

Parameters
array $pdfInfoArray: Array of PDF information lines, as output by the pdfinfo tool
Returns
array Result array
See also
fileContentParts()

Definition at line 783 of file FileContentParser.php.

Referenced by TYPO3\CMS\IndexedSearch\FileContentParser\fileContentParts(), and TYPO3\CMS\IndexedSearch\FileContentParser\readFileContent().
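Here is a Python sketch of the parsing step. It assumes the input is the list of "Key: value" lines printed by pdfinfo (for example "Pages:          12"). The sketch splits at the first colon only, which keeps values that themselves contain colons, such as dates, intact. Two further choices are assumptions of this sketch: keys are lowercased (so the page count is found under "pages"), and lines without a colon are skipped.

```python
from typing import Dict, List


def split_pdf_info(pdf_info_lines: List[str]) -> Dict[str, str]:
    """Turn pdfinfo "Key: value" lines into a dict (sketch)."""
    result: Dict[str, str] = {}
    for line in pdf_info_lines:
        if ":" not in line:
            continue  # not a "Key: value" line
        # Split at the first colon only: values such as dates contain colons.
        key, value = line.split(":", 1)
        result[key.strip().lower()] = value.strip()
    return result
```

With the parsed result, the page count is available as int(info["pages"]).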

Member Data Documentation

◆ $app

array TYPO3\CMS\IndexedSearch\FileContentParser::$app = array( )

Definition at line 41 of file FileContentParser.php.

◆ $ext2itemtype_map

array TYPO3\CMS\IndexedSearch\FileContentParser::$ext2itemtype_map = array( )

Definition at line 45 of file FileContentParser.php.

◆ $langObject

TYPO3\CMS\Core\Localization\LanguageService|TYPO3\CMS\Frontend\Controller\TypoScriptFrontendController TYPO3\CMS\IndexedSearch\FileContentParser::$langObject
protected

Definition at line 57 of file FileContentParser.php.

◆ $lastLocale

string TYPO3\CMS\IndexedSearch\FileContentParser::$lastLocale
protected

Backup for setLocaleForServerFileSystem()

Definition at line 61 of file FileContentParser.php.

◆ $pdf_mode

int TYPO3\CMS\IndexedSearch\FileContentParser::$pdf_mode = -20

This value is also overridden from the configuration. Zero: the whole PDF file is indexed as one unit. Positive value: the number of pages per section; for example, "5" means pages 1-5, 6-10, and so on. Negative value: its absolute value is the number of groups; for example, a value of -3 splits a 10-page document into the groups 1-4, 5-8, 9-10.

Definition at line 37 of file FileContentParser.php.
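The following is a worked form of the negative-mode rule, inferred from the documented example rather than from the source code. With P pages and pdf_mode = -g, each group holds s pages, with the group size rounded up. For a positive value n, s = n. Group k then covers the pages shown below.

```latex
s = \left\lceil \frac{P}{g} \right\rceil, \qquad
\text{group } k = \bigl[(k-1)s + 1,\ \min(ks,\, P)\bigr], \quad
k = 1, \dots, \left\lceil \frac{P}{s} \right\rceil

P = 10,\ g = 3:\quad s = \left\lceil \tfrac{10}{3} \right\rceil = 4
\;\Rightarrow\; [1,4],\ [5,8],\ [9,10]
```

Under this reading, abs(pdf_mode) is an upper bound on the number of groups rather than an exact count. For example, P = 12 and g = 5 give s = 3 and therefore four groups.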

◆ $pObj

TYPO3\CMS\IndexedSearch\Indexer TYPO3\CMS\IndexedSearch\FileContentParser::$pObj

Definition at line 53 of file FileContentParser.php.

◆ $supportedExtensions

array TYPO3\CMS\IndexedSearch\FileContentParser::$supportedExtensions = array( )

Definition at line 49 of file FileContentParser.php.