TYPO3 CMS main
TYPO3\CMS\IndexedSearch\FileContentParser Class Reference

Public Member Functions

 __construct ()
 
bool initParser (string $extension)
 
bool softInit (string $extension)
 
string|false searchTypeMediaTitle (string $extension)
 
bool isMultiplePageExtension (string $extension)
 
IndexingDataAsString|false|null readFileContent (string $ext, string $absFile, string|int $cPKey)
 
array fileContentParts (string $ext, string $absFile)
 
array splitPdfInfo (array $pdfInfoArray)
 
string removeEndJunk (string $string)
 
string getIcon (string $extension)
 

Public Attributes

int $pdf_mode = -20
 
array $app = []
 
array $ext2itemtype_map = []
 
array $supportedExtensions = []
 
Indexer $pObj
 

Protected Member Functions

string sL (string $reference)
 
 setLocaleForServerFileSystem (bool $resetLocale=false)
 

Protected Attributes

LanguageService|TypoScriptFrontendController $langObject
 
string $lastLocale = null
 

Detailed Description

External standard parsers for indexed_search. Parsers MUST return UTF-8 content.

This class will be removed in favor of a unified Content Extractor API.

Definition at line 35 of file FileContentParser.php.

Constructor & Destructor Documentation

◆ __construct()

TYPO3\CMS\IndexedSearch\FileContentParser::__construct ( )

Constructs this external parser object.

Definition at line 53 of file FileContentParser.php.

References $GLOBALS, and TYPO3\CMS\Core\Http\fromRequest.

Member Function Documentation

◆ fileContentParts()

array TYPO3\CMS\IndexedSearch\FileContentParser::fileContentParts ( string  $ext,
string  $absFile 
)

Creates an array with pointers to divisions of the document.

Only PDF files are divided at this point. For all other file types, the returned array contains a single element with the value "0" (zero).

Parameters
string $ext  File extension
string $absFile  Absolute filename (must exist and be validated OK before calling function)
Returns
array Array of pointers to sections that the document should be divided into

Definition at line 725 of file FileContentParser.php.

References TYPO3\CMS\Core\Utility\CommandUtility\exec(), TYPO3\CMS\Core\Utility\MathUtility\forceIntegerInRange(), TYPO3\CMS\IndexedSearch\FileContentParser\setLocaleForServerFileSystem(), and TYPO3\CMS\IndexedSearch\FileContentParser\splitPdfInfo().

◆ getIcon()

string TYPO3\CMS\IndexedSearch\FileContentParser::getIcon ( string  $extension)

Returns the icon for a file extension.

Parameters
string $extension  File extension, lowercase.
Returns
string Relative file reference, resolvable by GeneralUtility::getFileAbsFileName()

Definition at line 800 of file FileContentParser.php.

◆ initParser()

bool TYPO3\CMS\IndexedSearch\FileContentParser::initParser ( string  $extension)

Initialize external parser for parsing content.

Parameters
string $extension  File extension
Returns
bool Returns TRUE if extension is supported/enabled, otherwise FALSE.

Definition at line 65 of file FileContentParser.php.

References TYPO3\CMS\Core\Utility\MathUtility\forceIntegerInRange(), TYPO3\CMS\Core\Core\Environment\isWindows(), TYPO3\CMS\IndexedSearch\FileContentParser\sL(), and TYPO3\CMS\Core\Utility\GeneralUtility\trimExplode().

◆ isMultiplePageExtension()

bool TYPO3\CMS\IndexedSearch\FileContentParser::isMultiplePageExtension ( string  $extension)

Returns TRUE if the input extension (item_type) is potentially a multi-page extension.

Parameters
string $extension  Extension / item_type string
Returns
bool TRUE if multi-page

Definition at line 406 of file FileContentParser.php.

◆ readFileContent()

IndexingDataAsString|false|null TYPO3\CMS\IndexedSearch\FileContentParser::readFileContent ( string  $ext,
string  $absFile,
string|int  $cPKey 
)

Reads the content of an external file being indexed.

Parameters
string $ext  File extension, e.g. "pdf", "doc" etc.
string $absFile  Absolute filename of file (must exist and be validated OK before calling function)
string|int $cPKey  Pointer to section (zero for all file types other than PDF; for PDF it indicates the pages into which the document should be split)
Returns
IndexingDataAsString|false|null Indexing DTO, false if the extension is not supported, or null if nothing was found

Definition at line 435 of file FileContentParser.php.

References TYPO3\CMS\Core\Utility\PathUtility\basename(), TYPO3\CMS\Core\Utility\CommandUtility\exec(), TYPO3\CMS\Core\Utility\GeneralUtility\getUrl(), TYPO3\CMS\IndexedSearch\FileContentParser\removeEndJunk(), TYPO3\CMS\IndexedSearch\FileContentParser\setLocaleForServerFileSystem(), TYPO3\CMS\IndexedSearch\FileContentParser\sL(), and TYPO3\CMS\IndexedSearch\FileContentParser\splitPdfInfo().

◆ removeEndJunk()

string TYPO3\CMS\IndexedSearch\FileContentParser::removeEndJunk ( string  $string)

Removes some strange char(12) characters and line breaks that tend to occur at the end of strings read from external files.

Parameters
string $string  String to clean up
Returns
string The cleaned-up string

Definition at line 784 of file FileContentParser.php.

Referenced by TYPO3\CMS\IndexedSearch\FileContentParser\readFileContent().
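
The cleanup described for removeEndJunk() can be sketched as a standalone function. This is only an illustration of the idea, not the code of the class itself: the name stripEndJunk is hypothetical, and the sketch assumes that the trailing junk consists solely of line feeds and form-feed characters (char 12).

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of TYPO3): strips trailing line feeds and
 * form-feed characters (char 12), then trims surrounding whitespace.
 */
function stripEndJunk(string $string): string
{
    // \x0C is the form-feed character, i.e. chr(12).
    $cleaned = preg_replace('/[\n\x0C]+$/', '', $string);

    // preg_replace() returns null on error; fall back to the input in that case.
    return trim($cleaned ?? $string);
}

echo stripEndJunk("Extracted text\n\x0C\n"), PHP_EOL;
```

Tools that convert documents to plain text often end each page with a form feed, which is one plausible source of such characters.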

◆ searchTypeMediaTitle()

string|false TYPO3\CMS\IndexedSearch\FileContentParser::searchTypeMediaTitle ( string  $extension)

Return title of entry in media type selector box.

Parameters
string $extension  File extension
Returns
string|false String with label value of entry in media type search selector box (frontend plugin).

Definition at line 276 of file FileContentParser.php.

References TYPO3\CMS\IndexedSearch\FileContentParser\sL(), and TYPO3\CMS\Core\Utility\GeneralUtility\trimExplode().

◆ setLocaleForServerFileSystem()

TYPO3\CMS\IndexedSearch\FileContentParser::setLocaleForServerFileSystem ( bool  $resetLocale = false)
protected

Sets the locale for LC_CTYPE to $TYPO3_CONF_VARS['SYS']['systemLocale'] if $TYPO3_CONF_VARS['SYS']['UTF8filesystem'] is set.

Calls must alternate: first with $resetLocale set to FALSE, then with TRUE, for every pair of calls.

$lastLocale stores the locale that was in use before this method overrode it.

Parameters
bool $resetLocale  TRUE resets the locale to $lastLocale.
Exceptions

Definition at line 692 of file FileContentParser.php.

References $GLOBALS.

Referenced by TYPO3\CMS\IndexedSearch\FileContentParser\fileContentParts(), and TYPO3\CMS\IndexedSearch\FileContentParser\readFileContent().
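
The alternating set/reset pattern of setLocaleForServerFileSystem() can be illustrated with a small standalone class. This is a sketch under assumptions, not the method's actual code: the class CtypeLocaleSwitch and its methods are hypothetical, the target locale is passed in directly instead of being read from $TYPO3_CONF_VARS, and throwing a RuntimeException on an unpaired reset is a choice made for this example.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical illustration (not part of TYPO3) of the alternating
 * set/reset pattern for LC_CTYPE.
 */
final class CtypeLocaleSwitch
{
    private ?string $lastLocale = null;

    /** Remembers the current LC_CTYPE locale and switches to $locale. */
    public function set(string $locale): void
    {
        // Passing "0" only queries the current setting without changing it.
        $current = setlocale(LC_CTYPE, '0');
        $this->lastLocale = $current === false ? null : $current;
        setlocale(LC_CTYPE, $locale);
    }

    /** Restores the locale remembered by the preceding set() call. */
    public function reset(): void
    {
        if ($this->lastLocale === null) {
            throw new RuntimeException('reset() called without a preceding set()');
        }
        setlocale(LC_CTYPE, $this->lastLocale);
        $this->lastLocale = null;
    }
}

$before = setlocale(LC_CTYPE, '0');
$switch = new CtypeLocaleSwitch();
$switch->set('C');                    // e.g. before calling an external tool
$during = setlocale(LC_CTYPE, '0');
$switch->reset();                     // afterwards
$after = setlocale(LC_CTYPE, '0');
```

Pairing every set with a reset keeps the locale change local to the code that needs it, which matters because the locale is a process-wide setting.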

◆ sL()

string TYPO3\CMS\IndexedSearch\FileContentParser::sL ( string  $reference)
protected

Wraps the "splitLabel function" of the language object.

Parameters
string $reference  Reference/key of the label
Returns
string The label of the reference/key to be fetched

Definition at line 417 of file FileContentParser.php.

Referenced by TYPO3\CMS\IndexedSearch\FileContentParser\initParser(), TYPO3\CMS\IndexedSearch\FileContentParser\readFileContent(), and TYPO3\CMS\IndexedSearch\FileContentParser\searchTypeMediaTitle().

◆ softInit()

bool TYPO3\CMS\IndexedSearch\FileContentParser::softInit ( string  $extension)

Initializes the external parser for backend modules. It does not evaluate whether the parser is configured correctly; rather, it reports the extensions that are POSSIBLY supported (for showing icons etc.) in the backend and the frontend plugin.

Parameters
string $extension  File extension to initialize for.
Returns
bool Returns TRUE if the extension is supported and enabled, otherwise FALSE.

Definition at line 234 of file FileContentParser.php.

◆ splitPdfInfo()

array TYPO3\CMS\IndexedSearch\FileContentParser::splitPdfInfo ( array  $pdfInfoArray)

Parses PDF info into a usable format.

Parameters
array $pdfInfoArray  Array of PDF content, coming from the pdfinfo tool
Returns
array Result array
See also
fileContentParts()

Definition at line 766 of file FileContentParser.php.

Referenced by TYPO3\CMS\IndexedSearch\FileContentParser\fileContentParts(), and TYPO3\CMS\IndexedSearch\FileContentParser\readFileContent().
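
To make the idea behind splitPdfInfo() concrete, the following sketch turns "Key: value" lines, the format printed by the pdfinfo tool, into an associative array. It is an illustration only: the function name parsePdfInfoLines, the lowercase keys, and the sample lines are assumptions of this example and are not taken from the class.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of TYPO3): turns "Key: value" lines into an
 * associative array with lowercase keys.
 *
 * @param string[] $lines
 * @return array<string, string>
 */
function parsePdfInfoLines(array $lines): array
{
    $info = [];
    foreach ($lines as $line) {
        // Split on the first colon only, so values may contain colons.
        $parts = explode(':', $line, 2);
        if (count($parts) === 2 && trim($parts[0]) !== '') {
            $info[strtolower(trim($parts[0]))] = trim($parts[1]);
        }
    }
    return $info;
}

// Made-up sample input in the style of pdfinfo output.
$info = parsePdfInfoLines([
    'Title:          Annual report',
    'Pages:          12',
    'CreationDate:   Mon Jan  1 10:30:00 2024',
    'line without a colon',
]);
print_r($info);
```

Splitting on the first colon only keeps values such as timestamps intact, and lines without a colon are skipped.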

Member Data Documentation

◆ $app

array TYPO3\CMS\IndexedSearch\FileContentParser::$app = []

Definition at line 43 of file FileContentParser.php.

◆ $ext2itemtype_map

array TYPO3\CMS\IndexedSearch\FileContentParser::$ext2itemtype_map = []

Definition at line 44 of file FileContentParser.php.

◆ $langObject

LanguageService|TypoScriptFrontendController TYPO3\CMS\IndexedSearch\FileContentParser::$langObject
protected

Definition at line 47 of file FileContentParser.php.

◆ $lastLocale

string TYPO3\CMS\IndexedSearch\FileContentParser::$lastLocale = null
protected

Definition at line 48 of file FileContentParser.php.

◆ $pdf_mode

int TYPO3\CMS\IndexedSearch\FileContentParser::$pdf_mode = -20

This value is also overridden from configuration.

Zero: the whole PDF file is indexed as one unit.
Positive value: the number of pages indexed at a time, e.g. "5" means pages 1-5, 6-10, and so on.
Negative value: the absolute value is the number of groups, e.g. "3" groups for a 10-page file would be 1-4, 5-8, 9-10.

Definition at line 42 of file FileContentParser.php.
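
The three cases of $pdf_mode can be worked through with a small standalone function. This is a sketch that follows the description of the property, not the code of the class: the name pdfPageRanges is hypothetical, treating zero as a single range over all pages is an assumption, and the class itself may distribute pages over groups differently.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of TYPO3): computes page ranges from a page
 * count and a mode value, following the description of $pdf_mode.
 *
 * @return string[] Ranges such as "1-5"
 */
function pdfPageRanges(int $pages, int $mode): array
{
    if ($pages < 1) {
        return [];
    }
    if ($mode > 0) {
        // Positive: fixed number of pages per range.
        $size = $mode;
    } elseif ($mode < 0) {
        // Negative: spread the pages over at most abs($mode) ranges.
        $groups = min(abs($mode), $pages);
        $size = intdiv($pages + $groups - 1, $groups); // integer ceiling
    } else {
        // Zero (assumption): one range covering the whole file.
        $size = $pages;
    }

    $ranges = [];
    for ($low = 1; $low <= $pages; $low += $size) {
        $high = min($low + $size - 1, $pages);
        $ranges[] = $low . '-' . $high;
    }
    return $ranges;
}

echo implode(',', pdfPageRanges(12, 5)), PHP_EOL;
echo implode(',', pdfPageRanges(10, -3)), PHP_EOL;
```

Because the group size is rounded up, a negative mode can yield fewer ranges than its absolute value; with the default of -20, any file of up to 20 pages ends up with one range per page in this sketch.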

◆ $pObj

Indexer TYPO3\CMS\IndexedSearch\FileContentParser::$pObj

Definition at line 46 of file FileContentParser.php.

◆ $supportedExtensions

array TYPO3\CMS\IndexedSearch\FileContentParser::$supportedExtensions = []

Definition at line 45 of file FileContentParser.php.