TYPO3CMS main
TYPO3\CMS\IndexedSearch\FileContentParser Class Reference

Public Member Functions

 __construct ()
bool initParser ($extension)
bool softInit ($extension)
string|false searchTypeMediaTitle ($extension)
bool isMultiplePageExtension ($extension)
array|false|null readFileContent ($ext, $absFile, $cPKey)
array fileContentParts ($ext, $absFile)
array splitPdfInfo ($pdfInfoArray)
string removeEndJunk ($string)
string getIcon ($extension)

Public Attributes

int $pdf_mode = -20
array $app = array( )
array $ext2itemtype_map = array( )
array $supportedExtensions = array( )
TYPO3\CMS\IndexedSearch\Indexer $pObj

Protected Member Functions

string sL ($reference)
 setLocaleForServerFileSystem ($resetLocale=false)

Protected Attributes

TYPO3\CMS\Core\Localization\LanguageService|TYPO3\CMS\Frontend\Controller\TypoScriptFrontendController $langObject
string|null $lastLocale

Detailed Description

External standard parsers for indexed_search MUST RETURN utf-8 content!

This class will be removed in favor of a unified Content Extractor API.

Definition at line 32 of file FileContentParser.php.

Constructor & Destructor Documentation

◆ __construct()

TYPO3\CMS\IndexedSearch\FileContentParser::__construct ( )

Constructs this external parser object.

Definition at line 69 of file FileContentParser.php.

References $GLOBALS, and TYPO3\CMS\Core\Http\fromRequest.

Member Function Documentation

◆ fileContentParts()

array TYPO3\CMS\IndexedSearch\FileContentParser::fileContentParts (   $ext, $absFile)

Creates an array with pointers to divisions of the document.

Only PDF files are divided at this point. For all other file types, the returned array contains a single element with the value "0" (zero).

Parameters
    string $ext       File extension
    string $absFile   Absolute filename (must exist and be validated OK before calling function)

Returns
    array Array of pointers to sections that the document should be divided into

Definition at line 746 of file FileContentParser.php.

References TYPO3\CMS\Core\Utility\CommandUtility\exec(), TYPO3\CMS\Core\Utility\MathUtility\forceIntegerInRange(), TYPO3\CMS\IndexedSearch\FileContentParser\setLocaleForServerFileSystem(), and TYPO3\CMS\IndexedSearch\FileContentParser\splitPdfInfo().

◆ getIcon()

string TYPO3\CMS\IndexedSearch\FileContentParser::getIcon (   $extension)

Returns the icon for a file extension.

Parameters
    string $extension   File extension, lowercase.

Returns
    string Relative file reference, resolvable by GeneralUtility::getFileAbsFileName()

Definition at line 823 of file FileContentParser.php.

◆ initParser()

bool TYPO3\CMS\IndexedSearch\FileContentParser::initParser (   $extension)

Initialize external parser for parsing content.

Parameters
    string $extension   File extension

Returns
    bool Returns TRUE if extension is supported/enabled, otherwise FALSE.

Definition at line 81 of file FileContentParser.php.

References TYPO3\CMS\Core\Utility\MathUtility\forceIntegerInRange(), TYPO3\CMS\Core\Core\Environment\isWindows(), and TYPO3\CMS\IndexedSearch\FileContentParser\sL().

◆ isMultiplePageExtension()

bool TYPO3\CMS\IndexedSearch\FileContentParser::isMultiplePageExtension (   $extension)

Returns TRUE if the input extension (item_type) is potentially a multi-page extension.

Parameters
    string $extension   Extension / item_type string

Returns
    bool Returns TRUE if multi-page

Definition at line 422 of file FileContentParser.php.

◆ readFileContent()

array|false|null TYPO3\CMS\IndexedSearch\FileContentParser::readFileContent (   $ext, $absFile, $cPKey)

Reads the content of an external file being indexed.

Parameters
    string $ext       File extension, e.g. "pdf", "doc" etc.
    string $absFile   Absolute filename of file (must exist and be validated OK before calling function)
    string $cPKey     Pointer to the section to read: zero for all file types except PDF, where it indicates the pages into which the document should be split.

Returns
    array|false|null Standard content array (title, description, keywords, body keys), false if the extension is not supported, or null if nothing was found

Definition at line 456 of file FileContentParser.php.

References TYPO3\CMS\Core\Utility\PathUtility\basename(), TYPO3\CMS\Core\Utility\CommandUtility\exec(), TYPO3\CMS\Core\Utility\GeneralUtility\getUrl(), TYPO3\CMS\IndexedSearch\FileContentParser\removeEndJunk(), TYPO3\CMS\IndexedSearch\FileContentParser\setLocaleForServerFileSystem(), TYPO3\CMS\IndexedSearch\FileContentParser\sL(), and TYPO3\CMS\IndexedSearch\FileContentParser\splitPdfInfo().
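
The three documented return shapes of readFileContent() (content array, false, null) call for strict comparisons on the caller's side. The following is a minimal sketch of such a check, assuming PHP 8.0 or later; describeParseResult() is a hypothetical helper that is not part of FileContentParser, and it works on any value shaped like the documented return.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical caller-side helper (not part of TYPO3): classifies a value
 * shaped like the documented return of readFileContent().
 */
function describeParseResult(array|false|null $result): string
{
    // false: the file extension is not supported.
    if ($result === false) {
        return 'unsupported extension';
    }
    // null: the parser ran but found nothing.
    if ($result === null) {
        return 'nothing found';
    }
    // Otherwise a content array with title, description, keywords, body keys.
    $body = trim((string)($result['body'] ?? ''));
    return 'indexed ' . strlen($body) . ' bytes of body text';
}
```

Strict comparison (===) matters here because an empty content array, false, and null are all falsy in a loose check.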

◆ removeEndJunk()

string TYPO3\CMS\IndexedSearch\FileContentParser::removeEndJunk (   $string)

Removes some strange char(12) (form feed) characters and line breaks that tend to occur at the end of strings read from external files.

Parameters
    string $string   String to clean up

Returns
    string The cleaned-up string

Definition at line 807 of file FileContentParser.php.

Referenced by TYPO3\CMS\IndexedSearch\FileContentParser\readFileContent().
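
The cleanup described for removeEndJunk() can be illustrated with a short standalone sketch. This is not the class's actual code: sketchRemoveEndJunk() is a hypothetical name, and the exact set of characters stripped (line feeds, form feeds, then surrounding whitespace) is an assumption based on the description.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for the behaviour described for removeEndJunk():
 * strips trailing line feeds and form feeds (ASCII 12), then trims
 * surrounding whitespace. Characters inside the string are left alone.
 */
function sketchRemoveEndJunk(string $string): string
{
    // [\n\x0C]*$ matches any run of LF / form feed at the very end.
    $cleaned = preg_replace('/[\n\x0C]*$/', '', $string);
    return trim((string)$cleaned);
}
```

For example, sketchRemoveEndJunk("Hello world\n\x0C\n\x0C") yields "Hello world", while a form feed in the middle of the text is kept.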

◆ searchTypeMediaTitle()

string|false TYPO3\CMS\IndexedSearch\FileContentParser::searchTypeMediaTitle (   $extension)

Return title of entry in media type selector box.

Parameters
    string $extension   File extension

Returns
    string|false String with label value of entry in media type search selector box (frontend plugin).

Definition at line 292 of file FileContentParser.php.

References TYPO3\CMS\IndexedSearch\FileContentParser\sL().

◆ setLocaleForServerFileSystem()

TYPO3\CMS\IndexedSearch\FileContentParser::setLocaleForServerFileSystem (   $resetLocale = false)

Sets the locale for LC_CTYPE to $TYPO3_CONF_VARS['SYS']['systemLocale'] if $TYPO3_CONF_VARS['SYS']['UTF8filesystem'] is set.

Calls must alternate between $resetLocale = FALSE (set the locale) and $resetLocale = TRUE (restore it).

$lastLocale stores the locale used before it is overridden by this method.

Parameters
    bool $resetLocale   TRUE resets the locale to $lastLocale.

Definition at line 713 of file FileContentParser.php.

References $GLOBALS.

Referenced by TYPO3\CMS\IndexedSearch\FileContentParser\fileContentParts(), and TYPO3\CMS\IndexedSearch\FileContentParser\readFileContent().
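
The set/reset alternation required by setLocaleForServerFileSystem() follows a save-and-restore pattern. The sketch below illustrates that pattern in isolation; LocaleSwitchSketch and switchLocale() are hypothetical names, the target locale is passed in directly instead of being read from $TYPO3_CONF_VARS, and throwing on a broken alternation is an assumption, not confirmed behaviour of the class.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical illustration of the save/restore pattern; not TYPO3 code.
 */
final class LocaleSwitchSketch
{
    /** Locale that was active before the last "set" call, if any. */
    private ?string $lastLocale = null;

    public function switchLocale(string $locale, bool $resetLocale = false): void
    {
        if ($resetLocale) {
            if ($this->lastLocale === null) {
                throw new LogicException('Reset requested without a preceding set call.');
            }
            // Restore the remembered locale and clear the backup.
            setlocale(LC_CTYPE, $this->lastLocale);
            $this->lastLocale = null;
            return;
        }
        if ($this->lastLocale !== null) {
            throw new LogicException('Set requested twice without a reset in between.');
        }
        // Passing '0' queries the current LC_CTYPE locale without changing it.
        $current = setlocale(LC_CTYPE, '0');
        $this->lastLocale = $current === false ? 'C' : $current;
        setlocale(LC_CTYPE, $locale);
    }
}
```

A caller would wrap each external command in a pair of calls: switchLocale($locale) before running it and switchLocale($locale, true) afterwards.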

◆ sL()

string TYPO3\CMS\IndexedSearch\FileContentParser::sL (   $reference)

Wraps the "splitLabel function" of the language object.

Parameters
    string $reference   Reference/key of the label

Returns
    string The label of the reference/key to be fetched

Definition at line 438 of file FileContentParser.php.

Referenced by TYPO3\CMS\IndexedSearch\FileContentParser\initParser(), TYPO3\CMS\IndexedSearch\FileContentParser\readFileContent(), and TYPO3\CMS\IndexedSearch\FileContentParser\searchTypeMediaTitle().

◆ softInit()

bool TYPO3\CMS\IndexedSearch\FileContentParser::softInit (   $extension)

Initializes the external parser for backend modules. This does not evaluate whether the parser is configured correctly; it only reports extensions that are POSSIBLY supported (for showing icons etc.) in the backend and the frontend plugin.

Parameters
    string $extension   File extension to initialize for.

Returns
    bool Returns TRUE if the extension is supported and enabled, otherwise FALSE.

Definition at line 250 of file FileContentParser.php.

◆ splitPdfInfo()

array TYPO3\CMS\IndexedSearch\FileContentParser::splitPdfInfo (   $pdfInfoArray)

Parses PDF info into a usable format.

Parameters
    array $pdfInfoArray   Array of PDF content, coming from the pdfinfo tool

Returns
    array Result array

Definition at line 787 of file FileContentParser.php.

Referenced by TYPO3\CMS\IndexedSearch\FileContentParser\fileContentParts(), and TYPO3\CMS\IndexedSearch\FileContentParser\readFileContent().
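
The pdfinfo tool prints one "Key: value" pair per line, so turning its output into a usable array amounts to splitting each line at the first colon. The sketch below shows one way to do that; sketchSplitPdfInfo() is a hypothetical name, and lowercasing the keys and skipping lines without a colon are assumptions, not confirmed behaviour of splitPdfInfo().

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for the parsing described for splitPdfInfo().
 *
 * @param string[] $lines Output lines of the pdfinfo tool
 * @return array<string, string> Map of lowercased key to trimmed value
 */
function sketchSplitPdfInfo(array $lines): array
{
    $info = [];
    foreach ($lines as $line) {
        // Limit of 2 keeps colons inside the value (e.g. timestamps) intact.
        $parts = explode(':', (string)$line, 2);
        if (count($parts) === 2 && trim($parts[0]) !== '') {
            $info[strtolower(trim($parts[0]))] = trim($parts[1]);
        }
    }
    return $info;
}
```

With illustrative input such as ['Title:          Annual Report', 'Pages:          10'], the page count is then available as $info['pages'].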

Member Data Documentation

◆ $app

array TYPO3\CMS\IndexedSearch\FileContentParser::$app = array( )

Definition at line 44 of file FileContentParser.php.

◆ $ext2itemtype_map

array TYPO3\CMS\IndexedSearch\FileContentParser::$ext2itemtype_map = array( )

Definition at line 48 of file FileContentParser.php.

◆ $langObject

TYPO3\CMS\Core\Localization\LanguageService|TYPO3\CMS\Frontend\Controller\TypoScriptFrontendController TYPO3\CMS\IndexedSearch\FileContentParser::$langObject

Definition at line 60 of file FileContentParser.php.

◆ $lastLocale

string|null TYPO3\CMS\IndexedSearch\FileContentParser::$lastLocale

Backup for setLocaleForServerFileSystem()

Definition at line 64 of file FileContentParser.php.

◆ $pdf_mode

int TYPO3\CMS\IndexedSearch\FileContentParser::$pdf_mode = -20

Controls how PDF files are split for indexing. This value is also overridden from config.

Zero: the whole PDF file is indexed in one piece.
Positive value: the number of pages indexed at a time, e.g. "5" means 1-5, 6-10, ...
Negative value: the absolute value is the number of groups, e.g. "-3" for a 10-page document means three groups: 1-4, 5-8, 9-10.

Definition at line 40 of file FileContentParser.php.
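
The three $pdf_mode cases can be made concrete with a small standalone sketch. It reproduces the examples given in the description, but it is not the class's actual code: sketchPdfPageRanges() is a hypothetical name, and the rounding used to size groups in the negative case is an assumption, so the real group boundaries may differ.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of FileContentParser): turns a page count
 * and a $pdf_mode-style setting into "from-to" page ranges.
 *
 * @return string[]
 */
function sketchPdfPageRanges(int $pages, int $pdfMode): array
{
    if ($pages < 1) {
        throw new InvalidArgumentException('Page count must be at least 1.');
    }
    if ($pdfMode > 0) {
        // Positive: fixed number of pages per range.
        $pagesPerRange = $pdfMode;
    } else {
        // Zero or negative: abs value is the group count, clamped to 1..$pages.
        // Assumption: group size is rounded up, so fewer groups may result.
        $groups = max(1, min(abs($pdfMode), $pages));
        $pagesPerRange = (int)ceil($pages / $groups);
    }
    $ranges = [];
    for ($low = 1; $low <= $pages; $low += $pagesPerRange) {
        $high = min($low + $pagesPerRange - 1, $pages);
        $ranges[] = $low . '-' . $high;
    }
    return $ranges;
}
```

Under these assumptions a 10-page document gives 1-5, 6-10 for mode 5, the three groups 1-4, 5-8, 9-10 for mode -3, and the single range 1-10 for mode 0.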

◆ $pObj

TYPO3\CMS\IndexedSearch\Indexer TYPO3\CMS\IndexedSearch\FileContentParser::$pObj

Definition at line 56 of file FileContentParser.php.

◆ $supportedExtensions

array TYPO3\CMS\IndexedSearch\FileContentParser::$supportedExtensions = array( )

Definition at line 52 of file FileContentParser.php.