TYPO3 CMS 11.5
TYPO3\CMS\IndexedSearch\FileContentParser Class Reference

Public Member Functions

 __construct ()
 
bool initParser ($extension)
 
bool softInit ($extension)
 
string|false searchTypeMediaTitle ($extension)
 
bool isMultiplePageExtension ($extension)
 
array|false|null readFileContent ($ext, $absFile, $cPKey)
 
array fileContentParts ($ext, $absFile)
 
array splitPdfInfo ($pdfInfoArray)
 
string removeEndJunk ($string)
 
string getIcon ($extension)
 

Public Attributes

int $pdf_mode = -20
 
array $app = array( )
 
array $ext2itemtype_map = array( )
 
array $supportedExtensions = array( )
 
TYPO3\CMS\IndexedSearch\Indexer $pObj
 

Protected Member Functions

string sL ($reference)
 
 setLocaleForServerFileSystem ($resetLocale=false)
 

Protected Attributes

TYPO3\CMS\Core\Localization\LanguageService|TYPO3\CMS\Frontend\Controller\TypoScriptFrontendController $langObject
 
string|null $lastLocale
 

Detailed Description

External standard parsers for indexed_search. Parsers MUST RETURN UTF-8 content!

This class will be removed in favor of a unified Content Extractor API.

Definition at line 32 of file FileContentParser.php.

Constructor & Destructor Documentation

◆ __construct()

TYPO3\CMS\IndexedSearch\FileContentParser::__construct ( )

Constructs this external parser object.

Definition at line 69 of file FileContentParser.php.

References $GLOBALS, and TYPO3\CMS\Core\Http\ApplicationType\fromRequest().

Member Function Documentation

◆ fileContentParts()

array TYPO3\CMS\IndexedSearch\FileContentParser::fileContentParts ($ext, $absFile)

Creates an array with pointers to divisions of the document.

ONLY for PDF files at this point. All other types get back an array with a single element with the value "0" (zero).

Parameters
    string $ext      File extension
    string $absFile  Absolute filename (must exist and be validated OK before calling function)
Returns
    array Array of pointers to sections that the document should be divided into

Definition at line 744 of file FileContentParser.php.

References TYPO3\CMS\Core\Utility\CommandUtility\exec(), TYPO3\CMS\Core\Utility\MathUtility\forceIntegerInRange(), TYPO3\CMS\IndexedSearch\FileContentParser\setLocaleForServerFileSystem(), and TYPO3\CMS\IndexedSearch\FileContentParser\splitPdfInfo().

◆ getIcon()

string TYPO3\CMS\IndexedSearch\FileContentParser::getIcon ($extension)

Returns the icon for a file extension.

Parameters
    string $extension  File extension, lowercase.
Returns
    string Relative file reference, resolvable by GeneralUtility::getFileAbsFileName()

Definition at line 821 of file FileContentParser.php.

◆ initParser()

bool TYPO3\CMS\IndexedSearch\FileContentParser::initParser ($extension)

Initialize external parser for parsing content.

Parameters
    string $extension  File extension
Returns
    bool Returns TRUE if extension is supported/enabled, otherwise FALSE.

Definition at line 81 of file FileContentParser.php.

References TYPO3\CMS\Core\Utility\MathUtility\forceIntegerInRange(), TYPO3\CMS\Core\Core\Environment\isWindows(), TYPO3\CMS\IndexedSearch\FileContentParser\sL(), and TYPO3\CMS\Core\Utility\GeneralUtility\trimExplode().

◆ isMultiplePageExtension()

bool TYPO3\CMS\IndexedSearch\FileContentParser::isMultiplePageExtension ($extension)

Returns TRUE if the input extension (item_type) is potentially a multi-page extension.

Parameters
    string $extension  Extension / item_type string
Returns
    bool Returns TRUE if multi-page

Definition at line 422 of file FileContentParser.php.

◆ readFileContent()

array|false|null TYPO3\CMS\IndexedSearch\FileContentParser::readFileContent ($ext, $absFile, $cPKey)

Reads the content of an external file being indexed.

Parameters
    string $ext      File extension, e.g. "pdf", "doc" etc.
    string $absFile  Absolute filename of file (must exist and be validated OK before calling function)
    string $cPKey    Pointer to section (zero for all types other than PDF, which will have an indication of the pages into which the document should be split)
Returns
    array|false|null Standard content array (title, description, keywords, body keys), false if the extension is not supported, or null if nothing was found

Definition at line 456 of file FileContentParser.php.

References TYPO3\CMS\Core\Utility\PathUtility\basename(), TYPO3\CMS\Core\Utility\CommandUtility\exec(), TYPO3\CMS\Core\Utility\GeneralUtility\getUrl(), TYPO3\CMS\IndexedSearch\FileContentParser\removeEndJunk(), TYPO3\CMS\IndexedSearch\FileContentParser\setLocaleForServerFileSystem(), TYPO3\CMS\IndexedSearch\FileContentParser\sL(), and TYPO3\CMS\IndexedSearch\FileContentParser\splitPdfInfo().
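
The signatures of initParser(), fileContentParts() and readFileContent() suggest that they are meant to be called in sequence. The following is a minimal sketch of one plausible calling order, not taken from the TYPO3 source. The helper collectFileContent() and the stub parser are hypothetical; the stub only mimics the documented method signatures so that the example runs without a TYPO3 installation.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of TYPO3): gathers the content arrays
 * for every section of a file. $parser is expected to expose the
 * documented FileContentParser methods.
 */
function collectFileContent(object $parser, string $ext, string $absFile): array
{
    // initParser() returns FALSE if the extension is not supported/enabled.
    if (!$parser->initParser($ext)) {
        return [];
    }
    $contents = [];
    // One pointer per section; a single "0" for everything except PDF.
    foreach ($parser->fileContentParts($ext, $absFile) as $cPKey) {
        $content = $parser->readFileContent($ext, $absFile, $cPKey);
        // Skip FALSE (unsupported extension) and NULL (nothing found).
        if (is_array($content)) {
            $contents[$cPKey] = $content;
        }
    }
    return $contents;
}

// Stand-in for FileContentParser (assumption: for demonstration only).
$stubParser = new class {
    public function initParser($extension)
    {
        return $extension === 'txt';
    }
    public function fileContentParts($ext, $absFile)
    {
        return ['0'];
    }
    public function readFileContent($ext, $absFile, $cPKey)
    {
        return ['title' => '', 'description' => '', 'keywords' => '', 'body' => 'Hello'];
    }
};

$result = collectFileContent($stubParser, 'txt', '/tmp/example.txt');
```

In a real installation the stub would be replaced by a FileContentParser instance, which needs a bootstrapped TYPO3 context.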

◆ removeEndJunk()

string TYPO3\CMS\IndexedSearch\FileContentParser::removeEndJunk ($string)

Removes some strange char(12) (form feed) characters and line breaks that tend to occur at the end of strings read from external files.

Parameters
    string $string  String to clean up
Returns
    string The cleaned-up string

Definition at line 805 of file FileContentParser.php.

Referenced by TYPO3\CMS\IndexedSearch\FileContentParser\readFileContent().
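
As an illustration of the clean-up described above, here is a minimal sketch. The function stripTrailingJunk() is a hypothetical stand-in, and the real removeEndJunk() may treat whitespace differently.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for removeEndJunk(); the real method may differ.
 */
function stripTrailingJunk(string $string): string
{
    // "\x0C" is chr(12), the form feed character; "\r" and "\n" are line breaks.
    // rtrim() only touches the end of the string, so inner characters survive.
    return rtrim($string, "\x0C\r\n");
}

$cleaned = stripTrailingJunk("Last page text\n\x0C\n");
```

Form feeds inside the text are left alone here, since only the trailing run is described as junk.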

◆ searchTypeMediaTitle()

string|false TYPO3\CMS\IndexedSearch\FileContentParser::searchTypeMediaTitle ($extension)

Return title of entry in media type selector box.

Parameters
    string $extension  File extension
Returns
    string|false String with label value of entry in media type search selector box (frontend plugin).

Definition at line 292 of file FileContentParser.php.

References TYPO3\CMS\IndexedSearch\FileContentParser\sL(), and TYPO3\CMS\Core\Utility\GeneralUtility\trimExplode().

◆ setLocaleForServerFileSystem()

TYPO3\CMS\IndexedSearch\FileContentParser::setLocaleForServerFileSystem ($resetLocale = false)
protected

Sets the locale for LC_CTYPE to $TYPO3_CONF_VARS['SYS']['systemLocale'] if $TYPO3_CONF_VARS['SYS']['UTF8filesystem'] is set.

Calls must alternate: $resetLocale has to be FALSE on one call and TRUE on the next.

$lastLocale stores the locale used before it is overridden by this method.

Parameters
    bool $resetLocale  TRUE resets the locale to $lastLocale.
Exceptions

Definition at line 713 of file FileContentParser.php.

References $GLOBALS.

Referenced by TYPO3\CMS\IndexedSearch\FileContentParser\fileContentParts(), and TYPO3\CMS\IndexedSearch\FileContentParser\readFileContent().
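
The alternating set/reset pattern can be sketched as follows. This is an illustration only: the class LocaleSwitchSketch is hypothetical, it takes the target locale as an argument instead of reading $TYPO3_CONF_VARS, and throwing a RuntimeException on out-of-order calls is an assumption of this sketch.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical illustration of the alternating set/reset pattern.
 */
final class LocaleSwitchSketch
{
    /** Locale that was active before the switch, or null if not switched. */
    private ?string $lastLocale = null;

    public function switchLocale(string $systemLocale, bool $resetLocale = false): void
    {
        if ($resetLocale) {
            if ($this->lastLocale === null) {
                // Assumption: out-of-order calls are treated as errors.
                throw new RuntimeException('Nothing to reset: locale was not changed before.');
            }
            setlocale(LC_CTYPE, $this->lastLocale);
            $this->lastLocale = null;
            return;
        }
        if ($this->lastLocale !== null) {
            throw new RuntimeException('Locale was already changed; reset it first.');
        }
        // Passing "0" returns the current setting without changing it.
        $current = setlocale(LC_CTYPE, '0');
        $this->lastLocale = $current === false ? 'C' : $current;
        setlocale(LC_CTYPE, $systemLocale);
    }
}

$before = setlocale(LC_CTYPE, '0');
$sketch = new LocaleSwitchSketch();
$sketch->switchLocale('C');        // set before running an external tool
$sketch->switchLocale('C', true);  // restore afterwards
```

Keeping the previous locale in a property is what makes the strict FALSE/TRUE alternation necessary: a second "set" would overwrite the backup.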

◆ sL()

string TYPO3\CMS\IndexedSearch\FileContentParser::sL ($reference)
protected

Wraps the "splitLabel function" of the language object.

Parameters
    string $reference  Reference/key of the label
Returns
    string The label of the reference/key to be fetched

Definition at line 438 of file FileContentParser.php.

Referenced by TYPO3\CMS\IndexedSearch\FileContentParser\initParser(), TYPO3\CMS\IndexedSearch\FileContentParser\readFileContent(), and TYPO3\CMS\IndexedSearch\FileContentParser\searchTypeMediaTitle().

◆ softInit()

bool TYPO3\CMS\IndexedSearch\FileContentParser::softInit ($extension)

Initializes the external parser for backend modules. Does not evaluate whether the parser is configured correctly; rather, it reports the POSSIBLY supported extensions (for showing icons etc.) in the backend and the frontend plugin.

Parameters
    string $extension  File extension to initialize for.
Returns
    bool Returns TRUE if the extension is supported and enabled, otherwise FALSE.

Definition at line 250 of file FileContentParser.php.

◆ splitPdfInfo()

array TYPO3\CMS\IndexedSearch\FileContentParser::splitPdfInfo ($pdfInfoArray)

Parses PDF info into a usable format.

Parameters
    array $pdfInfoArray  Array of PDF content, coming from the pdfinfo tool
Returns
    array Result array
See also
    fileContentParts()

Definition at line 785 of file FileContentParser.php.

Referenced by TYPO3\CMS\IndexedSearch\FileContentParser\fileContentParts(), and TYPO3\CMS\IndexedSearch\FileContentParser\readFileContent().
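
The pdfinfo tool prints one "Key: value" pair per line. A minimal sketch of turning such lines into an associative array is shown below. The function parsePdfInfoLines() is hypothetical, the sample lines are made up for illustration, and lower-casing the keys is an assumption of this sketch rather than documented behaviour of splitPdfInfo().

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical parser for pdfinfo-style "Key: value" lines.
 * Lower-casing the keys is an assumption made for this sketch.
 */
function parsePdfInfoLines(array $pdfInfoArray): array
{
    $result = [];
    foreach ($pdfInfoArray as $line) {
        // Split on the first colon only, so values may contain colons.
        $parts = explode(':', (string)$line, 2);
        if (count($parts) === 2 && trim($parts[0]) !== '') {
            $result[strtolower(trim($parts[0]))] = trim($parts[1]);
        }
    }
    return $result;
}

// Made-up sample input in the shape pdfinfo produces.
$info = parsePdfInfoLines([
    'Title:          Annual Report',
    'Pages:          10',
    'CreationDate:   Mon Jan  1 12:00:00 2024',
]);
```

Limiting explode() to two parts matters for values such as timestamps, which contain colons themselves.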

Member Data Documentation

◆ $app

array TYPO3\CMS\IndexedSearch\FileContentParser::$app = array( )

Definition at line 44 of file FileContentParser.php.

◆ $ext2itemtype_map

array TYPO3\CMS\IndexedSearch\FileContentParser::$ext2itemtype_map = array( )

Definition at line 48 of file FileContentParser.php.

◆ $langObject

TYPO3\CMS\Core\Localization\LanguageService|TYPO3\CMS\Frontend\Controller\TypoScriptFrontendController TYPO3\CMS\IndexedSearch\FileContentParser::$langObject
protected

Definition at line 60 of file FileContentParser.php.

◆ $lastLocale

string|null TYPO3\CMS\IndexedSearch\FileContentParser::$lastLocale
protected

Backup for setLocaleForServerFileSystem()

Definition at line 64 of file FileContentParser.php.

◆ $pdf_mode

int TYPO3\CMS\IndexedSearch\FileContentParser::$pdf_mode = -20

This value is also overridden from config.
    Zero: the whole PDF file is indexed in one piece.
    Positive value: the number of pages indexed at a time, e.g. "5" means 1-5, 6-10, ...
    Negative value: its absolute value is the number of groups, e.g. "3" groups for 10 pages would be 1-4, 5-8, 9-10.

Definition at line 40 of file FileContentParser.php.
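
To make the arithmetic concrete, here is a minimal sketch that reproduces the examples given in the description. The function sketchPdfPageRanges() is hypothetical and is not the code used by fileContentParts(); in particular, the rounding used for negative values is an assumption chosen to match the "1-4, 5-8, 9-10" example, and the real class may place the boundaries differently.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper, not part of FileContentParser: maps a page count
 * and a pdf_mode-style value to "low-high" page range strings.
 */
function sketchPdfPageRanges(int $pages, int $pdfMode): array
{
    if ($pages < 1) {
        return [];
    }
    if ($pdfMode > 0) {
        // Positive: fixed number of pages per range.
        $chunk = $pdfMode;
    } elseif ($pdfMode < 0) {
        // Negative: absolute value is the number of groups (at most one per page).
        $groups = min(abs($pdfMode), $pages);
        $chunk = (int)ceil($pages / $groups);
    } else {
        // Zero: the whole document as a single range.
        $chunk = $pages;
    }
    $ranges = [];
    for ($low = 1; $low <= $pages; $low += $chunk) {
        $high = min($low + $chunk - 1, $pages);
        $ranges[] = $low . '-' . $high;
    }
    return $ranges;
}

$byPages  = sketchPdfPageRanges(10, 5);   // 1-5, 6-10
$byGroups = sketchPdfPageRanges(10, -3);  // 1-4, 5-8, 9-10
```

Because the chunk size is rounded up, this sketch can yield fewer groups than requested for some page counts.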

◆ $pObj

TYPO3\CMS\IndexedSearch\Indexer TYPO3\CMS\IndexedSearch\FileContentParser::$pObj

Definition at line 56 of file FileContentParser.php.

◆ $supportedExtensions

array TYPO3\CMS\IndexedSearch\FileContentParser::$supportedExtensions = array( )

Definition at line 52 of file FileContentParser.php.