FileContentParser
External standard parsers for indexed_search. They MUST RETURN UTF-8 content!
This class will be removed in favor of a unified Content Extractor API.
Table of Contents
Properties
- $app : array<string|int, mixed>
- $ext2itemtype_map : array<string|int, mixed>
- $pdf_mode : int
- This value is also overridden from config.
- $pObj : Indexer
- $supportedExtensions : array<string|int, mixed>
- $langObject : LanguageService|TypoScriptFrontendController
- $lastLocale : string|null
Methods
- __construct() : mixed
- Constructs this external parsers object
- fileContentParts() : array<string|int, mixed>
- Creates an array with pointers to divisions of document.
- getIcon() : string
- Return icon for file extension
- initParser() : bool
- Initialize external parser for parsing content.
- isMultiplePageExtension() : bool
- Returns TRUE if the input extension (item_type) is potentially a multi-page extension
- readFileContent() : IndexingDataAsString|false|null
- Reads the content of an external file being indexed.
- removeEndJunk() : string
- Removes some strange char(12) characters and line breaks that tend to occur at the end of the string from external files.
- searchTypeMediaTitle() : string|false
- Return title of entry in media type selector box.
- softInit() : bool
- Initialize external parser for backend modules. Doesn't evaluate whether the parser is configured correctly; it rather returns the POSSIBLE supported extensions (for showing icons etc.) in the backend and the frontend plugin.
- splitPdfInfo() : array<string|int, mixed>
- Analysing PDF info into a usable format.
- setLocaleForServerFileSystem() : void
- Sets the locale for LC_CTYPE to $TYPO3_CONF_VARS['SYS']['systemLocale'] if $TYPO3_CONF_VARS['SYS']['UTF8filesystem'] is set.
- sL() : string
- Wraps the "splitLabel function" of the language object.
Properties
$app
public array<string|int, mixed> $app = []
$ext2itemtype_map
public array<string|int, mixed> $ext2itemtype_map = []
$pdf_mode
This value is also overridden from config.
public int $pdf_mode = -20
Zero: the whole PDF file is indexed in one go. Positive value: the number of pages indexed at a time, e.g. "5" means 1-5, 6-10, and so on. Negative value: its absolute value is the number of groups, e.g. three groups for a 10-page file would be 1-4, 5-8, 9-10.
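The three cases of $pdf_mode can be sketched as below. This is only an illustration of the description above, not the code the class uses: the function name sketchPdfPageRanges() and the rounding rule for negative values (pages per group rounded up, so fewer groups than requested can result) are assumptions chosen to reproduce the 1-4, 5-8, 9-10 example.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper, NOT part of FileContentParser.
 * Turns a $pdf_mode value and a page count into "low-high" page ranges.
 *
 * @return string[]
 */
function sketchPdfPageRanges(int $pdfMode, int $pageCount): array
{
    if ($pageCount < 1) {
        return [];
    }
    if ($pdfMode === 0) {
        // Zero: the whole file is one part.
        return ['1-' . $pageCount];
    }
    // Positive: fixed number of pages per part.
    // Negative: abs value is the number of groups (assumed: page size rounded up).
    $pagesPerPart = $pdfMode > 0
        ? $pdfMode
        : (int)ceil($pageCount / min(abs($pdfMode), $pageCount));

    $ranges = [];
    for ($low = 1; $low <= $pageCount; $low += $pagesPerPart) {
        $high = min($low + $pagesPerPart - 1, $pageCount);
        $ranges[] = $low . '-' . $high;
    }
    return $ranges;
}
```

For example, sketchPdfPageRanges(5, 12) splits a 12-page file into 1-5, 6-10 and 11-12, while sketchPdfPageRanges(-3, 10) gives 1-4, 5-8 and 9-10.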
$pObj
public Indexer $pObj
$supportedExtensions
public array<string|int, mixed> $supportedExtensions = []
$langObject
protected LanguageService|TypoScriptFrontendController $langObject
$lastLocale
protected string|null $lastLocale = null
Methods
__construct()
Constructs this external parsers object
public __construct() : mixed
fileContentParts()
Creates an array with pointers to divisions of document.
public fileContentParts(string $ext, string $absFile) : array<string|int, mixed>
Only PDF files are divided at this point. For all other file types, the returned array holds a single element with the value "0" (zero).
Parameters
- $ext : string
  File extension
- $absFile : string
  Absolute filename (must exist and be validated OK before calling the function)
Return values
array<string|int, mixed> —Array of pointers to sections that the document should be divided into
getIcon()
Return icon for file extension
public getIcon(string $extension) : string
Parameters
- $extension : string
  File extension, lowercase.
Return values
string —Relative file reference, resolvable by GeneralUtility::getFileAbsFileName()
initParser()
Initialize external parser for parsing content.
public initParser(string $extension) : bool
Parameters
- $extension : string
  File extension
Return values
bool —Returns TRUE if extension is supported/enabled, otherwise FALSE.
isMultiplePageExtension()
Returns TRUE if the input extension (item_type) is potentially a multi-page extension
public isMultiplePageExtension(string $extension) : bool
Parameters
- $extension : string
  Extension / item_type string
Return values
bool —Return TRUE if multi-page
readFileContent()
Reads the content of an external file being indexed.
public readFileContent(string $ext, string $absFile, string|int $cPKey) : IndexingDataAsString|false|null
Parameters
- $ext : string
  File extension, e.g. "pdf", "doc" etc.
- $absFile : string
  Absolute filename of the file (must exist and be validated OK before calling the function)
- $cPKey : string|int
  Pointer to section. Zero for all file types except PDF, where it indicates the pages into which the document should be split.
Return values
IndexingDataAsString|false|null —Indexing DTO, false if the extension is not supported or null if nothing found
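The way initParser(), fileContentParts() and readFileContent() fit together can be sketched as follows. StubParser and sketchIndexFile() are hypothetical names used only for this illustration; the stub copies the documented parameter lists but fakes a two-part PDF and returns a plain string where the real method returns an IndexingDataAsString DTO.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in, NOT the TYPO3 class. It mirrors three documented
 * method signatures and fakes a 10-page PDF so the call flow can run.
 */
final class StubParser
{
    public function initParser(string $extension): bool
    {
        return $extension === 'pdf';
    }

    /** @return array<int, string|int> */
    public function fileContentParts(string $ext, string $absFile): array
    {
        // Only PDFs are divided; everything else gets a single pointer 0.
        return $ext === 'pdf' ? ['1-5', '6-10'] : [0];
    }

    public function readFileContent(string $ext, string $absFile, string|int $cPKey): string|false|null
    {
        if ($ext !== 'pdf') {
            return false; // extension not supported
        }
        return 'text of pages ' . $cPKey;
    }
}

/**
 * Reads every section of a file and collects the non-empty results.
 *
 * @return string[]
 */
function sketchIndexFile(StubParser $parser, string $ext, string $absFile): array
{
    $chunks = [];
    if (!$parser->initParser($ext)) {
        return $chunks;
    }
    foreach ($parser->fileContentParts($ext, $absFile) as $cPKey) {
        $content = $parser->readFileContent($ext, $absFile, $cPKey);
        if ($content === false || $content === null) {
            continue; // unsupported extension or nothing found
        }
        $chunks[] = $content;
    }
    return $chunks;
}
```

The point of the sketch is the strict check on the return value: false and null have different meanings, so both are tested explicitly rather than with a loose truthiness check.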
removeEndJunk()
Removes some strange char(12) characters and line breaks that tend to occur at the end of the string from external files.
public removeEndJunk(string $string) : string
Parameters
- $string : string
  String to clean up
Return values
string —String
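A minimal sketch of this kind of cleanup, assuming that "junk" means trailing form feeds (char(12)), carriage returns and line feeds only. The function name sketchRemoveEndJunk() is hypothetical, and the actual method may trim more or less than this.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper, NOT the class method.
 * Strips form feed (char 12), CR and LF characters from the end of a string.
 */
function sketchRemoveEndJunk(string $string): string
{
    // rtrim() with a character list removes any trailing run of these bytes.
    return rtrim($string, "\x0C\r\n");
}
```

Characters in the middle of the string are left alone; only the trailing run is removed.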
searchTypeMediaTitle()
Return title of entry in media type selector box.
public searchTypeMediaTitle(string $extension) : string|false
Parameters
- $extension : string
  File extension
Return values
string|false —String with label value of entry in media type search selector box (frontend plugin).
softInit()
Initialize external parser for backend modules. Doesn't evaluate whether the parser is configured correctly; it rather returns the POSSIBLE supported extensions (for showing icons etc.) in the backend and the frontend plugin.
public softInit(string $extension) : bool
Parameters
- $extension : string
  File extension to initialize for.
Return values
bool —Returns TRUE if the extension is supported and enabled, otherwise FALSE.
splitPdfInfo()
Analysing PDF info into a usable format.
public splitPdfInfo(array<string|int, mixed> $pdfInfoArray) : array<string|int, mixed>
Parameters
- $pdfInfoArray : array<string|int, mixed>
  Array of PDF content, coming from the pdfinfo tool
Return values
array<string|int, mixed> —Result array
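As an illustration, pdfinfo prints one "Key: value" pair per line, so the input array can be turned into an associative array roughly like this. The function name sketchSplitPdfInfo() and the lowercased keys are assumptions of this sketch; the documentation above does not state the exact shape of the result array.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper, NOT the class method.
 * Converts pdfinfo output lines such as "Pages:   10" into ['pages' => '10'].
 *
 * @param array<int, string> $pdfInfoArray
 * @return array<string, string>
 */
function sketchSplitPdfInfo(array $pdfInfoArray): array
{
    $result = [];
    foreach ($pdfInfoArray as $line) {
        // Split on the first colon only, so values such as timestamps stay intact.
        $parts = explode(':', (string)$line, 2);
        if (count($parts) === 2 && trim($parts[0]) !== '') {
            // Lowercased keys are an assumption made for this sketch.
            $result[strtolower(trim($parts[0]))] = trim($parts[1]);
        }
    }
    return $result;
}
```

Lines without a colon are skipped, which keeps stray output from ending up in the result.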
setLocaleForServerFileSystem()
Sets the locale for LC_CTYPE to $TYPO3_CONF_VARS['SYS']['systemLocale'] if $TYPO3_CONF_VARS['SYS']['UTF8filesystem'] is set.
protected setLocaleForServerFileSystem([bool $resetLocale = false ]) : void
Parameter $resetLocale has to alternate between FALSE and TRUE across all calls.
Parameters
- $resetLocale : bool = false
  TRUE resets the locale to $lastLocale.
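The alternating FALSE/TRUE contract can be sketched with PHP's setlocale() as below. LocaleSwitchSketch, its constructor arguments and the exception messages are hypothetical; the two constructor values stand in for $TYPO3_CONF_VARS['SYS']['UTF8filesystem'] and $TYPO3_CONF_VARS['SYS']['systemLocale'], and the real method may handle misuse differently.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical illustration, NOT the TYPO3 implementation.
 * Switches LC_CTYPE to a configured locale and restores it on the next call.
 */
final class LocaleSwitchSketch
{
    private ?string $lastLocale = null;

    public function __construct(
        private bool $utf8Filesystem,
        private string $systemLocale
    ) {
    }

    public function switchLocale(bool $resetLocale = false): void
    {
        if (!$this->utf8Filesystem) {
            return; // nothing to do unless the UTF-8 filesystem flag is set
        }
        if ($resetLocale) {
            if ($this->lastLocale === null) {
                throw new RuntimeException('No previous locale to restore.');
            }
            setlocale(LC_CTYPE, $this->lastLocale);
            $this->lastLocale = null;
            return;
        }
        if ($this->lastLocale !== null) {
            throw new RuntimeException('Locale was already switched; reset it first.');
        }
        // Passing "0" returns the current setting without changing it.
        $current = setlocale(LC_CTYPE, '0');
        if ($current === false) {
            throw new RuntimeException('Could not read the current locale.');
        }
        $this->lastLocale = $current;
        setlocale(LC_CTYPE, $this->systemLocale);
    }
}
```

A caller would wrap filesystem access in a pair of calls: switchLocale() before touching the file and switchLocale(true) afterwards, which is why two calls with the same argument in a row are treated as an error in this sketch.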
sL()
Wraps the "splitLabel function" of the language object.
protected sL(string $reference) : string
Parameters
- $reference : string
  Reference/key of the label
Return values
string —The label of the reference/key to be fetched