FileContentParser

External standard parsers for indexed_search MUST RETURN utf-8 content!

Internal

Will be removed in favor of a unified Content Extractor API.

Table of Contents

Properties

$app  : array<string|int, mixed>
$ext2itemtype_map  : array<string|int, mixed>
$pdf_mode  : int
This value is also overridden from config.
$pObj  : Indexer
$supportedExtensions  : array<string|int, mixed>
$langObject  : LanguageService|TypoScriptFrontendController
$lastLocale  : string|null

Methods

__construct()  : mixed
Constructs this external parser object.
fileContentParts()  : array<string|int, mixed>
Creates an array with pointers to divisions of document.
getIcon()  : string
Return icon for file extension
initParser()  : bool
Initialize external parser for parsing content.
isMultiplePageExtension()  : bool
Returns TRUE if the input extension (item_type) is potentially a multi-page extension.
readFileContent()  : IndexingDataAsString|false|null
Reads the content of an external file being indexed.
removeEndJunk()  : string
Removes some strange char(12) characters and line breaks that tend to occur at the end of the string from external files.
searchTypeMediaTitle()  : string|false
Return title of entry in media type selector box.
softInit()  : bool
Initialize external parser for backend modules. Doesn't evaluate whether the parser is configured correctly; it rather returns the POSSIBLE supported extensions (for showing icons etc.) in the backend and the frontend plugin.
splitPdfInfo()  : array<string|int, mixed>
Analyses PDF info into a usable format.
setLocaleForServerFileSystem()  : void
Sets the locale for LC_CTYPE to $TYPO3_CONF_VARS['SYS']['systemLocale'] if $TYPO3_CONF_VARS['SYS']['UTF8filesystem'] is set.
sL()  : string
Wraps the "splitLabel function" of the language object.

Properties

$app

public array<string|int, mixed> $app = []

$ext2itemtype_map

public array<string|int, mixed> $ext2itemtype_map = []

$pdf_mode

This value is also overridden from config.

public int $pdf_mode = -20

Zero: the whole PDF file is indexed in one pass. Positive value: the number of pages indexed at a time, e.g. "5" means 1-5, 6-10, and so on. Negative value: the absolute value is the number of groups, e.g. 3 groups for a 10-page document would be 1-4, 5-8, 9-10.
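The three modes can be illustrated with a small standalone sketch. The function name `sketchPageRanges()` is hypothetical and not part of this class; the grouping arithmetic is an assumption chosen to reproduce the examples given above, and the real `fileContentParts()` may compute the boundaries differently.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of FileContentParser): turns a page count
 * and a pdf_mode-style value into "low-high" page range strings.
 *
 * @return string[]
 */
function sketchPageRanges(int $pages, int $pdfMode): array
{
    if ($pages < 1) {
        return [];
    }
    if ($pdfMode > 0) {
        // Positive: fixed number of pages per section.
        $size = $pdfMode;
    } elseif ($pdfMode < 0) {
        // Negative: abs value is the number of groups (assumed rounding: ceil).
        $size = (int)ceil($pages / abs($pdfMode));
    } else {
        // Zero: the whole document in one section.
        $size = $pages;
    }
    $ranges = [];
    for ($low = 1; $low <= $pages; $low += $size) {
        $ranges[] = $low . '-' . min($low + $size - 1, $pages);
    }
    return $ranges;
}
```

With this sketch, a 10-page document yields two ranges for a value of 5 and three ranges for a value of -3, matching the examples in the description.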

$supportedExtensions

public array<string|int, mixed> $supportedExtensions = []

$lastLocale

protected string|null $lastLocale = null

Methods

__construct()

Constructs this external parser object.

public __construct() : mixed

fileContentParts()

Creates an array with pointers to divisions of document.

public fileContentParts(string $ext, string $absFile) : array<string|int, mixed>

Currently ONLY PDF files are divided. For all other types, the returned array has a single element with the value "0" (zero).

Parameters
$ext : string

File extension

$absFile : string

Absolute filename (must exist and be validated OK before calling function)

Return values
array<string|int, mixed>

Array of pointers to sections that the document should be divided into

getIcon()

Return icon for file extension

public getIcon(string $extension) : string
Parameters
$extension : string

File extension, lowercase.

Return values
string

Relative file reference, resolvable by GeneralUtility::getFileAbsFileName()

initParser()

Initialize external parser for parsing content.

public initParser(string $extension) : bool
Parameters
$extension : string

File extension

Return values
bool

Returns TRUE if extension is supported/enabled, otherwise FALSE.

isMultiplePageExtension()

Returns TRUE if the input extension (item_type) is potentially a multi-page extension.

public isMultiplePageExtension(string $extension) : bool
Parameters
$extension : string

Extension / item_type string

Return values
bool

Return TRUE if multi-page

readFileContent()

Reads the content of an external file being indexed.

public readFileContent(string $ext, string $absFile, string|int $cPKey) : IndexingDataAsString|false|null
Parameters
$ext : string

File extension, e.g. "pdf", "doc", etc.

$absFile : string

Absolute filename of file (must exist and be validated OK before calling function)

$cPKey : string|int

Pointer to section. Zero for all file types other than PDF; for PDF it indicates the page range into which the document is split.

Return values
IndexingDataAsString|false|null

Indexing DTO, false if the extension is not supported or null if nothing found

removeEndJunk()

Removes some strange char(12) characters and line breaks that tend to occur at the end of the string from external files.

public removeEndJunk(string $string) : string
Parameters
$string : string

String to clean up

Return values
string

The cleaned-up string
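As a rough illustration of this kind of cleanup, the following sketch strips trailing form-feed characters (char(12)) and line breaks. The function name `sketchRemoveEndJunk()` is hypothetical, and the exact set of characters removed by the real method is an assumption based on the description above.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not the actual method): removes form feeds
 * (chr(12), "\x0C"), line breaks and other whitespace from the end
 * of a string. Leading whitespace is left untouched in this sketch.
 */
function sketchRemoveEndJunk(string $string): string
{
    // First drop the trailing run of form feeds and line breaks,
    // then trim any remaining trailing whitespace.
    return rtrim(rtrim($string, "\r\n\x0C"));
}
```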

searchTypeMediaTitle()

Return title of entry in media type selector box.

public searchTypeMediaTitle(string $extension) : string|false
Parameters
$extension : string

File extension

Return values
string|false

String with label value of entry in media type search selector box (frontend plugin).

softInit()

Initialize external parser for backend modules. Doesn't evaluate whether the parser is configured correctly; it rather returns the POSSIBLE supported extensions (for showing icons etc.) in the backend and the frontend plugin.

public softInit(string $extension) : bool
Parameters
$extension : string

File extension to initialize for.

Return values
bool

Returns TRUE if the extension is supported and enabled, otherwise FALSE.

splitPdfInfo()

Analyses PDF info into a usable format.

public splitPdfInfo(array<string|int, mixed> $pdfInfoArray) : array<string|int, mixed>
Parameters
$pdfInfoArray : array<string|int, mixed>

Array of PDF content, coming from the pdfinfo tool

Internal
Tags
see
fileContentParts()
Return values
array<string|int, mixed>

Result array
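The pdfinfo tool prints one "Key: value" pair per line. A minimal sketch of turning such lines into an associative array could look like the following; the function name `sketchSplitPdfInfo()` is hypothetical, and lowercasing the keys is an assumption, not a documented guarantee of the real method.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not the actual method): converts pdfinfo output
 * lines such as "Pages:   10" into ['pages' => '10'].
 *
 * @param string[] $pdfInfoLines
 * @return array<string, string>
 */
function sketchSplitPdfInfo(array $pdfInfoLines): array
{
    $result = [];
    foreach ($pdfInfoLines as $line) {
        // Split on the first colon only, so values may contain colons.
        $parts = explode(':', (string)$line, 2);
        if (count($parts) === 2 && trim($parts[0]) !== '') {
            $result[strtolower(trim($parts[0]))] = trim($parts[1]);
        }
    }
    return $result;
}
```

Lines without a colon are skipped, and a value such as a timestamp keeps its own colons because only the first one is used as the separator.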

setLocaleForServerFileSystem()

Sets the locale for LC_CTYPE to $TYPO3_CONF_VARS['SYS']['systemLocale'] if $TYPO3_CONF_VARS['SYS']['UTF8filesystem'] is set.

protected setLocaleForServerFileSystem([bool $resetLocale = false ]) : void

Calls must alternate: first with $resetLocale = FALSE to switch the locale, then with $resetLocale = TRUE to restore it.

Parameters
$resetLocale : bool = false

TRUE resets the locale to $lastLocale.

Tags
staticvar

string $lastLocale Stores the locale used before it is overridden by this method.

throws
RuntimeException
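The alternating switch/reset pattern can be sketched in isolation as follows. The class `LocaleSwitchSketch` and its method name are hypothetical; the sketch uses the always-available "C" locale instead of reading $TYPO3_CONF_VARS and omits the UTF8filesystem check, and the condition under which the RuntimeException is thrown (a reset without a prior switch) is an assumption.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical illustration of a save/restore locale switch for LC_CTYPE.
 */
final class LocaleSwitchSketch
{
    /** @var string|null Locale in use before the switch, null if not switched. */
    private $lastLocale = null;

    /** @var string Target locale; "C" is an assumption for this sketch. */
    private $systemLocale;

    public function __construct(string $systemLocale = 'C')
    {
        $this->systemLocale = $systemLocale;
    }

    public function switchLocale(bool $resetLocale = false): void
    {
        if ($resetLocale) {
            if ($this->lastLocale === null) {
                throw new \RuntimeException('Cannot reset locale: nothing stored.');
            }
            // Restore the locale that was active before the switch.
            setlocale(LC_CTYPE, $this->lastLocale);
            $this->lastLocale = null;
            return;
        }
        // Passing "0" only queries the current locale without changing it.
        $current = setlocale(LC_CTYPE, '0');
        $this->lastLocale = $current === false ? null : $current;
        setlocale(LC_CTYPE, $this->systemLocale);
    }
}
```

A caller would wrap file system access between `switchLocale()` and `switchLocale(true)`, so the process locale is left as it was found.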

sL()

Wraps the "splitLabel function" of the language object.

protected sL(string $reference) : string
Parameters
$reference : string

Reference/key of the label

Return values
string

The label of the reference/key to be fetched


        