TYPO3 CMS main
TYPO3\CMS\Core\Html\HtmlParser Class Reference
Inheritance diagram for TYPO3\CMS\Core\Html\HtmlParser:
TYPO3\CMS\Core\Html\RteHtmlParser

Public Member Functions

array splitIntoBlock ($tag, $content, $eliminateExtraEndTags=false)
 
string splitIntoBlockRecursiveProc ($tag, $content, &$procObj, $callBackContent, $callBackTags, $level=0)
 
array splitTags ($tag, $content)
 
string removeFirstAndLastTag ($str)
 
string getFirstTag ($str)
 
string getFirstTagName ($str, $preserveCase=false)
 
array get_tag_attributes ($tag, $deHSC=false)
 
array split_tag_attributes ($tag)
 
string HTMLcleaner ($content, $tags=[], $keepAll=0, $hSC=0, $addConfig=[])
 
string bidir_htmlspecialchars ($value, $dir)
 
string prefixResourcePath ($main_prefix, $content, $alternatives=[], $suffix='')
 
string prefixRelPath ($prefix, $srcVal, $suffix='')
 
array|string caseShift ($str, $caseSensitiveComparison, $cacheKey='')
 
string compileTagAttribs ($tagAttrib, $meta=[])
 
array HTMLparserConfig ($TSconfig, $keepTags=[])
 
string stripEmptyTags ($content, $tagList='', $treatNonBreakingSpaceAsEmpty=false, $keepTags=false)
 

Public Attributes

const VOID_ELEMENTS = 'area|base|br|col|command|embed|hr|img|input|keygen|meta|param|source|track|wbr'
 

Protected Member Functions

 stripEmptyTagsIfConfigured (string $value, array $configuration)
 

Protected Attributes

array $caseShift_cache = []
 

Detailed Description

Functions for parsing HTML. You are encouraged to use this class in your own applications.

Definition at line 25 of file HtmlParser.php.

Member Function Documentation

◆ bidir_htmlspecialchars()

string TYPO3\CMS\Core\Html\HtmlParser::bidir_htmlspecialchars (   $value,
  $dir 
)

Converts htmlspecialchars forth ($dir=1) AND back ($dir=-1)

Parameters
  string $value  Input value
  int $dir  Direction: forth ($dir=1, or $dir=2 to preserve existing entities) and back ($dir=-1)
Returns
  string Output value

Definition at line 689 of file HtmlParser.php.

References $dir.
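
The three documented directions can be pictured with PHP's built-in entity functions. The following is a minimal sketch, not the TYPO3 implementation: the function name `bidirHtmlspecialcharsSketch` is hypothetical, and mapping `$dir=2` onto the `double_encode` flag of `htmlspecialchars()` is an assumption based on the parameter description above.

```php
<?php
// Illustrative stand-in, not the TYPO3 method: maps the documented $dir
// values onto PHP's built-in entity functions.
function bidirHtmlspecialcharsSketch(string $value, int $dir): string
{
    switch ($dir) {
        case 1:
            // Forth: encode everything, including already existing entities.
            return htmlspecialchars($value, ENT_COMPAT, 'UTF-8');
        case 2:
            // Forth, but leave existing entities untouched (double_encode = false).
            return htmlspecialchars($value, ENT_COMPAT, 'UTF-8', false);
        case -1:
            // Back: decode the special characters again.
            return htmlspecialchars_decode($value, ENT_COMPAT);
        default:
            // Assumption: any other direction returns the input unchanged.
            return $value;
    }
}

echo bidirHtmlspecialcharsSketch('a &amp; <b>', 1), "\n";        // a &amp;amp; &lt;b&gt;
echo bidirHtmlspecialcharsSketch('a &amp; <b>', 2), "\n";        // a &amp; &lt;b&gt;
echo bidirHtmlspecialcharsSketch('a &amp; &lt;b&gt;', -1), "\n"; // a & <b>
```

The difference between `$dir=1` and `$dir=2` only shows up when the input already contains entities such as `&amp;`.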

◆ caseShift()

array|string TYPO3\CMS\Core\Html\HtmlParser::caseShift (   $str,
  $caseSensitiveComparison,
  $cacheKey = '' 
)

Internal function for case shifting of a string or whole array

Parameters
  mixed $str  Input string/array
  bool $caseSensitiveComparison  If this value is FALSE, the string is returned in uppercase
  string $cacheKey  Key string used for internal caching of the results. Could be an MD5 hash of the serialized version of the input $str if that is an array.
Returns
  array|string Output string, processed

Definition at line 828 of file HtmlParser.php.

◆ compileTagAttribs()

string TYPO3\CMS\Core\Html\HtmlParser::compileTagAttribs (   $tagAttrib,
  $meta = [] 
)

Compiling an array with tag attributes into a string

Parameters
  array $tagAttrib  Tag attributes
  array $meta  Meta information about these attributes (like if they were quoted)
Returns
  string Imploded attributes, e.g. 'attribute="value" attrib2="value2"'

Definition at line 857 of file HtmlParser.php.

Referenced by TYPO3\CMS\Core\Html\RteHtmlParser\processContentWithinParagraph().
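
To make the documented return format concrete, here is a minimal sketch of imploding an attribute array into a string. It is not the TYPO3 implementation: the helper name `compileAttributesSketch` is hypothetical, it ignores the `$meta` argument entirely, and always using double quotes with escaped values is an assumption.

```php
<?php
// Hypothetical helper for illustration only; unlike the documented method
// it takes no $meta argument and always emits double-quoted values.
function compileAttributesSketch(array $tagAttrib): string
{
    $parts = [];
    foreach ($tagAttrib as $name => $value) {
        // Escape the value so a quote inside it cannot end the attribute early.
        $parts[] = $name . '="' . htmlspecialchars((string)$value, ENT_COMPAT, 'UTF-8') . '"';
    }
    return implode(' ', $parts);
}

echo compileAttributesSketch(['attribute' => 'value', 'attrib2' => 'value2']), "\n";
// attribute="value" attrib2="value2"
```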

◆ get_tag_attributes()

array TYPO3\CMS\Core\Html\HtmlParser::get_tag_attributes (   $tag,
  $deHSC = false 
)

Returns an array with all attributes as keys. Attributes are only lowercase a-z. If an attribute is empty (shorthand), then the value for the key is empty. You can check if it existed with isset().

Compared to the method in GeneralUtility::get_tag_attributes this method also returns meta data about each attribute, e.g. if it is a shorthand attribute, and what the quotation is. Also, since all attribute keys are lower-cased, the meta information contains the original attribute name.

Parameters
  string $tag  Tag: $tag is either a whole tag (e.g. '<TAG OPTION ATTRIB=VALUE>') or the parameter list (e.g. ' OPTION ATTRIB=VALUE>')
  bool $deHSC  If set, the attribute values are de-htmlspecialchar'ed. Should actually always be set!
Returns
  array array(Tag attributes, Attribute meta-data)

Definition at line 269 of file HtmlParser.php.

Referenced by TYPO3\CMS\Core\Html\RteHtmlParser\markBrokenLinks(), TYPO3\CMS\Core\Html\RteHtmlParser\processContentWithinParagraph(), TYPO3\CMS\Core\Html\RteHtmlParser\removeBrokenLinkMarkers(), and TYPO3\CMS\Core\Html\RteHtmlParser\TS_links_db().

◆ getFirstTag()

string TYPO3\CMS\Core\Html\HtmlParser::getFirstTag (   $str)

Returns the first tag in $str. Actually, everything from the beginning of $str up to the end of the first tag is returned, so make sure the tag is the first thing in the string.

Parameters
  string $str  HTML string with tags
Returns
  string

Definition at line 220 of file HtmlParser.php.

References $parser.

Referenced by TYPO3\CMS\Core\Html\RteHtmlParser\markBrokenLinks(), TYPO3\CMS\Core\Html\RteHtmlParser\processContentWithinParagraph(), TYPO3\CMS\Core\Html\RteHtmlParser\removeBrokenLinkMarkers(), TYPO3\CMS\Core\Html\HtmlParser\splitIntoBlockRecursiveProc(), TYPO3\CMS\Core\Html\RteHtmlParser\TS_links_db(), TYPO3\CMS\Core\Html\RteHtmlParser\TS_transform_db(), and TYPO3\CMS\Core\Html\RteHtmlParser\TS_transform_rte().

◆ getFirstTagName()

string TYPO3\CMS\Core\Html\HtmlParser::getFirstTagName (   $str,
  $preserveCase = false 
)

Returns the NAME of the first tag in $str

Parameters
  string $str  HTML tag (The element name MUST be separated from the attributes by a space character! Just whitespace will not do)
  bool $preserveCase  If set, the tag name is NOT converted to uppercase; its case is preserved.
Returns
  string Tag name in upper case
See also
getFirstTag()

Definition at line 243 of file HtmlParser.php.

References $parser.

Referenced by TYPO3\CMS\Core\Html\RteHtmlParser\setDivTags(), TYPO3\CMS\Core\Html\HtmlParser\splitIntoBlockRecursiveProc(), TYPO3\CMS\Core\Html\RteHtmlParser\TS_transform_db(), and TYPO3\CMS\Core\Html\RteHtmlParser\TS_transform_rte().
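
The effect of `$preserveCase` can be illustrated with a small regular-expression sketch. This is not the TYPO3 implementation: the function name `firstTagNameSketch` is hypothetical, it accepts any whitespace after the element name (the documented method is stricter), and returning an empty string when no tag is found is an assumption.

```php
<?php
// Hypothetical sketch: extract the element name of the first tag in $str.
function firstTagNameSketch(string $str, bool $preserveCase = false): string
{
    // Optional leading whitespace, "<", optional "/", then the name up to
    // the first whitespace, ">" or "/".
    if (!preg_match('/^\s*<\/?([^\s>\/]+)/', $str, $matches)) {
        return ''; // assumption: no tag at the start yields an empty string
    }
    return $preserveCase ? $matches[1] : strtoupper($matches[1]);
}

echo firstTagNameSketch('<a href="#top">Top</a>'), "\n";       // A
echo firstTagNameSketch('<Img src="x.png">', true), "\n";      // Img
echo firstTagNameSketch('</div>'), "\n";                       // DIV
```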

◆ HTMLcleaner()

string TYPO3\CMS\Core\Html\HtmlParser::HTMLcleaner (   $content,
  $tags = [],
  $keepAll = 0,
  $hSC = 0,
  $addConfig = [] 
)

Function that can clean up HTML content according to configuration given in the $tags array.

To initialize the $tags array so that it allows a list of tags (in this case <b>, <a>, <i> and <u>), set it like this: $tags = array_flip(explode(',','b,a,i,u')). If the value of the $tags[$tagname] entry is an array, advanced processing of the tag is initialized. These are the options:

$tags[$tagname] = Array(
    'overrideAttribs' => '' If set, this string is preset as the attributes of the tag
    'allowedAttribs' => '0' (zero) = no attributes allowed, '[commalist of attributes]' = only allowed attributes. If blank, all attributes are allowed.
    'fixAttrib' => Array(
        '[attribute name]' => Array (
            'set' => Force the attribute value to this value.
            'unset' => Boolean: If set, the attribute is unset.
            'default' => If no attribute exists by this name, this value is set as default value (if this value is not blank)
            'always' => Boolean. If set, the attribute is always processed. Normally an attribute is processed only if it exists
            'trim,intval,lower,upper' => All booleans. If any of these keys are set, the value is passed through the respective PHP functions.
            'range' => Array ('[low limit]','[high limit, optional]') Setting integer range.
            'list' => Array ('[value1/default]','[value2]','[value3]') Attribute must be in this list. If not, the value is set to the first element.
            'removeIfFalse' => Boolean/'blank'. If set, then the attribute is removed if it is 'FALSE'. If this value is set to 'blank' then the value must be a blank string (that means a 'zero' value will not be removed)
            'removeIfEquals' => [value] If the attribute value matches the value set here, then it is removed.
            'casesensitiveComp' => 1 If set, then the removeIfEquals and list comparisons will be case sensitive. Otherwise not.
        )
    ),
    'protect' => '', Boolean. If set, the tag <> is converted to &lt; and &gt;
    'remap' => '', String. If set, the tagname is remapped to this tagname
    'rmTagIfNoAttrib' => '', Boolean. If set, then the tag is removed if no attributes happened to be there.
    'nesting' => '', Boolean/'global'. If set TRUE, then this tag must have starting and ending tags in the correct order. Any tags not in this order will be discarded. Thus '</B><B><I></B></I></B>' will be converted to '<B><I></B></I>'. If the value is 'global', then true nesting in relation to other tags marked for 'global' nesting control is preserved. This means that if <B> and <I> are set for global nesting then this string '</B><B><I></B></I></B>' is converted to '<B></B>'
)

Parameters
  string $content  The HTML content being processed. This is also the result being returned.
  array $tags  An array where each key is a tag name in lowercase. Only tags present as keys in this array are preserved. The value of the key can be an array with a vast number of options to configure.
  mixed $keepAll  Boolean/'protect'. If set, all tags are kept regardless of the tags present as keys in the $tags array. If 'protect', the preserved tags have their <> converted to &lt; and &gt;
  int $hSC  Values -1, 0, 1, 2: 0 = disabled; 1 = the content BETWEEN tags is htmlspecialchar()'ed; -1 = the opposite; 2 = the content is HSC'ed BUT with preservation of real entities (e.g. "&amp;" or "&#234;")
  array $addConfig  Configuration array sent along as $conf to the internal functions
Returns
  string Processed HTML content

Definition at line 387 of file HtmlParser.php.

Referenced by TYPO3\CMS\Core\Html\RteHtmlParser\HTMLcleaner_db(), TYPO3\CMS\Core\Html\RteHtmlParser\runHtmlParserIfConfigured(), TYPO3\CMS\Core\Html\RteHtmlParser\setDivTags(), and TYPO3\CMS\Core\Html\RteHtmlParser\TS_transform_db().
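
The $tags configuration described for HTMLcleaner() is a plain PHP array, so it can be built and inspected without TYPO3. The sketch below only constructs such an array; the option names are taken from the option list above, while the concrete values (`href,target`, `_self`, `_blank`, `strong`) are illustrative assumptions.

```php
<?php
// Simple allow-list: every listed tag name becomes a key of $tags.
$tags = array_flip(explode(',', 'b,a,i,u'));

// Advanced processing: replace a tag's value with an options array.
// Option names come from the documented list; the values are illustrative.
$tags['a'] = [
    'allowedAttribs' => 'href,target',
    'fixAttrib' => [
        'target' => [
            // Documented behaviour of 'list': values outside the list
            // are set to the first element.
            'list' => ['_self', '_blank'],
        ],
    ],
];
$tags['b'] = [
    'remap' => 'strong', // documented: the tag name is remapped
    'nesting' => true,   // documented: badly ordered tags are discarded
];

echo implode(',', array_keys($tags)), "\n"; // b,a,i,u
```

With TYPO3 available, an array like this would be passed as the second argument, following the documented signature HTMLcleaner($content, $tags, ...).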

◆ HTMLparserConfig()

array TYPO3\CMS\Core\Html\HtmlParser::HTMLparserConfig (   $TSconfig,
  $keepTags = [] 
)

Converts TSconfig into an array for the HTMLcleaner function.

Parameters
  array $TSconfig  TSconfig for HTMLcleaner
  array $keepTags  Array of tags to keep (?)
Returns
  array

Definition at line 879 of file HtmlParser.php.

Referenced by TYPO3\CMS\Core\Html\RteHtmlParser\getKeepTags(), and TYPO3\CMS\Core\Html\RteHtmlParser\runHtmlParserIfConfigured().

◆ prefixRelPath()

string TYPO3\CMS\Core\Html\HtmlParser::prefixRelPath (   $prefix,
  $srcVal,
  $suffix = '' 
)

Internal sub-function for ->prefixResourcePath()

Parameters
  string $prefix  Prefix string
  string $srcVal  Relative path/URL
  string $suffix  Suffix string
Returns
  string Output path, prefixed if no scheme in input string

Definition at line 805 of file HtmlParser.php.
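
The rule "prefixed if no scheme in input string" can be sketched with PHP's `parse_url()`. This is a minimal illustration, not the TYPO3 implementation: the function name `prefixRelPathSketch` is hypothetical, and the real method may treat further cases specially that are not described on this page.

```php
<?php
// Hypothetical sketch of the documented rule: only values without a URL
// scheme get the prefix (and suffix) applied.
function prefixRelPathSketch(string $prefix, string $srcVal, string $suffix = ''): string
{
    // parse_url() returns false for seriously malformed URLs; this sketch
    // treats those like scheme-less values.
    $parts = parse_url($srcVal);
    if (is_array($parts) && !empty($parts['scheme'])) {
        return $srcVal; // absolute URL: leave untouched
    }
    return $prefix . $srcVal . $suffix;
}

echo prefixRelPathSketch('/site/', 'images/logo.png'), "\n";            // /site/images/logo.png
echo prefixRelPathSketch('/site/', 'https://example.org/a.png'), "\n";  // https://example.org/a.png
echo prefixRelPathSketch('/site/', 'style.css', '?v=2'), "\n";          // /site/style.css?v=2
```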

◆ prefixResourcePath()

string TYPO3\CMS\Core\Html\HtmlParser::prefixResourcePath (   $main_prefix,
  $content,
  $alternatives = [],
  $suffix = '' 
)

Prefixes the relative paths of href/src/action attributes in the tags [td,table,body,img,input,form,link,script,a] in $content with $main_prefix, or with an alternative prefix given by $alternatives.

Parameters
  string $main_prefix  Prefix string
  string $content  HTML content
  array $alternatives  Array with alternative prefixes for certain of the tags. key=>value pairs where the keys are the tag element names in uppercase
  string $suffix  Suffix string (put after the resource).
Returns
  string Processed HTML content

Definition at line 713 of file HtmlParser.php.

◆ removeFirstAndLastTag()

string TYPO3\CMS\Core\Html\HtmlParser::removeFirstAndLastTag (   $str)

◆ split_tag_attributes()

array TYPO3\CMS\Core\Html\HtmlParser::split_tag_attributes (   $tag)

Returns an array with the 'components' from an attribute list. The result is normally analyzed by get_tag_attributes(). Removes the tag name if found.

The difference between this method and the one in GeneralUtility is that this method actually determines more information on the attribute, e.g. if the value is enclosed by a " or ' character. That's why this method returns two arrays, the "components" and the "meta-information" of the "components".

Parameters
  string $tag  The tag or attributes
Returns
  array
See also
\TYPO3\CMS\Core\Utility\GeneralUtility::split_tag_attributes()

Definition at line 319 of file HtmlParser.php.

◆ splitIntoBlock()

array TYPO3\CMS\Core\Html\HtmlParser::splitIntoBlock (   $tag,
  $content,
  $eliminateExtraEndTags = false 
)

Returns an array with the $content divided by tag-blocks specified with the list of tags, $tag. Even indexes in the array are outside the blocks, odd indexes are block content. Use ->removeFirstAndLastTag() to process the content if needed.

Parameters
  string $tag  List of tags, comma separated.
  string $content  HTML-content
  bool $eliminateExtraEndTags  If set, excessive end tags are ignored - you should probably set this in most cases.
Returns
  array Even numbers in the array are outside the blocks, Odd numbers are block-content.
See also
splitTags()
removeFirstAndLastTag()

Definition at line 49 of file HtmlParser.php.

References TYPO3\CMS\Core\Utility\GeneralUtility\trimExplode().

Referenced by TYPO3\CMS\Core\Html\RteHtmlParser\divideIntoLines(), TYPO3\CMS\Core\Html\RteHtmlParser\markBrokenLinks(), TYPO3\CMS\Core\Html\RteHtmlParser\removeBrokenLinkMarkers(), TYPO3\CMS\Core\Html\HtmlParser\splitIntoBlockRecursiveProc(), TYPO3\CMS\Core\Html\RteHtmlParser\TS_links_db(), TYPO3\CMS\Core\Html\RteHtmlParser\TS_transform_db(), and TYPO3\CMS\Core\Html\RteHtmlParser\TS_transform_rte().
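
The even/odd index convention of the returned array is easiest to see on a small input. The sketch below reproduces only that convention with `preg_split()`; it is not the TYPO3 implementation. The function name `splitIntoBlockSketch` is hypothetical, and it handles a single tag name without nested blocks of the same tag, which the real method is not limited to.

```php
<?php
// Simplified illustration: one tag name, no nesting of the same tag.
function splitIntoBlockSketch(string $tag, string $content): array
{
    $name = preg_quote($tag, '/');
    // Capture each whole block (start tag, content, end tag) as a delimiter.
    $pattern = '/(<' . $name . '\b[^>]*>.*?<\/' . $name . '>)/is';
    $parts = preg_split($pattern, $content, -1, PREG_SPLIT_DELIM_CAPTURE);
    // preg_split() returns false on error; fall back to the unsplit content.
    return $parts === false ? [$content] : $parts;
}

$parts = splitIntoBlockSketch('b', 'one <b>two</b> three <b>four</b>');
foreach ($parts as $index => $part) {
    // Even indexes: text outside the blocks. Odd indexes: the blocks.
    echo $index, ($index % 2 ? ' block:   ' : ' outside: '), $part, "\n";
}
```

Because the captured delimiters alternate with the surrounding text, the blocks always land on odd indexes, even when the content starts or ends with a block (the neighbouring even entry is then an empty string).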

◆ splitIntoBlockRecursiveProc()

string TYPO3\CMS\Core\Html\HtmlParser::splitIntoBlockRecursiveProc (   $tag,
  $content,
  &$procObj,
  $callBackContent,
  $callBackTags,
  $level = 0 
)

Splitting content into blocks recursively and processing tags/content with call back functions.

Parameters
  string $tag  Tag list, see splitIntoBlock()
  string $content  Content, see splitIntoBlock()
  object $procObj  Object where call back methods are.
  string $callBackContent  Name of call back method for content; "function callBackContent($str,$level)"
  string $callBackTags  Name of call back method for tags; "function callBackTags($tags,$level)"
  int $level  Indent level
Returns
  string Processed content
See also
splitIntoBlock()

Definition at line 124 of file HtmlParser.php.

References TYPO3\CMS\Core\Html\HtmlParser\getFirstTag(), TYPO3\CMS\Core\Html\HtmlParser\getFirstTagName(), TYPO3\CMS\Core\Html\HtmlParser\removeFirstAndLastTag(), and TYPO3\CMS\Core\Html\HtmlParser\splitIntoBlock().

◆ splitTags()

array TYPO3\CMS\Core\Html\HtmlParser::splitTags (   $tag,
  $content 
)

Returns an array with the $content divided by tag-blocks specified with the list of tags, $tag. Even indexes in the array are outside the blocks, odd indexes are block content. Use ->removeFirstAndLastTag() to process the content if needed.

Parameters
  string $tag  List of tags
  string $content  HTML-content
Returns
  array Even numbers in the array are outside the blocks, Odd numbers are block-content.
See also
splitIntoBlock()
removeFirstAndLastTag()

Definition at line 159 of file HtmlParser.php.

◆ stripEmptyTags()

string TYPO3\CMS\Core\Html\HtmlParser::stripEmptyTags (   $content,
  $tagList = '',
  $treatNonBreakingSpaceAsEmpty = false,
  $keepTags = false 
)

Strips empty tags from HTML.

Parameters
  string $content  The content to be stripped of empty tags
  string $tagList  The comma separated list of tags to be stripped. If empty, all empty tags will be stripped
  bool $treatNonBreakingSpaceAsEmpty  If TRUE, tags containing only &nbsp; entities will be treated as empty.
  bool $keepTags  If true, the provided tags will be kept instead of stripped.
Returns
  string The stripped content

Definition at line 1009 of file HtmlParser.php.
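
The stripping of empty tags, including the `$treatNonBreakingSpaceAsEmpty` switch, can be sketched with a repeated regular-expression replacement. This is an illustration under stated assumptions, not the TYPO3 implementation: the function name `stripEmptyTagsSketch` is hypothetical, it has no `$tagList`/`$keepTags` support, and treating whitespace-only content as empty is an assumption of the sketch.

```php
<?php
// Hypothetical sketch: remove start/end tag pairs that contain nothing
// (or only whitespace, optionally also &nbsp; entities).
function stripEmptyTagsSketch(string $content, bool $treatNonBreakingSpaceAsEmpty = false): string
{
    $filler = $treatNonBreakingSpaceAsEmpty ? '(?:\s|&nbsp;)*' : '\s*';
    $pattern = '/<([a-z][a-z0-9]*)\b[^>]*>' . $filler . '<\/\1\s*>/i';
    do {
        // Repeat, because removing an inner empty tag can empty its parent.
        $result = preg_replace($pattern, '', $content, -1, $count);
        if ($result === null) {
            break; // regex engine error: keep what we have
        }
        $content = $result;
    } while ($count > 0);
    return $content;
}

echo stripEmptyTagsSketch('<p>Text</p><p></p><div><span></span></div>'), "\n"; // <p>Text</p>
echo stripEmptyTagsSketch('<p>&nbsp;</p>x', true), "\n";                       // x
```

The loop matters for nested markup: `<div><span></span></div>` only becomes empty after the inner `<span>` has been removed in an earlier pass.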

◆ stripEmptyTagsIfConfigured()

TYPO3\CMS\Core\Html\HtmlParser::stripEmptyTagsIfConfigured ( string  $value,
array  $configuration 
)
protected

Strips the configured empty tags from the HTML code.

Definition at line 1031 of file HtmlParser.php.

Member Data Documentation

◆ $caseShift_cache

array TYPO3\CMS\Core\Html\HtmlParser::$caseShift_cache = []
protected

Definition at line 27 of file HtmlParser.php.

◆ VOID_ELEMENTS

const TYPO3\CMS\Core\Html\HtmlParser::VOID_ELEMENTS = 'area|base|br|col|command|embed|hr|img|input|keygen|meta|param|source|track|wbr'

Definition at line 30 of file HtmlParser.php.