TYPO3CMS  8
HtmlParser Class Reference
Inherited by RteHtmlParser.

Public Member Functions

 splitIntoBlock ($tag, $content, $eliminateExtraEndTags=false)
 
 splitIntoBlockRecursiveProc ($tag, $content, &$procObj, $callBackContent, $callBackTags, $level=0)
 
 splitTags ($tag, $content)
 
 removeFirstAndLastTag ($str)
 
 getFirstTag ($str)
 
 getFirstTagName ($str, $preserveCase=false)
 
 get_tag_attributes ($tag, $deHSC=false)
 
 split_tag_attributes ($tag)
 
 bidir_htmlspecialchars ($value, $dir)
 
 prefixResourcePath ($main_prefix, $content, $alternatives=[], $suffix= '')
 
 prefixRelPath ($prefix, $srcVal, $suffix= '')
 
 caseShift ($str, $caseSensitiveComparison, $cacheKey= '')
 
 compileTagAttribs ($tagAttrib, $meta=[])
 
 HTMLparserConfig ($TSconfig, $keepTags=[])
 
 stripEmptyTags ($content, $tagList= '', $treatNonBreakingSpaceAsEmpty=false, $keepTags=false)
 

Public Attributes

const VOID_ELEMENTS = 'area|base|br|col|command|embed|hr|img|input|keygen|meta|param|source|track|wbr'
 

Protected Member Functions

 stripEmptyTagsIfConfigured ($value, $configuration)
 

Protected Attributes

 $caseShift_cache = []
 

Detailed Description

Functions for parsing HTML. You are encouraged to use this class in your own applications.

Definition at line 25 of file HtmlParser.php.

Member Function Documentation

bidir_htmlspecialchars (   $value,
  $dir 
)

Converts special characters to HTML entities (forth, $dir=1) and back again (back, $dir=-1).

Parameters
string $value: Input value
int $dir: Direction, forth ($dir=1, or $dir=2 to preserve existing entities) or back ($dir=-1)
Returns
string Output value

Definition at line 667 of file HtmlParser.php.
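
The three directions can be illustrated with PHP's built-in functions. The following is a minimal sketch, not the code from HtmlParser.php: the function name bidirHtmlspecialcharsSketch is hypothetical, and both the mapping of $dir=2 to the double_encode flag and the handling of other $dir values are assumptions based on the parameter description above.

```php
<?php
// Hypothetical stand-in for illustration; not TYPO3's implementation.
function bidirHtmlspecialcharsSketch(string $value, int $dir): string
{
    if ($dir === 1) {
        // Forth: encode everything, including the "&" of existing entities.
        return htmlspecialchars($value, ENT_COMPAT, 'UTF-8');
    }
    if ($dir === 2) {
        // Forth, preserving entities: double_encode = false leaves "&amp;" alone.
        return htmlspecialchars($value, ENT_COMPAT, 'UTF-8', false);
    }
    if ($dir === -1) {
        // Back: decode the special-character entities again.
        return htmlspecialchars_decode($value, ENT_COMPAT);
    }
    // Assumption: any other direction returns the input unchanged.
    return $value;
}

$input = 'Tom &amp; "Jerry" <b>';
echo bidirHtmlspecialcharsSketch($input, 1), "\n";        // Tom &amp;amp; &quot;Jerry&quot; &lt;b&gt;
echo bidirHtmlspecialcharsSketch($input, 2), "\n";        // Tom &amp; &quot;Jerry&quot; &lt;b&gt;
echo bidirHtmlspecialcharsSketch('&lt;b&gt;', -1), "\n";  // <b>
```

The difference between $dir=1 and $dir=2 only shows up when the input already contains entities, as in the first two calls.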

caseShift (   $str,
  $caseSensitiveComparison,
  $cacheKey = '' 
)

Internal function for case shifting of a string or whole array

Parameters
mixed $str: Input string/array
bool $caseSensitiveComparison: If this value is FALSE, the string is returned in uppercase
string $cacheKey: Key string used for internal caching of the results. Could be an MD5 hash of the serialized version of the input $str if that is an array.
Returns
string Output string, processed (private)

Definition at line 808 of file HtmlParser.php.

compileTagAttribs (   $tagAttrib,
  $meta = [] 
)

Compiling an array with tag attributes into a string

Parameters
array $tagAttrib: Tag attributes
array $meta: Meta information about these attributes (like if they were quoted)
Returns
string Imploded attributes, e.g. 'attribute="value" attrib2="value2"' (private)

Definition at line 839 of file HtmlParser.php.

Referenced by RteHtmlParser\divideIntoLines().

get_tag_attributes (   $tag,
  $deHSC = false 
)

Returns an array with all attributes as keys. Attribute names are lowercase a-z only. If an attribute is empty (shorthand), then the value for the key is empty. You can check whether it existed with isset().

Compared to the method in GeneralUtility::get_tag_attributes this method also returns meta data about each attribute, e.g. if it is a shorthand attribute, and what the quotation is. Also, since all attribute keys are lower-cased, the meta information contains the original attribute name.

Parameters
string $tag: Either a whole tag (e.g. '<TAG option="" attrib="VALUE">') or the parameter list (e.g. ' OPTION ATTRIB=VALUE>')
bool $deHSC: If set, the attribute values are de-htmlspecialchar'ed. Should actually always be set!
Returns
array array(Tag attributes, Attribute meta-data)

Definition at line 248 of file HtmlParser.php.

Referenced by RteHtmlParser\divideIntoLines(), RteHtmlParser\transformStyledATags(), RteHtmlParser\TS_AtagToAbs(), RteHtmlParser\TS_images_rte(), and RteHtmlParser\TS_links_db().

getFirstTag (   $str)

Returns the first tag in $str. Actually, everything from the beginning of $str up to the end of the first tag is returned, so make sure the tag is the first thing in the string.

Parameters
string $str: HTML string with tags
Returns
string

Definition at line 209 of file HtmlParser.php.

Referenced by RteHtmlParser\divideIntoLines(), HtmlParser\splitIntoBlockRecursiveProc(), RteHtmlParser\transformStyledATags(), RteHtmlParser\TS_AtagToAbs(), RteHtmlParser\TS_links_db(), RteHtmlParser\TS_links_rte(), RteHtmlParser\TS_transform_db(), and RteHtmlParser\TS_transform_rte().
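
The caveat in the description is easiest to see with an input that has text in front of the first tag. This is a minimal sketch of the described behaviour, assuming the method cuts the string at the first ">" character. The name getFirstTagSketch is hypothetical, and returning an empty string when no ">" exists is an assumption.

```php
<?php
// Hypothetical stand-in for illustration; not TYPO3's implementation.
function getFirstTagSketch(string $str): string
{
    // Position of the first ">" marks the end of the first tag.
    $end = strpos($str, '>');
    // Assumption: no ">" at all yields an empty string.
    return $end !== false ? substr($str, 0, $end + 1) : '';
}

echo getFirstTagSketch('<a href="#">link</a>'), "\n"; // <a href="#">
echo getFirstTagSketch('text <b>bold</b>'), "\n";     // text <b>
```

In the second call the leading "text " is part of the result, which is why the tag should be the first thing in the input.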

getFirstTagName (   $str,
  $preserveCase = false 
)

Returns the NAME of the first tag in $str

Parameters
string $str: HTML tag (The element name MUST be separated from the attributes by a space character! Just whitespace will not do)
bool $preserveCase: If set, the tag name is NOT converted to uppercase; its case is preserved.
Returns
string Tag name in upper case
See also
getFirstTag()

Definition at line 224 of file HtmlParser.php.

Referenced by HtmlParser\splitIntoBlockRecursiveProc(), RteHtmlParser\TS_transform_db(), and RteHtmlParser\TS_transform_rte().

HTMLparserConfig (   $TSconfig,
  $keepTags = [] 
)

Converts TSconfig into an array for the HTMLcleaner function.

Parameters
array $TSconfig: TSconfig for HTMLcleaner
array $keepTags: Array of tags to keep (?)
Returns
array (private)

Definition at line 861 of file HtmlParser.php.

Referenced by RteHtmlParser\getKeepTags(), and RteHtmlParser\RTE_transform().

prefixRelPath (   $prefix,
  $srcVal,
  $suffix = '' 
)

Internal sub-function for ->prefixResourcePath()

Parameters
string $prefix: Prefix string
string $srcVal: Relative path/URL
string $suffix: Suffix string
Returns
string Output path, prefixed if no scheme in input string (private)

Definition at line 785 of file HtmlParser.php.
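
The rule "prefixed if no scheme in input string" can be sketched with PHP's parse_url(). This is only an illustration of that single documented rule: the name prefixRelPathSketch is hypothetical, and the real method may treat further cases (for example root-relative paths or in-page anchors) differently.

```php
<?php
// Hypothetical stand-in for illustration; not TYPO3's implementation.
function prefixRelPathSketch(string $prefix, string $srcVal, string $suffix = ''): string
{
    $parts = parse_url($srcVal);
    // Only values without a URL scheme (http:, mailto:, ...) are prefixed.
    // Assumption: values parse_url() cannot handle are left untouched.
    if (is_array($parts) && empty($parts['scheme'])) {
        return $prefix . $srcVal . $suffix;
    }
    return $srcVal;
}

echo prefixRelPathSketch('https://example.org/', 'img/logo.png'), "\n";
// https://example.org/img/logo.png
echo prefixRelPathSketch('https://example.org/', 'http://other.example/a.png'), "\n";
// http://other.example/a.png
```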

prefixResourcePath (   $main_prefix,
  $content,
  $alternatives = [],
  $suffix = '' 
)

Prefixes the relative paths of href/src/action attributes in the tags [td,table,body,img,input,form,link,script,a] in $content with $main_prefix, or with an alternative prefix given by $alternatives.

Parameters
string $main_prefix: Prefix string
string $content: HTML content
array $alternatives: Array with alternative prefixes for certain of the tags. key=>value pairs where the keys are the tag element names in uppercase
string $suffix: Suffix string (put after the resource).
Returns
string Processed HTML content

Definition at line 690 of file HtmlParser.php.

removeFirstAndLastTag (   $str)

Removes the first and last tag in the string. Anything before the first tag and after the last tag is also removed.

Parameters
string $str: String to process
Returns
string

Definition at line 192 of file HtmlParser.php.

Referenced by RteHtmlParser\divideIntoLines(), HtmlParser\splitIntoBlockRecursiveProc(), RteHtmlParser\transformStyledATags(), RteHtmlParser\TS_AtagToAbs(), RteHtmlParser\TS_links_db(), RteHtmlParser\TS_links_rte(), RteHtmlParser\TS_transform_db(), and RteHtmlParser\TS_transform_rte().
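
A minimal sketch of the described behaviour, assuming the first tag ends at the first ">" and the last tag starts at the last "<". The name removeFirstAndLastTagSketch is hypothetical, and the empty-string result for input without a tag pair is an assumption.

```php
<?php
// Hypothetical stand-in for illustration; not TYPO3's implementation.
function removeFirstAndLastTagSketch(string $str): string
{
    $start = strpos($str, '>');  // end of the first tag
    $end = strrpos($str, '<');   // beginning of the last tag
    // Assumption: without a usable tag pair, nothing is left over.
    if ($start === false || $end === false || $end < $start) {
        return '';
    }
    return substr($str, $start + 1, $end - $start - 1);
}

echo removeFirstAndLastTagSketch('pre<p class="x">Hello <b>world</b></p>post'), "\n";
// Hello <b>world</b>
```

The text "pre" and "post" outside the outer tags is dropped together with the tags, while the inner <b> element is kept.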

split_tag_attributes (   $tag)

Returns an array with the 'components' from an attribute list. The result is normally analyzed by get_tag_attributes(). Removes the tag name if found.

The difference between this method and the one in GeneralUtility is that this method actually determines more information on the attribute, e.g. if the value is enclosed by a " or ' character. That's why this method returns two arrays, the "components" and the "meta-information" of the "components".

Parameters
string $tag: The tag or attributes
Returns
array (private)
See also
::split_tag_attributes()

Definition at line 297 of file HtmlParser.php.

splitIntoBlock (   $tag,
  $content,
  $eliminateExtraEndTags = false 
)

Returns an array with $content divided into tag blocks, as specified by the list of tags in $tag. Even-numbered array entries are the content outside the blocks; odd-numbered entries are the blocks themselves. Use ->removeFirstAndLastTag() to process the block content if needed.

Parameters
string $tag: List of tags, comma separated.
string $content: HTML content
bool $eliminateExtraEndTags: If set, excessive end tags are ignored - you should probably set this in most cases.
Returns
array Even-numbered entries are outside the blocks, odd-numbered entries are block content.
See also
splitTags(), removeFirstAndLastTag()

Definition at line 51 of file HtmlParser.php.

References GeneralUtility\trimExplode().

Referenced by RteHtmlParser\divideIntoLines(), HtmlParser\splitIntoBlockRecursiveProc(), RteHtmlParser\transformStyledATags(), RteHtmlParser\TS_AtagToAbs(), RteHtmlParser\TS_links_db(), RteHtmlParser\TS_links_rte(), RteHtmlParser\TS_transform_db(), and RteHtmlParser\TS_transform_rte().
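
The even/odd layout of the returned array can be illustrated with a simplified stand-in. The sketch below is not the algorithm from HtmlParser.php: the name splitIntoBlockSketch is hypothetical, it does not handle nested blocks of the same tag, and it has no equivalent of $eliminateExtraEndTags. It only shows the shape of the result.

```php
<?php
// Hypothetical, simplified stand-in; it does not handle nested same-name tags.
function splitIntoBlockSketch(string $tagList, string $content): array
{
    $names = array_map(function ($t) {
        return preg_quote(trim($t), '/');
    }, explode(',', $tagList));
    // Opening tag of a listed element, lazily up to the matching closing tag.
    $pattern = '/<(' . implode('|', $names) . ')(?:\s[^>]*)?>.*?<\/\1\s*>/is';
    preg_match_all($pattern, $content, $matches, PREG_OFFSET_CAPTURE);

    $parts = [];
    $pos = 0;
    foreach ($matches[0] as list($block, $offset)) {
        $parts[] = substr($content, $pos, $offset - $pos); // even index: outside
        $parts[] = $block;                                 // odd index: block
        $pos = $offset + strlen($block);
    }
    $parts[] = substr($content, $pos); // trailing content outside any block
    return $parts;
}

print_r(splitIntoBlockSketch('b,i', 'x <b>bold</b> y <i>it</i> z'));
// Entries in order: "x ", "<b>bold</b>", " y ", "<i>it</i>", " z"
```

Because outside content always sits at even indexes, a caller can loop over the array and process only entries where the index is odd.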

splitIntoBlockRecursiveProc (   $tag,
  $content,
  & $procObj,
  $callBackContent,
  $callBackTags,
  $level = 0 
)

Splitting content into blocks recursively and processing tags/content with call back functions.

Parameters
string $tag: Tag list, see splitIntoBlock()
string $content: Content, see splitIntoBlock()
object $procObj: Object where call back methods are.
string $callBackContent: Name of call back method for content; "function callBackContent($str,$level)"
string $callBackTags: Name of call back method for tags; "function callBackTags($tags,$level)"
int $level: Indent level
Returns
string Processed content
See also
splitIntoBlock()

Definition at line 123 of file HtmlParser.php.

References HtmlParser\getFirstTag(), HtmlParser\getFirstTagName(), HtmlParser\removeFirstAndLastTag(), and HtmlParser\splitIntoBlock().

splitTags (   $tag,
  $content 
)

Returns an array with $content divided into tag blocks, as specified by the list of tags in $tag. Even-numbered array entries are the content outside the blocks; odd-numbered entries are block content. Use ->removeFirstAndLastTag() to process the content if needed.

Parameters
string $tag: List of tags
string $content: HTML content
Returns
array Even-numbered entries are outside the blocks, odd-numbered entries are block content.
See also
splitIntoBlock(), removeFirstAndLastTag()

Definition at line 157 of file HtmlParser.php.

Referenced by RteHtmlParser\TS_images_rte().

stripEmptyTags (   $content,
  $tagList = '',
  $treatNonBreakingSpaceAsEmpty = false,
  $keepTags = false 
)

Strips empty tags from HTML.

Parameters
string $content: The content to be stripped of empty tags
string $tagList: The comma separated list of tags to be stripped. If empty, all empty tags will be stripped
bool $treatNonBreakingSpaceAsEmpty: If TRUE, tags containing only &nbsp; entities will be treated as empty.
bool $keepTags: If TRUE, the provided tags will be kept instead of stripped.
Returns
string the stripped content

Definition at line 992 of file HtmlParser.php.
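
How $tagList and $treatNonBreakingSpaceAsEmpty interact can be sketched with a regular expression that is applied repeatedly until nothing changes, so that elements which become empty after an inner removal are stripped as well. This is an illustrative assumption about the approach, not the code from HtmlParser.php. The name stripEmptyTagsSketch is hypothetical and the $keepTags switch is left out.

```php
<?php
// Hypothetical stand-in for illustration; $keepTags is not covered.
function stripEmptyTagsSketch(string $content, string $tagList = '', bool $nbspIsEmpty = false): string
{
    if ($tagList === '') {
        $tags = '[^\s>\/]+'; // empty list: any element name
    } else {
        $tags = implode('|', array_map(function ($t) {
            return preg_quote(trim($t), '/');
        }, explode(',', $tagList)));
    }
    // What counts as "empty": whitespace, optionally also &nbsp; entities.
    $blank = $nbspIsEmpty ? '(?:\s|&nbsp;)*' : '\s*';
    $pattern = '/<(' . $tags . ')(?:\s[^>]*)?>' . $blank . '<\/\1\s*>/i';
    do {
        $before = $content;
        $content = preg_replace($pattern, '', $before);
    } while ($content !== $before);
    return $content;
}

echo stripEmptyTagsSketch('<p>a</p><p></p><div><span> </span></div>'), "\n"; // <p>a</p>
echo stripEmptyTagsSketch('<p>&nbsp;</p>x', 'p', true), "\n";                // x
```

In the first call the <div> only becomes empty after the inner <span> has been removed, which is why the replacement runs in a loop.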

stripEmptyTagsIfConfigured (   $value,
  $configuration 
)
protected

Strips the configured empty tags from the HTML code.

Parameters
string $value
array $configuration
Returns
string

Definition at line 1018 of file HtmlParser.php.

Member Data Documentation

$caseShift_cache = []
protected

Definition at line 30 of file HtmlParser.php.

const VOID_ELEMENTS = 'area|base|br|col|command|embed|hr|img|input|keygen|meta|param|source|track|wbr'

Definition at line 33 of file HtmlParser.php.