phpDocumentor php4-html-dom

Procedural File: php4-html-dom.php

Source Location: /php4-html-dom.php

Page Details

Lightweight, high-speed, DOM-based HTML parser compatible with PHP4 and up

change-history:


  • 0.10.0
  • - Corrected parsing of boolean attributes
  • - Added htmlNode->addSibling(). Adds a sibling right after the node the method is called on.
  • - Added htmlNode->getText(). Returns the content of the text nodes (ttText).
  • - Added Parser Option poNone: Default to no options
  • - Added htmlNode->getInnerHtml(). Returns the HTML of all the children.
  • - Corrected indexing in node->insertChildNode()

  • 0.9.0
  • - Added htmlNode->insertParentNode(). Injects a new parent between the node and the existing parent.
  • - Added htmlNode->setText(). Clears all the child nodes and sets the text within a tag
  • - Added htmlNode->insertChildNode() to add a child at a given position
  • - Added Parser Option poTrimText: Calls trim() on the text elements before they are analyzed for content
  • - Added Parser Option poRemoveCRLF: Removes any combination of CRLF, LFCR, CR or LF
  • - Added htmlParser->ParseOptions property

  • 0.8.0
  • - Added walkDown() to traverse the DOM and call a callback
  • - Added setAttribute() to set an attribute. Attribute name is case insensitive
  • - Corrected hierarchy when finding unclosed open tags

  • 0.7.2
  • - Corrected line number and position
  • - Corrected comment start to not include --

  • 0.7.1
  • - Added retention of line number and position of HTML node

  • 0.7.0
  • - Added retention of quotes for attributes

  • 0.6.0
  • - Corrected compatibility with php 4.3.9 by adjusting object reference calls

  • 0.5.0
  • - Renamed parser->getElementByName() to parser->getElementByTagName()
  • - Renamed parser->getElementsByName() to parser->getElementsByTagName()
  • - Added parser->getElementByName() to look for name attribute
  • - Added parser->getElementsByName() to look for name attribute

  • 0.4.0
  • - Added node->findNodeByAttribute() and node->findNodesByAttribute()
  • - Added parser->getElementById() and parser->getElementsById()

  • 0.3.0
  • - Added parser->getElementByName() and parser->getElementsByName()
  • - Corrected boolean attributes
  • - Added node->getAttribute() to give case insensitive access to attributes

  • 0.2.0
  • - Added parsing of attributes
  • - Added returning html of DOM structure parser->getHtml()
  • - Added comments for PhpDocumentor

  • 0.1.0
  • - Created initial version focusing on basics: parser, tag identification, tag structure, DOM

Author:  Adrian Meyer <adrian.meyer@unc.edu>
Version:  0.9.0
License:  Freeware
Classes
Class        Description
htmlParser   HTML parser class
htmlNode     HTML node class
Constants
gHtmlParser  [line 65]

gHtmlParser = 'html-parser'

Top level key used for all globals

API Tags:
Global:  string 0: gHtmlParser



rootTagLength  [line 136]

rootTagLength = strlen('<'.rootTagName.'>')

Length of root tag to adjust character position on node->ParseStartPosition and node->ParseEndPosition

API Tags:
Global:  string 0: rootTagLength
See:  htmlNode::$ParseEndPosition
See:  htmlNode::$ParseStartPosition
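
The position adjustment can be illustrated with a minimal sketch. It assumes the parser wraps the passed HTML in the root tag, so that positions measured in the wrapped string are shifted by the length of the opening root tag. The names EX_ROOT_TAG_NAME, EX_ROOT_TAG_LENGTH, exWrapHtml() and exOriginalPosition() are hypothetical and are not part of php4-html-dom.

```php
<?php
// Hypothetical stand-ins for rootTagName and rootTagLength.
define('EX_ROOT_TAG_NAME', 'parserRoot');
define('EX_ROOT_TAG_LENGTH', strlen('<' . EX_ROOT_TAG_NAME . '>'));

// Assumption: the input is wrapped as <parserRoot>...</parserRoot> before parsing.
function exWrapHtml($html)
{
    return '<' . EX_ROOT_TAG_NAME . '>' . $html . '</' . EX_ROOT_TAG_NAME . '>';
}

// Map a character position in the wrapped string back to the original input.
function exOriginalPosition($wrappedPosition)
{
    return $wrappedPosition - EX_ROOT_TAG_LENGTH;
}

$html = '<p>Hello</p>';
$wrapped = exWrapHtml($html);

// Position of the first <p> in the wrapped string, then in the original.
$posInWrapped = strpos($wrapped, '<p>');
echo exOriginalPosition($posInWrapped), "\n"; // prints 0
```

Subtracting a fixed offset is enough here because only the opening root tag precedes the original content; the closing root tag sits after it and does not shift any position.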



rootTagName  [line 129]

rootTagName = 'parserRoot'

Name used as root and to wrap passed HTML.

API Tags:
Global:  string 0: rootTagName




Globals
array   $GLOBALS[gHtmlParser]['parseModes'] [line 92]

Modes the parser is set to while looping through the HTML

  • pmInTag: We are in a tag between < and >
  • pmComment: We are parsing in a comment between <!-- -->
  • pmNormal: We are parsing outside of tags
  • pmScript: We are parsing inside script

Default value:  array( 'pmComment', 'pmInTag', 'pmNormal', 'pmScript' )
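
How the four modes might partition an input can be sketched as follows. This is not the library's parser loop: it scans ahead with string functions instead of stepping character by character, and exScanModes() is a hypothetical name. It only labels each chunk of the input with the mode it would be read in. It uses stripos(), so it needs PHP 5 or later.

```php
<?php
// Hypothetical scanner: splits HTML into array(mode, text) chunks.
function exScanModes($html)
{
    $chunks = array();
    $pos = 0;
    $length = strlen($html);
    $inScript = false;

    while ($pos < $length) {
        if ($inScript) {
            // pmScript: everything up to the closing script tag is raw text.
            $end = stripos($html, '</script', $pos);
            if ($end === false) {
                $end = $length;
            }
            if ($end > $pos) {
                $chunks[] = array('pmScript', substr($html, $pos, $end - $pos));
            }
            $pos = $end;
            $inScript = false;
        } elseif (substr($html, $pos, 4) === '<!--') {
            // pmComment: read up to and including the closing -->.
            $end = strpos($html, '-->', $pos + 4);
            $end = ($end === false) ? $length : $end + 3;
            $chunks[] = array('pmComment', substr($html, $pos, $end - $pos));
            $pos = $end;
        } elseif ($html[$pos] === '<') {
            // pmInTag: read up to and including the closing >.
            $end = strpos($html, '>', $pos);
            $end = ($end === false) ? $length : $end + 1;
            $tag = substr($html, $pos, $end - $pos);
            $chunks[] = array('pmInTag', $tag);
            // An opening script tag switches the next chunk to pmScript.
            $inScript = (strncasecmp($tag, '<script', 7) === 0);
            $pos = $end;
        } else {
            // pmNormal: plain text up to the next <.
            $end = strpos($html, '<', $pos);
            if ($end === false) {
                $end = $length;
            }
            $chunks[] = array('pmNormal', substr($html, $pos, $end - $pos));
            $pos = $end;
        }
    }
    return $chunks;
}

foreach (exScanModes('<p>a<!-- c --></p><script>if (a < b) {}</script>') as $chunk) {
    echo $chunk[0], ': ', $chunk[1], "\n";
}
```

The separate script mode matters because script content may contain < characters, as in the example, that must not be treated as the start of a tag.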


array   $GLOBALS[gHtmlParser]['parseOptions'] [line 102]

Parser options controlling how different scenarios are handled. Options are combined with the bitwise OR operator (|). Example: htmlNode->parseHtml( $myHtml, poRemoveCRLF | poTrimText );

  • poRemoveCRLF: CR or CRLF are removed from the text and replaced with SPACE
  • poTrimText: The text elements are trimmed
  • poNone: None of the parse options are set

Default value:  array( 0 => 'poNone', 1 => 'poRemoveCRLF', 2 => 'poTrimText' )
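
Combining and testing the options can be sketched as follows. The constants below are hypothetical stand-ins whose values are assumed from the array indexes in the default value above (0, 1 and 2), and exApplyParseOptions() is an illustration of bit-flag handling, not the library's code.

```php
<?php
// Hypothetical constants; values assumed from the documented array indexes.
define('EX_PO_NONE', 0);
define('EX_PO_REMOVE_CRLF', 1);
define('EX_PO_TRIM_TEXT', 2);

// Apply the selected options to one text element.
function exApplyParseOptions($text, $options)
{
    if ($options & EX_PO_REMOVE_CRLF) {
        // Replace each line-break sequence with a single space.
        $text = str_replace(array("\r\n", "\n\r", "\r", "\n"), ' ', $text);
    }
    if ($options & EX_PO_TRIM_TEXT) {
        $text = trim($text);
    }
    return $text;
}

$raw = "  first line\r\nsecond line  ";

// No options: the text is returned unchanged.
var_dump(exApplyParseOptions($raw, EX_PO_NONE));

// Both options combined with the bitwise OR operator.
var_dump(exApplyParseOptions($raw, EX_PO_REMOVE_CRLF | EX_PO_TRIM_TEXT));
```

Because the assumed values 1 and 2 occupy different bits, OR-ing them yields 3 and each option can still be tested independently with the bitwise AND operator.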


array   $GLOBALS[gHtmlParser]['tagProperties'] [line 112]

Tag properties used when analysing tag names, types and data

  • tName: Name of tag as string. !-- is used for comments. !DOCTYPE is used for document type information
  • tType: Type of tag using tag type globals
  • tData: Data portion of tag. This can be attributes (before parseAttribues() is called), a comment, or document type information

Default value:  array( 'tName', 'tType', 'tData' )


array   $GLOBALS[gHtmlParser]['tagTypes'] [line 81]

Tag types for the HTML nodes

  • ttRoot: Root node as specified in rootTagName used during parsing
  • ttUnknown: Fallback type if tag cannot be identified
  • ttComment: Comment tag in the format of <!-- comment -->
  • ttDocType: Document type tag in the format of <!DOCTYPE ...>. Identification of this tag is case insensitive
  • ttText: Tag used to store plain text
  • ttStart: Tag type used during parsing when the format is <name> containing no / at the beginning or end
  • ttEnd: Tag type used during parsing when the format is </name>. The parser will try to find the matching start tag and change it to ttNormal
  • ttNormal: Tag type used for "normal" hierarchical tags in the format of <tagName></tagName>
  • ttSingle: Tag type used for tags with a / at the end. Example: <br/>
  • ttSimple: Tag type used for tags that looked like start tags but did not have an end tag. Example: <hr>

Default value:  array( 'ttRoot', 'ttUnknown', 'ttComment', 'ttDocType', 'ttText', 'ttStart', 'ttEnd', 'ttNormal', 'ttSingle', 'ttSimple' )
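
The formats listed above suggest how a raw tag could be given its initial type. The sketch below is an assumption about that first pass only: exClassifyTag() is a hypothetical function that returns the type names as strings, and it does not perform the later step in which a ttStart tag becomes ttNormal (matching end tag found) or ttSimple (no end tag).

```php
<?php
// Hypothetical classifier; returns the documented type names as strings.
function exClassifyTag($rawTag)
{
    if (strncmp($rawTag, '<!--', 4) === 0) {
        return 'ttComment';
    }
    // DOCTYPE identification is case insensitive.
    if (strncasecmp($rawTag, '<!DOCTYPE', 9) === 0) {
        return 'ttDocType';
    }
    if (strncmp($rawTag, '</', 2) === 0) {
        return 'ttEnd';
    }
    // A / directly before the closing > marks a single tag such as <br/>.
    if (substr($rawTag, -2) === '/>') {
        return 'ttSingle';
    }
    // No / at the beginning or end: treated as a start tag for now.
    if (preg_match('/^<[A-Za-z][^>]*>$/', $rawTag)) {
        return 'ttStart';
    }
    return 'ttUnknown';
}

$samples = array('<!-- note -->', '<!doctype html>', '</div>', '<br/>', '<div class="a">', 'plain');
foreach ($samples as $sample) {
    echo $sample, ' => ', exClassifyTag($sample), "\n";
}
```

The order of the checks matters: comments and document types are tested before the generic tag forms because they also begin with <.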


array   $GLOBALS[gHtmlParser]['walkAbort'] [line 122]

walkDown() result constants to control the continuation or abort of the DOM walk

  • wdContinue: Continue to walk the DOM
  • wdAbortBranch: Abort walking the current branch but continue otherwise
  • wdAbort: Abort the walk immediately

Default value:  array( 'wdContinue', 'wdAbortBranch', 'wdAbort' )
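
The effect of the three result values can be sketched with a small depth-first walk. Everything here is illustrative: the EX_WD_* constants, exWalkDown() and the array-based node shape are hypothetical, and the real walkDown() signature is not shown on this page. The example uses a closure, so it needs PHP 5.3 or later.

```php
<?php
// Hypothetical result codes mirroring wdContinue, wdAbortBranch and wdAbort.
define('EX_WD_CONTINUE', 0);
define('EX_WD_ABORT_BRANCH', 1);
define('EX_WD_ABORT', 2);

// Depth-first walk over nodes shaped as array('name' => ..., 'children' => array(...)).
// Returns false once the callback has requested a full abort.
function exWalkDown($node, $callback)
{
    $result = call_user_func($callback, $node);
    if ($result === EX_WD_ABORT) {
        return false;
    }
    if ($result === EX_WD_ABORT_BRANCH) {
        return true; // skip this node's children, keep walking siblings
    }
    foreach ($node['children'] as $child) {
        if (!exWalkDown($child, $callback)) {
            return false;
        }
    }
    return true;
}

$tree = array('name' => 'html', 'children' => array(
    array('name' => 'head', 'children' => array(
        array('name' => 'title', 'children' => array()),
    )),
    array('name' => 'body', 'children' => array(
        array('name' => 'p', 'children' => array()),
    )),
));

// Skip everything below <head>, visit the rest.
$visited = array();
exWalkDown($tree, function ($node) use (&$visited) {
    $visited[] = $node['name'];
    return $node['name'] === 'head' ? EX_WD_ABORT_BRANCH : EX_WD_CONTINUE;
});
echo implode(' ', $visited), "\n"; // prints "html head body p"
```

Returning the branch-abort code from the head node skips title but still reaches body and p, whereas returning the full-abort code there would stop the walk before body is visited.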




Documentation generated on Mon, 29 Mar 2010 09:09:08 -0400 by phpDocumentor 1.4.3