The DOMIT! XML Parser Manual


Table of Contents

1. Overview of XML
1. Intro to XML
2. Types of XML Content
2.1. XML Elements
2.2. XML Attributes
2.3. Character Data
2.3.1. Illegal Characters
2.3.2. CDATA Sections
2.4. Comments
2.5. Processing Instructions
2.6. Document Type Declarations
2. Overview of the DOM
1. Intro to the DOM
2. Types of DOM Nodes
2.1. Document Nodes
2.2. Element Nodes
2.3. Attribute Nodes
2.4. Text Nodes
2.5. CDATA Section Nodes
2.6. Comment Nodes
2.7. Processing Instruction Nodes
2.8. Document Type Declarations
3. The Structure of a DOM Document
3.1. Child Nodes
3.1.1. First Child
3.1.2. Last Child
3.2. Parent Nodes
3.3. Sibling Nodes
3.3.1. Previous Sibling
3.3.2. Next Sibling
3.4. Attribute Nodes
3.5. Owner Document
3.6. Document Element
3. Installing DOMIT
1. What is DOMIT!?
2. Installing DOMIT!
3. Installing DOMIT! Lite
4. Including the DOMIT! Library in your Scripts
4. Loading a DOMIT_Document
1. Instantiating and Populating a DOMIT_Document
1.1. Instantiating a DOMIT_Document
1.2. parseXML: Populating a DOMIT_Document from a string variable
1.3. loadXML: Populating a DOMIT_Document from a file or url
1.4. useSAXY: Specifying a SAX parser
1.5. Determining the base SAX parser
2. Optional Settings for Loading XML Data
2.1. useHTTPClient: Forcing loadXML to use an HTTP Client
2.2. setConnection: Manually specifying HTTP connection parameters
2.3. setAuthorization: Using basic HTTP authorization with your connection
2.4. setProxyConnection: Retrieving XML data through a proxy server
2.5. setProxyAuthorization: Using basic HTTP authorization with your proxy
2.6. preserveWhiteSpace
2.7. appendEntityTranslationTable
3. Error Handling During and After Loading an XML Document
3.1. resolveErrors
3.2. getErrorCode and getErrorString
3.3. DOMIT_DOMException::setErrorHandler
3.4. DOMIT_DOMException::setErrorMode
3.5. DOMIT_DOMException::setErrorLog
5. Traversing a Document and Extracting Data
1. The Document Element Node
2. Displaying a Node as Text
2.1. toString
2.2. toNormalizedString
2.3. expandEmptyElementTags
3. Obtaining Node Type, Name, and Value
4. Traversing a DOM Tree
4.1. The childNodes array, hasChildNodes, and childCount
4.2. firstChild
4.3. lastChild
4.4. nextSibling
4.5. previousSibling
4.6. parentNode
4.7. ownerDocument
5. Extracting Character Data
5.1. nodeValue
5.2. getData
5.3. getText
5.4. getLength
5.5. substringData
6. Accessing Attributes
6.1. hasAttribute
6.2. hasAttributes
6.3. getAttribute
6.4. getAttributeNode and getValue
6.5. The attributes Keyword and Named Node Maps
6.5.1. The attributes Keyword
6.5.2. getLength, item, and getName
7. Accessing the XML Prolog
7.1. getXMLDeclaration
7.2. Accessing the Document Type Declaration
6. Creating and Modifying a DOM Document
1. Creating Nodes
1.1. createElement
1.2. createTextNode
1.3. createCDATASection
1.4. createAttribute
1.5. createComment
1.6. createProcessingInstruction
2. Appending Nodes
3. Setting the Document Element
4. Setting Attributes
4.1. setAttribute
4.2. setAttributeNode
5. Creating the cdlibrary XML Using DOMIT!
6. Inserting Nodes
7. Replacing Nodes
8. Removing Nodes
9. Removing Attributes
9.1. removeAttribute
9.2. removeAttributeNode
10. Setting Character Data
10.1. setText
10.1.1. setText When Called from an Element
10.2. splitText
10.3. normalize
10.4. appendData
10.5. insertData
10.6. replaceData
10.7. deleteData
7. Saving a DOM Document
8. Miscellaneous DOM Features
1. cloneNode
2. getElementByID
2.1. getElementByID and Strict vs. Tolerant mode
3. getElementsByTagName
4. Using NodeLists
4.1. getLength and item
4.2. appendNode and removeNode
4.3. childNodesAsNodeList
5. importNode
9. Custom DOMIT! Methods
1. getVersion
2. Searching for Nodes
2.1. getElementsByPath
2.1.1. Absolute Path Search
2.1.2. Relative Path Search
2.1.3. Variable Path Search
2.1.4. Returning a Single Node Instead of a Node List
2.2. getElementsByAttribute
2.3. getNodesByNodeType
2.4. getNodesByNodeValue
3. XML to and from Arrays
3.1. toArray
3.2. DOMIT_Utilities::fromArray
4. The nodetools Library
4.1. nodetools::parseAttributes
4.2. nodetools::moveUp
4.3. nodetools::moveDown
4.4. nodetools::nodeExists
4.5. nodetools::fromPath
10. XML Namespaces
1. Introduction to XML Namespaces
1.1. URIs, Namespace Prefixes, and Namespace Declarations
1.2. Default Namespace
1.3. Local Name
1.4. Qualified Name
1.5. DOM and XML Namespaces
2. DOMIT! and XML Namespaces
2.1. setNamespaceAwareness
2.2. declareNamespace
2.3. declareDefaultNamespace
2.4. getNamespaceDeclarationsInScope
2.5. getDefaultNamespaceDeclaration
2.6. copyNamespaceDeclarationsLocally
2.7. createElementNS
2.8. getElementsByTagNameNS
2.9. createAttributeNS
2.10. hasAttributeNS and getAttributeNS
2.11. setAttributeNS
2.12. getAttributeNodeNS and setAttributeNodeNS
2.13. removeAttributeNS
11. XPath
1. XPath Overview
2. selectNodes
12. DOMIT! Roadmap
13. Contributing to DOMIT!

Chapter 1. Overview of XML

1. Intro to XML

XML (Extensible Markup Language) is a standard for encapsulating textual data. XML is strictly structured, but also human readable.

Having a strictly defined format makes it easier for computer programs (i.e., XML parsers) to build, extract, manipulate, and exchange the data. Since XML is written in human readable text and not binary format, it is much more convenient for people to work with on a daily basis.

This simple balance of structure and readability is one of the primary reasons that XML has seen such widespread adoption over recent years.

The following description of a person's cd music collection is one possible example of XML formatted text:

<?xml version="1.0"?>
<cdlibrary>
  <cd discid="bb0c3c0c">
    <name>Robbie Fulks</name>
    <title>Couples in Trouble</title>
  </cd>
  <cd discid="9b0ce70c">
    <name>Richard Thompson</name>
    <title>Mock Tudor</title>
  </cd>
  <cd discid="cf11720f">
    <name>Keller Williams</name>
    <title>Laugh</title>
  </cd>
</cdlibrary>

As should be apparent from the example, XML has a tree-like structure. This is referred to as a Document.

2. Types of XML Content

There are a variety of different ways of demarcating content in an XML Document. The following sections present a brief overview of some of these ways.

2.1. XML Elements

An Element in XML is a type of content whose primary purpose is to contain other content. Elements are like bookends, and therefore consist of two parts: a start tag and an end tag.

Take a look at the second line of text in the cd library example: <cdlibrary>.

This is an example of a start tag. A start tag always:

  • begins with a left angle bracket: <

  • ends with a right angle bracket: >

  • has a name: cdlibrary

At the bottom of the XML document is an end tag, which has a slightly different format: </cdlibrary>.

An end tag always:

  • begins with a left angle bracket and a forward slash: </

  • ends with a right angle bracket: >

  • has a name identical to its matching start tag: cdlibrary

An XML element can contain other types of XML content, including other elements. For example, the <person> element below contains a single <name> element:

<person>
  <name>John Heinstein</name>
</person>

It is possible to have an element containing no XML content. This is referred to as an empty element. There is a shorthand notation for representing an empty element:

<someEmptyElement/>

The longhand equivalent of this is:

<someEmptyElement></someEmptyElement>

2.2. XML Attributes

Take a look at the first cd element in cdlibrary:

<cd discid="bb0c3c0c">
  <name>Robbie Fulks</name>
  <title>Couples in Trouble</title>
</cd>

Some additional information is present in the start tag: discid="bb0c3c0c". This is a type of XML content referred to as an Attribute. An attribute is used to store short, simple units of text.

An attribute always:

  • contains a unique named key, such as: discid

  • followed by an equal sign: =

  • followed by a value contained in either single or double quotes: "bb0c3c0c"

There can be multiple attributes in any start tag, as long as the attribute names are unique. For example:

<point x='10' y='35'/>

2.3. Character Data

Textual XML content not stored in attributes is referred to as Character Data. Character data is always contained within elements, as we can see in the following example from the cdlibrary document:

<name>Robbie Fulks</name>

2.3.1. Illegal Characters

There are two reserved characters which cannot be present in valid XML character data. These are the ampersand character (&) and the left angle bracket (<).

If either of these characters needs to be present in XML text, it must be escaped. This is done by substituting the entity equivalent of the character.

  • The entity equivalent of the ampersand (&) is the string &amp;

  • The entity equivalent of the left angle bracket (<) is the string &lt;

To represent the string x <= y + 1 as character data, for example, one must escape the left angle bracket:

<relationship>x &lt;= y + 1</relationship>

Note: To allow attribute values to contain both single and double quotes, the apostrophe or single-quote character (') may be represented as &apos; and the double-quote character (") as &quot;
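
As a minimal sketch of the escaping rules above, the two substitutions can be applied in PHP with the built-in strtr function. The helper name escapeCharacterData is hypothetical and is not part of DOMIT!.

```php
<?php
// Hypothetical helper (not part of DOMIT!): escapes the two characters
// that are reserved in XML character data.
function escapeCharacterData($text) {
    // strtr with an array replaces every key in a single pass, so the
    // ampersand introduced by "&lt;" is not escaped a second time.
    return strtr($text, array('&' => '&amp;', '<' => '&lt;'));
}

echo escapeCharacterData('x <= y + 1'); // x &lt;= y + 1
```

The result can be placed between the start and end tags of an element such as <relationship>.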

2.3.2. CDATA Sections

Sometimes your XML data contains many illegal characters that must be escaped -- such as when you need to store HTML content within XML content:

<htmlcode>&lt;img src="http://www.someurl.com/pic.jpg" /></htmlcode>

It is not only work intensive to escape many illegal characters, but the readability of your document suffers.

There is a special construct called a CDATA Section that is reserved for demarcating text data that can be written in its literal form. The following example rewrites the above <htmlcode> example as a CDATA Section:

<htmlcode><![CDATA[<img src="http://www.someurl.com/pic.jpg" />]]></htmlcode>

As you can see there is no need to escape the left angle bracket beginning the <img tag.

A CDATA Section always:

  • begins with the string <![CDATA[

  • ends with the string ]]>

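Note: the string ]]> cannot appear inside a CDATA Section, because it would be read as the terminating CDATA Section string. Entity references such as &gt; are not expanded inside a CDATA Section, so the sequence cannot be escaped there. Instead, split the text across two CDATA Sections, so that ]] ends the first section and > begins the second.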
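
Because entity references are not expanded inside a CDATA Section, one common way to store text containing ]]> is to split it across two adjacent CDATA Sections. The following is a sketch of that technique; the helper name wrapInCdata is hypothetical and is not part of DOMIT!.

```php
<?php
// Hypothetical helper (not part of DOMIT!): wraps arbitrary text in a
// CDATA Section, splitting any embedded "]]>" across two sections.
function wrapInCdata($text) {
    // "]]" closes the first section; ">" opens the next one.
    $safe = str_replace(']]>', ']]]]><![CDATA[>', $text);
    return '<![CDATA[' . $safe . ']]>';
}

echo wrapInCdata('a]]>b'); // <![CDATA[a]]]]><![CDATA[>b]]>
```

A parser reading the result sees two CDATA Sections whose contents, joined together, give back the original text.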

2.4. Comments

An XML Comment is a construct for adding remarks to your XML. It is similar to an HTML comment in that it:

  • begins with the string <!--

  • ends with the string -->

A comment could be added to the cdlibrary example like this:

<?xml version="1.0"?>
<cdlibrary>
  <!-- Not many cds left after I got robbed -->
  <cd discid="bb0c3c0c">
    <name>Robbie Fulks</name>
    <title>Couples in Trouble</title>
  </cd>
  <cd discid="9b0ce70c">
    <name>Richard Thompson</name>
    <title>Mock Tudor</title>
  </cd>
  <cd discid="cf11720f">
    <name>Keller Williams</name>
    <title>Laugh</title>
  </cd>
</cdlibrary>

2.5. Processing Instructions

An XML Processing Instruction indicates to an application that it must perform some processing operation.

Every XML document should begin with a special type of processing instruction known as an XML Declaration (in practice, the XML declaration is often omitted):

<?xml version="1.0"?>

A processing instruction:

  • begins with the string <?

  • followed by the target of the operation (the application to which the operation is to be directed): e.g. xml

  • followed by the data for the application to process: e.g. version="1.0"

  • ending with the string ?>

Another example of a processing instruction is found in the declaration of PHP code within an HTML page:

<?php  
//code here
?>

The target "php" informs a web server to process the subsequent data with a PHP interpreter rather than as HTML code.

2.6. Document Type Declarations

A Document Type Declaration is a mechanism for defining what is an acceptable structure for an XML document. A validating XML parser can compare an XML document to its DTD and determine whether it is valid or not.

A DTD follows the XML declaration and comes before any actual XML data.

The following is an example of a DTD for an XML document containing a single element named "foo":

<!DOCTYPE foo [
  <!ELEMENT foo (#PCDATA)>
]>

Chapter 2. Overview of the DOM

1. Intro to the DOM

When one speaks of the DOM (Document Object Model), one is not referring to XML per se. Rather, the DOM is one of a number of different approaches to conceptualizing, parsing, and interacting with XML content.

The DOM processes XML by creating an object called a Node out of each unit of content in an XML document. Nodes are assembled into a hierarchical collection called a DOM Document.

The entire XML document is held in memory at once, which allows the collection of nodes to be traversed easily. The DOM approach can, however, be memory intensive for larger XML documents.

The DOM also describes a number of methods and properties that allow the user to interact programmatically with the nodes of a DOM Document.

We will be examining some of these methods and properties in the following tutorial.

2. Types of DOM Nodes

The DOM specification delineates a number of different kinds of nodes, each of which corresponds to a different kind of XML content. A set of three node properties is used to distinguish one kind of node from another:

  • node type: an integer from 1 to 12 specifying the type of node

  • node name: the name of the node, can have various values depending on node type

  • node value: the value of the node, can have various values depending on node type

2.1. Document Nodes

A Document Node represents the DOM document itself -- the entire collection of nodes. It has:

  • a node type of 9

  • a node name of #document

  • a node value of null

2.2. Element Nodes

An Element Node represents an XML Element. Take for instance the following <fullname> element:

<fullname>John Heinstein</fullname>

This element has:

  • a node type of 1

  • a node name of fullname

  • a node value of null

2.3. Attribute Nodes

An Attribute Node represents an XML Attribute. Take for instance the following serial attribute:

<item serial="123456"/>

This attribute has:

  • a node type of 2

  • a node name of serial

  • a node value of 123456

2.4. Text Nodes

A Text Node represents XML Character Data that is not specified as a CDATA Section. Take for instance the text content bounded by the <fullname> element:

<fullname>John Heinstein</fullname>

This text node has:

  • a node type of 3

  • a node name of #text

  • a node value of John Heinstein
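
The three node properties can be inspected in code. DOMIT!'s own accessors are covered later in this manual, so the sketch below instead assumes PHP 5's bundled DOM extension (the DOMDocument class), which exposes the same properties under the names nodeType, nodeName, and nodeValue.

```php
<?php
// Uses PHP 5's bundled DOM extension (DOMDocument), not DOMIT!.
$doc = new DOMDocument();
$doc->loadXML('<fullname>John Heinstein</fullname>');

$element = $doc->documentElement; // the <fullname> element node
$text = $element->firstChild;     // the text node it contains

echo $doc->nodeType . ' ' . $doc->nodeName . "\n";         // 9 #document
echo $element->nodeType . ' ' . $element->nodeName . "\n"; // 1 fullname
echo $text->nodeType . ' ' . $text->nodeName . ' ' . $text->nodeValue . "\n"; // 3 #text John Heinstein
```

The element's node value is deliberately not printed: PHP's extension reports the text content of an element there rather than null.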

2.5. CDATA Section Nodes

A CDATA Section Node represents XML Character Data that is specified as a CDATA Section. Take the following CDATA Section:

<htmlcode><![CDATA[<img src="http://www.someurl.com/pic.jpg" />]]></htmlcode>

This CDATA Section has:

  • a node type of 4

  • a node name of #cdata-section

  • a node value of <img src="http://www.someurl.com/pic.jpg" />

2.6. Comment Nodes

A Comment Node represents an XML comment. Take the following XML comment:

<!-- Not many cds left after I got robbed -->

This comment node has:

  • a node type of 8

  • a node name of #comment

  • a node value of Not many cds left after I got robbed

2.7. Processing Instruction Nodes

A Processing Instruction Node represents an XML processing instruction. The most common processing instruction that you will find in a DOM Document is the XML Declaration:

<?xml version="1.0"?>

This processing instruction node has:

  • a node type of 7

  • a node name of xml

  • a node value of version="1.0"

2.8. Document Type Declarations

A Document Type Declaration represents an XML document type declaration.

DOMIT! is a non-validating parser and therefore does not check the validity of an XML document against the DTD. It simply stores a string representation of the DTD.

3. The Structure of a DOM Document

The nodes of a DOM Document are structured as a tree of branching nodes. The terminology to describe the relationship of these nodes is similar to how we would describe the relationship between individuals in a family tree.

Let us use the cdlibrary XML to illustrate this:

<?xml version="1.0"?>
<cdlibrary>
  <cd discid="bb0c3c0c">
    <name>Robbie Fulks</name>
    <title>Couples in Trouble</title>
  </cd>
  <cd discid="9b0ce70c">
    <name>Richard Thompson</name>
    <title>Mock Tudor</title>
  </cd>
  <cd discid="cf11720f">
    <name>Keller Williams</name>
    <title>Laugh</title>
  </cd>
</cdlibrary>

3.1. Child Nodes

All nodes that are direct descendants of a node are referred to as its Child Nodes.

Only nodes of type element are permitted to contain a child nodes collection. Children themselves, however, can be of various node types, including element nodes, text and CDATA Section nodes, and comment nodes.

In the cdlibrary example:

  • the <cdlibrary> element contains three child nodes of type element (the three cd nodes)

    <?xml version="1.0"?>
    <cdlibrary>
      <cd discid="bb0c3c0c">
        <name>Robbie Fulks</name>
        <title>Couples in Trouble</title>
      </cd>
      <cd discid="9b0ce70c">
        <name>Richard Thompson</name>
        <title>Mock Tudor</title>
      </cd>
      <cd discid="cf11720f">
        <name>Keller Williams</name>
        <title>Laugh</title>
      </cd>
    </cdlibrary>
  • each <cd> node contains two child nodes of type element (a <name> node and a <title> node)

    <cd discid="bb0c3c0c">
      <name>Robbie Fulks</name>
      <title>Couples in Trouble</title>
    </cd>
  • each <name> node contains one child node of type text

    <name>Richard Thompson</name>
  • each <title> node contains one child node of type text

    <title>Laugh</title>

A child node is referred to by its numerical index in the child nodes collection. The DOM specification numbers the items in a node list starting from 0, so the first child node has an index of 0.

If an element contains no children, it will still have a child nodes collection (that is empty).

Note: Attribute nodes are not included in the child nodes collection. These are in a separate collection reserved specifically for attributes.
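
To make the child nodes collection concrete, here is a small sketch that again assumes PHP 5's bundled DOM extension (DOMDocument) rather than DOMIT!. It counts the children of a <cd> element and reads them by index.

```php
<?php
// Uses PHP 5's bundled DOM extension (DOMDocument), not DOMIT!.
// The XML is written without whitespace between tags so that no
// whitespace-only text nodes appear among the children.
$doc = new DOMDocument();
$doc->loadXML('<cd discid="cf11720f"><name>Keller Williams</name><title>Laugh</title></cd>');

$cd = $doc->documentElement;
$children = $cd->childNodes;

echo $children->length . "\n";            // 2
echo $children->item(0)->nodeName . "\n"; // name (this implementation indexes from 0)
echo $children->item(1)->nodeName . "\n"; // title
echo $cd->attributes->length . "\n";      // 1 (attributes are kept separately)
```

Note that the discid attribute is counted in the attributes collection and not among the two child nodes.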

3.1.1. First Child

The First Child is a DOM property that refers to the first child node in a child nodes collection.

In our cdlibrary example, the first child of each of the <cd> nodes is the element node <name>.

<cd discid="cf11720f">
  <name>Keller Williams</name>
  <title>Laugh</title>
</cd>

If an element contains no child nodes, the first child is null.

3.1.2. Last Child

The Last Child is a DOM property that refers to the last child node in the child nodes collection.

In our cdlibrary example, the last child of each of the <cd> nodes is the element node <title>.

<cd discid="cf11720f">
  <name>Keller Williams</name>
  <title>Laugh</title>
</cd>

If an element contains no child nodes, the last child is null.

3.2. Parent Nodes

In the same way that one can travel down the hierarchy of a DOM document via the child nodes collection, the DOM specifies a way to travel up the hierarchy. The immediate ancestor of any node is referred to as its Parent Node.

In our cdlibrary example:

  • the parent node of each <cd> element is the <cdlibrary> element

  • the parent node of each <name> element is its containing <cd> element

  • the parent node of each <title> element is its containing <cd> element

  • the parent node of the text contained in the <name> element is the <name> element

  • the parent node of the text contained in the <title> element is the <title> element

Note: Attributes do not contain a reference to parent nodes.

3.3. Sibling Nodes

The DOM specifies an explicit relationship between nodes that occupy the same level of a DOM tree. These nodes are referred to as Sibling Nodes.

One might think of the relationship between sibling nodes as the links in a chain. Each node knows about the node immediately preceding it and the node immediately following it.

3.3.1. Previous Sibling

The node immediately preceding any node in a sibling chain is referred to as its Previous Sibling.

In our cdlibrary example:

  • the previous sibling of each <title> element is the <name> element

  • the previous sibling of each <name> element is null

Note: If a node has no previous sibling, there still exists a previous sibling reference, but it is null.

3.3.2. Next Sibling

The node immediately following any node in a sibling chain is referred to as its Next Sibling.

In our cdlibrary example:

  • the next sibling of each <name> element is the <title> element

  • the next sibling of each <title> element is null
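
The family relationships described in this section can be followed in code. The sketch below assumes PHP 5's bundled DOM extension (DOMDocument), not DOMIT!, and walks from a <cd> element to its first and last children, their siblings, and back up to the parent.

```php
<?php
// Uses PHP 5's bundled DOM extension (DOMDocument), not DOMIT!.
$doc = new DOMDocument();
$doc->loadXML('<cd discid="cf11720f"><name>Keller Williams</name><title>Laugh</title></cd>');

$cd = $doc->documentElement;
$name = $cd->firstChild;  // first child of <cd>
$title = $cd->lastChild;  // last child of <cd>

echo $name->nodeName . "\n";                   // name
echo $title->nodeName . "\n";                  // title
echo $name->nextSibling->nodeName . "\n";      // title
echo $title->previousSibling->nodeName . "\n"; // name
echo $name->parentNode->nodeName . "\n";       // cd

// The ends of the sibling chain hold null references.
var_dump(is_null($name->previousSibling)); // bool(true)
var_dump(is_null($title->nextSibling));    // bool(true)
```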

3.4. Attribute Nodes

Each element node contains an Attributes list, or a reference to the collection of attributes assigned to it. In the following example, the <item> element contains a list of five attributes:

<item desc="post" material="steel" length="120" diameter="5" price="0.75"/> 

These attributes can be accessed either by name or numerical index.
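
Both styles of access, by name and by numerical index, are sketched below. The example assumes PHP 5's bundled DOM extension (DOMDocument) rather than DOMIT!; the DOMIT! equivalents are described in the chapter on accessing attributes.

```php
<?php
// Uses PHP 5's bundled DOM extension (DOMDocument), not DOMIT!.
$doc = new DOMDocument();
$doc->loadXML('<item desc="post" material="steel" length="120" diameter="5" price="0.75"/>');

$item = $doc->documentElement;
$attributes = $item->attributes; // the attributes list of <item>

echo $attributes->length . "\n";                          // 5
echo $item->getAttribute('material') . "\n";              // steel (access by name)
echo $attributes->getNamedItem('price')->nodeValue . "\n"; // 0.75 (access by name)
echo $attributes->item(0)->nodeName . "\n";               // desc (access by index)
```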

3.5. Owner Document

Each node in a DOM Document -- with the exclusion of attribute nodes -- contains a reference to the DOM Document that contains it. This is referred to as the Owner Document property of a node.

3.6. Document Element

The root element of a DOM Document is always referred to as the Document Element.

In our cdlibrary example, the document element would be the <cdlibrary> node.

Chapter 3. Installing DOMIT

1. What is DOMIT!?

DOMIT! is an XML parser that is mostly consistent with the Document Object Model (DOM) Level 2 specification.

DOMIT! is not an extension; it is written purely in PHP and should work in any PHP 4 or 5 environment, regardless of restrictions put in place by your web hosting provider.

It has been designed for speed and ease of use. However, because DOMIT! is composed of interpreted rather than compiled code, you may see sluggish performance with large XML files on a low-memory server.

DOMIT! must be used in conjunction with a SAX parser. By default, you have the option of using either the Expat parser (available with most later distributions of PHP) or the SAXY parser - another purely PHP-based parser developed by Engage Interactive.

As of version 0.9, DOMIT! includes a lightweight version named DOMIT! Lite, which is slightly faster, especially for larger documents. However, it does not support parsing of the xml prolog, processing instructions, or comments, and it lacks certain other functionality.

As of version 0.96, DOMIT! has support for XML namespaces.

Version 0.98 brings PHP5 compatibility.

2. Installing DOMIT!

Since DOMIT! is not an extension, it requires no special setup on your web server. You will, however, need to have the following files present on your server filesystem:

  • xml_domit_include.php - include file for DOMIT!, ensures that include paths are resolved properly.

  • xml_domit_shared.php - shared code for DOMIT! and DOMIT! Lite

  • xml_domit_parser.php - the main DOMIT! php file.

  • xml_domit_utilities.php - required if you want to render your XML as a normalized (whitespace formatted) string or if you want to use the parseXML method of DOMIT_Document.

  • xml_domit_getelementsbypath.php - required if you would like to search for elements in your DOMIT_Document using a path-based syntax.

  • xml_domit_nodemaps.php - data structures that contain collections of nodes

  • xml_domit_nodetools.php - a collection of tools to assist in XML processing

  • xml_domit_cache.php - simple caching class for DOMIT! and DOMIT! Lite documents

  • xml_saxy_parser.php - required if you would like to use the SAXY parser with DOMIT! instead of the Expat parser.

  • xml_domit_doctor.php - class for repairing malformed xml

  • xml_domit_xpath.php - experimental support for XPath queries

  • php_file_utilities.php - generic file input / output utilities

  • php_http_client_generic.php - generic http client class

  • php_http_client_include.php - include file for http client class

  • php_http_connector.php - helper class for php_http_client

  • php_http_exceptions.php - http exceptions class

  • php_http_proxy.php - http proxy class

3. Installing DOMIT! Lite

If you wish to use DOMIT! Lite, a leaner and somewhat faster (although fewer-featured) version of DOMIT!, you will require the following files:

  • xml_domit_lite_include.php - include file for DOMIT! Lite, ensures that include paths are resolved properly.

  • xml_domit_shared.php - shared code for DOMIT! and DOMIT! Lite.

  • xml_domit_lite_parser.php - the main DOMIT! Lite php file.

  • xml_domit_utilities.php - required if you want to render your XML as a normalized (whitespace formatted) string or if you want to use the parseXML method of DOMIT_Lite_Document.

  • xml_domit_getelementsbypath.php - required if you would like to search for elements in your DOMIT_Lite_Document using a path-based syntax.

  • xml_domit_nodemaps.php - data structures that contain collections of nodes

  • xml_domit_cache.php - simple caching class for DOMIT! and DOMIT! Lite documents

  • xml_saxy_lite_parser.php - required if you would like to use the SAXY Lite parser with DOMIT! Lite instead of the Expat parser.

  • xml_domit_doctor.php - class for repairing malformed xml

  • php_file_utilities.php - generic file input / output utilities

  • php_http_client.php - generic http client class

  • php_http_client_include.php - include file for http client class

  • php_http_connector.php - helper class for php_http_client

  • php_http_exceptions.php - http exceptions class

  • php_http_proxy.php - http proxy class

4. Including the DOMIT! Library in your Scripts

To implement DOMIT! in your PHP scripts, include the file xml_domit_include.php:

require_once('somepath/xml_domit_include.php');

To implement DOMIT! Lite in your PHP scripts, include the file xml_domit_lite_include.php:

require_once('somepath/xml_domit_lite_include.php');

Chapter 4. Loading a DOMIT_Document

1. Instantiating and Populating a DOMIT_Document

In DOMIT!, a DOM Document is represented by the DOMIT_Document class.

1.1. Instantiating a DOMIT_Document

You create an instance of the DOMIT_Document class in the same way as any other PHP class:

$cdCollection =& new DOMIT_Document();

A DOMIT! Lite document is instantiated like this:

$cdCollection =& new DOMIT_Lite_Document();

Once a document has been instantiated, it is ready to be populated with XML data.

Note: to ensure PHP 4 backwards compatibility, it is necessary to include an ampersand (&) symbol after the equal sign when returning a reference to a DOMIT_Document or any other DOMIT! object.

1.2. parseXML: Populating a DOMIT_Document from a string variable

If we wanted to create a DOMIT_Document out of a PHP string, we would use the parseXML method. Take for instance, the cd collection XML described in the previous section:

$cdCollection =& new DOMIT_Document(); //instantiate document

//create string variable with XML text
$cdCollectionString = "<?xml version=\"1.0\"?><cdlibrary><cd discid=\"bb0c3c0c\">
  <name>Robbie Fulks</name>
  <title>Couples in Trouble</title></cd>
  <cd discid=\"9b0ce70c\"><name>Richard Thompson</name>
  <title>Mock Tudor</title></cd><cd discid=\"cf11720f\">
  <name>Keller Williams</name>
  <title>Laugh</title></cd>
  </cdlibrary>";

//use parseXML method to populate document
$success = $cdCollection->parseXML($cdCollectionString, true); //parse document

parseXML returns true if the parsing is successful.
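
Escaping every double quote in a long XML string is error-prone. As an alternative sketch, PHP's heredoc syntax lets the XML be written without backslashes; the resulting string can then be handed to parseXML in the same way as above. The XML shown is a shortened version of the cd collection.

```php
<?php
// Heredoc syntax: double quotes between <<<XML and the closing XML
// marker need no backslashes (PHP variables would still be interpolated).
$cdCollectionString = <<<XML
<?xml version="1.0"?>
<cdlibrary>
  <cd discid="bb0c3c0c">
    <name>Robbie Fulks</name>
    <title>Couples in Trouble</title>
  </cd>
</cdlibrary>
XML;

echo $cdCollectionString;
```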

1.3. loadXML: Populating a DOMIT_Document from a file or url

The loadXML method of DOMIT_Document is used to load an XML string from a file or url. It uses an identical syntax to the parseXML method:

$cdCollection =& new DOMIT_Document();
$success = $cdCollection->loadXML("/xml/cdcollection.xml");

The above example parses a file from the file system.

To parse a URL, you would specify the full HTTP address as the first parameter:

$cdCollection =& new DOMIT_Document();
$success = $cdCollection->loadXML("http://www.engageinteractive.com/rssfeed.xml");

1.4. useSAXY: Specifying a SAX parser

DOMIT! relies on an underlying SAX parser to parse XML data. You have the choice of one of two SAX parsers:

  • Expat is a C-based SAX parser written by James Clark that comes bundled with most later distributions of PHP.

  • SAXY is a pure PHP SAX parser written by Engage Interactive that comes bundled with DOMIT!

The second parameter of both parseXML and loadXML allows you to specify a SAX parser. The useSAXY parameter is a boolean whose default value is true. Specifying false will check whether Expat is available and use it to parse and pass the XML data to DOMIT!:

$cdCollection =& new DOMIT_Document();
$success = $cdCollection->loadXML("/xml/cdcollection.xml", false); //Expat specified

If Expat cannot be detected, DOMIT! will revert to SAXY for its parsing.
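
The selection rule just described can be summarized in a few lines of PHP. This is only a sketch of the logic: the function name chooseSaxParser is hypothetical, and testing for xml_parser_create (a function supplied by PHP's XML Parser extension) is an assumption about how Expat availability might be detected, not a description of DOMIT!'s internal code.

```php
<?php
// Hypothetical sketch of the selection logic described above; DOMIT!'s
// own detection code may differ.
function chooseSaxParser($useSAXY = true) {
    // Assumption: the presence of xml_parser_create is taken as the
    // sign that the Expat-based XML Parser extension is available.
    if (!$useSAXY && function_exists('xml_parser_create')) {
        return 'EXPAT';
    }
    return 'SAXY'; // the default, and the fallback when Expat is missing
}

echo chooseSaxParser() . "\n"; // SAXY
echo chooseSaxParser(false) . "\n";
```

The second call prints EXPAT or SAXY depending on how PHP was built on the server.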

1.5. Determining the base SAX parser

To determine whether Expat or SAXY was used to generate a DOMIT_Document, you can use the parsedBy method:

$saxParser = $xmldoc->parsedBy();

The parsedBy method returns a string with a value of either EXPAT or SAXY.

2. Optional Settings for Loading XML Data

Sometimes the default DOMIT! mechanism for populating a DOMIT_Document is insufficient. This is particularly true when retrieving XML data from a remote location.

By default, DOMIT! uses the PHP function file_get_contents or standard PHP file input streams to retrieve the contents of an XML file. However, both of these approaches can fail when passed a remote URL as the location of the XML file to be parsed.

A number of additional options exist to deal with these possibilities.

2.1. useHTTPClient: Forcing loadXML to use an HTTP Client

As of version 1.0, DOMIT! comes bundled with the php_http_client library, written by Engage Interactive. With the useHTTPClient method, DOMIT! can be forced to establish a standard HTTP connection to the web server hosting the XML file:

$cdCollection =& new DOMIT_Document();

//specify that an HTTP client should be used to retrieve XML
$cdCollection->useHTTPClient(true);

//call loadXML method as usual
$success = $cdCollection->loadXML("http://www.engageinteractive.com/rssfeed.xml", false); 

The HTTP connection will be attempted on port 80.

2.2. setConnection: Manually specifying HTTP connection parameters

If you need to establish an HTTP connection to retrieve your XML data, but the useHTTPClient method does not provide enough flexibility, the setConnection method of DOMIT_Document can be used to manually set the parameters of the connection.

$cdCollection =& new DOMIT_Document();
$cdCollection->setConnection('http://www.engageinteractive.com', '/', '955');

//call loadXML method as usual
$success = $cdCollection->loadXML("http://www.engageinteractive.com/rssfeed.xml", false);

In the above example, an HTTP connection will be established on port 955 of host http://www.engageinteractive.com. You can also use a raw IP address for the host, such as http://198.162.0.10

Note that you can also pass in a user name and password to the setConnection method, if you must use HTTP Authorization to establish your connection. For more about HTTP Authorization, please see the entry on the setAuthorization method.
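
If your connection details start out as a single URL, PHP's built-in parse_url function can split it into its parts. The sketch below is not part of DOMIT!, and the URL with an explicit port is an illustrative assumption; how the pieces map onto the arguments of setConnection should be checked against the example above.

```php
<?php
// Splits a URL into the pieces a manual connection needs.
$parts = parse_url('http://www.engageinteractive.com:955/rssfeed.xml');

$host = $parts['scheme'] . '://' . $parts['host'];             // http://www.engageinteractive.com
$path = isset($parts['path']) ? $parts['path'] : '/';          // /rssfeed.xml
$port = isset($parts['port']) ? (string) $parts['port'] : '80'; // 955

echo $host . "\n" . $path . "\n" . $port . "\n";
```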

2.3. setAuthorization: Using basic HTTP authorization with your connection

The HTTP specification allows for a basic (i.e., not particularly secure) type of authorization called HTTP Authorization. If the XML file that you require is protected by this sort of authentication, you can use the setAuthorization method of DOMIT!.

setAuthorization is used in conjunction with the setConnection method, and requires that you provide a plain text username and password:

$cdCollection =& new DOMIT_Document();
$cdCollection->setConnection('http://www.engageinteractive.com', '/', '955');
$cdCollection->setAuthorization('johnheinstein', 'mypassword');

//call loadXML method as usual
$success = $cdCollection->loadXML("http://www.engageinteractive.com/rssfeed.xml", false);
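
To see why basic HTTP authorization is not particularly secure, consider how the request header is formed: the user name and password are joined with a colon and base64 encoded, which is an encoding and not encryption. The sketch below uses only built-in PHP functions; the helper name basicAuthHeader is hypothetical and is not part of DOMIT!.

```php
<?php
// Hypothetical helper: builds the header used by basic HTTP authorization.
function basicAuthHeader($user, $password) {
    return 'Authorization: Basic ' . base64_encode($user . ':' . $password);
}

$header = basicAuthHeader('johnheinstein', 'mypassword');

// Anyone who can read the header can recover the credentials.
$encoded = substr($header, strlen('Authorization: Basic '));
echo base64_decode($encoded); // johnheinstein:mypassword
```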

2.4. setProxyConnection: Retrieving XML data through a proxy server

An HTTP proxy is a server that acts as an intermediary between an HTTP client (such as a user's browser) and the Internet. Proxies are typically used to enforce security, provide administrative control, and cache content. If you are behind a firewall, for instance, and must connect through a proxy server to access web-based resources, the setProxyConnection method will allow you to retrieve such data.

The setProxyConnection method works in exactly the same way as setConnection:

$cdCollection =& new DOMIT_Document();
$cdCollection->setProxyConnection('http://www.myproxyconnection.com', '/', '1060');

//call loadXML method as usual
$success = $cdCollection->loadXML("http://www.engageinteractive.com/rssfeed.xml", false);

2.5. setProxyAuthorization: Using basic HTTP authorization with your proxy

The setProxyAuthorization method is called in exactly the same way as setAuthorization. Just provide a valid user name and password:

$cdCollection =& new DOMIT_Document();
$cdCollection->setProxyConnection('http://www.myproxyconnection.com', '/', '1060');
$cdCollection->setProxyAuthorization('johnheinstein', 'mypassword');

//call loadXML method as usual
$success = $cdCollection->loadXML("http://www.engageinteractive.com/rssfeed.xml", false);

2.6. preserveWhiteSpace

By default, when loading an XML document, DOMIT! removes what it considers insignificant whitespace -- such as the tabs between XML tags that are used for formatting purposes only.

Whitespace can be retained, however, if the following is called prior to loading or parsing:

$cdCollection->preserveWhitespace(true);

2.7. appendEntityTranslationTable

When DOMIT! parses or loads an XML Document, often entities are present which must be transformed into their corresponding character representations. Generally it is the responsibility of the DOCTYPE declaration to delineate these conversions.

However, DOMIT! is a non-validating parser, and is unaware of constraints placed on a document by the DOCTYPE.

The appendEntityTranslationTable method is an alternate way of specifying character equivalents of entities.

It takes a single parameter -- an associative array of entities mapped to their equivalent characters. For example, if one wanted to instruct DOMIT! to convert all &copy; entities into © :

//create translation table
$myTranslationTable = array('&copy;' => '©');

//pass table to document
$cdCollection->appendEntityTranslationTable($myTranslationTable);
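
Conceptually, a translation table of this kind is a search-and-replace map. The sketch below applies such a table to a string with PHP's built-in strtr function. It only illustrates the idea and assumes nothing about how DOMIT! applies the table internally; the sample text is invented for the example.

```php
<?php
//translation table of the same form as the one passed to DOMIT! above
$myTranslationTable = array('&copy;' => '©');

//strtr replaces every key of the table with its corresponding value
$rawText = 'All songs &copy; their respective owners';
$translatedText = strtr($rawText, $myTranslationTable);

echo $translatedText;
```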

3. Error Handling During and After Loading an XML Document

When DOMIT! parses XML from a string or loads XML from a file, several methods can be used to handle non-conformant XML and retrieve error codes.

DOMIT! also allows you to set a custom error handler for runtime XML processing errors.

3.1. resolveErrors

If the resolveErrors method is called, DOMIT! will attempt to locate and fix any problems with improperly formatted XML code. The method must be called before parsing begins; just pass it a value of true:

$cdCollection =& new DOMIT_Document();
$cdCollection->resolveErrors(true);
$success = $cdCollection->loadXML("/xml/cdcollection.xml");

Note that resolveErrors may have an impact on speed, and should be used judiciously.

Currently, resolveErrors only searches for and replaces ampersands that have not been encoded as &amp;
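
The kind of repair described above can be sketched with a regular expression that encodes only those ampersands that do not already begin an entity or character reference. The function name encodeBareAmpersands and the pattern are assumptions made for illustration; the approach DOMIT! actually uses may differ.

```php
<?php
//hypothetical helper (not DOMIT! code): encodes ampersands that
//do not already begin an entity or a character reference
function encodeBareAmpersands($xmlText) {
  $pattern = '/&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)/';
  return preg_replace($pattern, '&amp;', $xmlText);
}

$brokenXML = '<name>Simon & Garfunkel &amp; Friends</name>';
$fixedXML = encodeBareAmpersands($brokenXML);

echo $fixedXML;
```

The ampersand that was already written as &amp; is left alone, so the text is not encoded twice.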

3.2. getErrorCode and getErrorString

If loadXML or parseXML returns false, an error has occurred in processing. The methods getErrorCode and getErrorString can be used to diagnose where the problem lies.

getErrorCode returns a numerical description of the error, and getErrorString returns a textual description of the error. For example:

$cdCollection =& new DOMIT_Document();
$cdCollection->resolveErrors(true);
$success = $cdCollection->loadXML("/xml/cdcollection.xml");

if ($success) {
  //process XML
}
else {
  //an error has occurred; echo to browser
  echo "Error code: " . $cdCollection->getErrorCode();
  echo "\n<br />";
  echo "Error string: " . $cdCollection->getErrorString();
}

3.3. DOMIT_DOMException::setErrorHandler

If you would like to set a custom error handler for DOMIT! to handle runtime XML processing errors, you can use a static method of the DOMIT_DOMException class: setErrorHandler.

It takes a single parameter -- the method to handle the error.

The custom errorhandler method must have the following method signature...

function myCustomErrorHandler($errorNum, $errorString)

...where $errorNum is an integer signifying the number of the error, and $errorString is a string giving a description of the error.

For example, if you wrote a function to handle your DOMIT! errors that looked like this:

function myErrorHandler($errorNum, $errorString) {
  echo "The error number is " . $errorNum . " and the error string is " . $errorString;
}

You could invoke it like this:

DOMIT_DOMException::setErrorHandler("myErrorHandler");

If the myErrorHandler function was a method of a class named ErrorHandlers rather than a standalone function, you could invoke setErrorHandler like this:

DOMIT_DOMException::setErrorHandler(array("ErrorHandlers", "myErrorHandler"));
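
The two ways of naming a handler (a function name, or an array holding a class name and a method name) follow PHP's general callback convention. The sketch below shows how such a callback can be invoked with call_user_func. The dispatchError function is hypothetical and stands in for whatever DOMIT! does internally; the handlers return their message rather than echoing it, and the class method is declared static using PHP5 syntax.

```php
<?php
//standalone handler function
function myErrorHandler($errorNum, $errorString) {
  return "The error number is " . $errorNum . " and the error string is " . $errorString;
}

//handler defined as a static method of a class
class ErrorHandlers {
  public static function myErrorHandler($errorNum, $errorString) {
    return "[ErrorHandlers] " . $errorNum . ": " . $errorString;
  }
}

//hypothetical dispatcher (not part of DOMIT!): invokes whichever callback was registered
function dispatchError($handler, $errorNum, $errorString) {
  return call_user_func($handler, $errorNum, $errorString);
}

//a function is identified by its name
$fromFunction = dispatchError("myErrorHandler", 1, "Sample error");

//a class method is identified by an array of class name and method name
$fromMethod = dispatchError(array("ErrorHandlers", "myErrorHandler"), 1, "Sample error");

echo $fromFunction . "\n" . $fromMethod . "\n";
```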

3.4. DOMIT_DOMException::setErrorMode

The DOMIT_DOMException::setErrorMode method allows you to define the behavior of DOMIT! when an exception occurs. It takes a single parameter -- an integer or integer constant representing the error mode:

  • DOMIT_ONERROR_CONTINUE (1) - specifies that DOMIT! should continue processing after an exception occurs. This is the default behavior.

  • DOMIT_ONERROR_DIE (2) - specifies that DOMIT! should die and display the error message after an exception occurs.

For example:

$cdCollection =& new DOMIT_Document();

//sets DOMIT! to die on an exception
DOMIT_DOMException::setErrorMode(DOMIT_ONERROR_DIE);

3.5. DOMIT_DOMException::setErrorLog

The DOMIT_DOMException::setErrorLog method allows you to specify a file to which error messages are logged and timestamped. This is a useful feature for debugging XML parsing problems.

It takes two parameters:

  • a boolean specifying whether logging should be turned on (true) or off (false)

  • a string containing the absolute or relative path of the error log file.

The following example specifies that errors are to be logged to the file 'errorLog.txt':

$cdCollection =& new DOMIT_Document();

//specifies that error logging is to be enabled and the error log filename
DOMIT_DOMException::setErrorLog(true, 'errorLog.txt');
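
As a rough picture of what timestamped logging involves, the following sketch appends one line per error to a file. The helper logErrorSketch and the entry format are assumptions made for illustration and do not reflect the format that DOMIT! writes. A temporary file is used so that the sketch can be run anywhere.

```php
<?php
//hypothetical helper (not part of DOMIT!): appends a timestamped entry to a log file
function logErrorSketch($logFile, $errorNum, $errorString) {
  $entry = gmdate('Y-m-d H:i:s') . ' Error ' . $errorNum . ': ' . $errorString . "\n";
  //FILE_APPEND adds to the end of the file instead of overwriting it
  file_put_contents($logFile, $entry, FILE_APPEND);
}

//create a temporary log file and write two entries to it
$logFile = tempnam(sys_get_temp_dir(), 'log');
logErrorSketch($logFile, 1, 'Sample error');
logErrorSketch($logFile, 2, 'Another sample error');

//read the log back, then remove the temporary file
$logContents = file_get_contents($logFile);
unlink($logFile);

echo $logContents;
```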

Chapter 5. Traversing a Document and Extracting Data

Once a DOMIT_Document has been populated, you can use the standard DOM methods to extract and manipulate data in the XML tree. The following chapter illustrates how this can be done.

1. The Document Element Node

You can acquire a reference to the document element node -- the root element in a DOM document -- using the documentElement keyword.

$cdCollection =& new DOMIT_Document();
$success = $cdCollection->loadXML("/xml/cdcollection.xml");

if ($success) {
  //gets a reference to the root element of the cd collection
  $myDocumentElement =& $cdCollection->documentElement;
}

In the cd library example, the document element node is the node <cdlibrary> .

Note: Always remember to use the reference (&) operator in PHP4, or you will be returned a copy of the node rather than a reference to it. Even if you are using PHP5, it is recommended that you use the ampersand anyway, for the sake of portability to servers still running PHP4.

2. Displaying a Node as Text

A text representation of a node and its contents can be displayed using the toString and toNormalizedString methods. The expandEmptyElementTags method can be used to further tweak your output.

2.1. toString

Take the document element node of the cd library example above. Once a reference to the <cdlibrary> node has been obtained using the documentElement keyword, we can see what it contains:

$myDocumentElement =& $cdCollection->documentElement;
echo $myDocumentElement->toString(true);

The following string will be echoed to the browser window:

<cdlibrary><cd discid="bb0c3c0c"><name>Robbie Fulks</name><title>Couples in Trouble</title></cd><cd discid="9b0ce70c"><name>Richard Thompson</name><title>Mock Tudor</title></cd><cd discid="cf11720f"><name>Keller Williams</name><title>Laugh</title></cd></cdlibrary>

The first parameter of toString , if set to true, converts special HTML characters into their encoded version (i.e. & into &amp;) so that they will display properly in a browser.

If you would like unconverted raw text to be output (for instance, when echoing to a command line interface) substitute a value of false:

echo $myDocumentElement->toString(false);
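
The difference between the two settings can be pictured with PHP's built-in htmlspecialchars function, which performs a comparable encoding. This is only an approximation for illustration; the exact set of characters that DOMIT! converts is not shown here and may differ.

```php
<?php
$rawXML = '<cd discid="bb0c3c0c">';

//encoded form: a browser displays the tag as text instead of interpreting it
$forBrowser = htmlspecialchars($rawXML, ENT_QUOTES);

echo $forBrowser . "\n";

//raw form: suitable for a command line interface or for writing to a file
echo $rawXML . "\n";
```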

2.2. toNormalizedString

One drawback of the toString output is that it is not particularly readable, since all text of the node is compressed into one line. The toNormalizedString method will output text that is much more nicely formatted:

$myDocumentElement =& $cdCollection->documentElement;
echo $myDocumentElement->toNormalizedString(true);

The following string will be echoed to the browser window:

<cdlibrary>
  <cd discid="bb0c3c0c">
    <name>Robbie Fulks</name>
    <title>Couples in Trouble</title>
  </cd>
  <cd discid="9b0ce70c">
    <name>Richard Thompson</name>
    <title>Mock Tudor</title>
  </cd>
  <cd discid="cf11720f">
    <name>Keller Williams</name>
    <title>Laugh</title>
  </cd>
</cdlibrary>

As with the toString method, passing a value of false into toNormalizedString outputs text that is not formatted for HTML display.

2.3. expandEmptyElementTags

When outputting XML using toString or toNormalizedString, by default DOMIT! represents empty elements using the abbreviated convention:

<anEmptyElement />

If you prefer the tags to be expanded instead, use the expandEmptyElementTags method:

$xmldoc->expandEmptyElementTags(true);

When using DOMIT! to render XHTML documents, often it is necessary to leave some tags unexpanded, such as the <br /> tag. The expandEmptyElementTags method allows you to pass in an array of exceptions to the expansion rule:

//create array of exceptions to the empty element expansion rule
$expansionExceptions = array('br', 'hr');

//invoke expansion rule, passing in array of exceptions as second parameter
$xmldoc->expandEmptyElementTags(true, $expansionExceptions);

This might result in output that looked like this:

<html>
  <body>
    <p>This is a test</p>
    <p></p>
    <br />
  </body>
</html>
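
The expansion rule and its exceptions amount to a simple decision per empty element, sketched below. The function renderEmptyElement is hypothetical and only mirrors the behavior described in this section; it is not taken from the DOMIT! source.

```php
<?php
//hypothetical helper (not part of DOMIT!): renders an element that has no children
function renderEmptyElement($name, $expand, $exceptions = array()) {
  //expand only when requested and when the tag is not listed as an exception
  if ($expand && !in_array($name, $exceptions)) {
    return '<' . $name . '></' . $name . '>';
  }
  return '<' . $name . ' />';
}

$expansionExceptions = array('br', 'hr');

$expandedP = renderEmptyElement('p', true, $expansionExceptions);
$unexpandedBr = renderEmptyElement('br', true, $expansionExceptions);
$defaultP = renderEmptyElement('p', false);

echo $expandedP . "\n" . $unexpandedBr . "\n" . $defaultP . "\n";
```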

3. Obtaining Node Type, Name, and Value

In an earlier section, we learned that each node in a DOM document has three properties -- node type, node name, and node value -- that allow you to distinguish between it and other nodes.

These properties are accessible in DOMIT! with the nodeType, nodeName, and nodeValue keywords.

To echo out these properties for the document element of the cdlibrary example, for instance, you would do this:

$cdCollection =& new DOMIT_Document();
$success = $cdCollection->loadXML("/xml/cdcollection.xml");

if ($success) {
  //gets a reference to the root element of the cd collection
  $myDocumentElement =& $cdCollection->documentElement;

  //echo out node name
  echo "Node name: " . $myDocumentElement->nodeName;
  echo "\n<br />";

  //echo out node type
  echo "Node type: " . $myDocumentElement->nodeType;
  echo "\n<br />";

  //echo out node value
  echo "Node value: " . $myDocumentElement->nodeValue;
  echo "\n<br />";
}

The above example would display:

Node name: cdlibrary
Node type: 1
Node value:

Note that nothing follows "Node value:" because the node value for an element is null.

4. Traversing a DOM Tree

You know how to:

  • instantiate a DOMIT! document

  • populate a DOMIT! document using the loadXML or parseXML methods

  • obtain a reference to the document element

  • print the contents of a node, and

  • display the three basic node properties

We will now learn how to access other parts of a document using such DOM constructs as child nodes, parent nodes, and next and previous siblings.

4.1. The childNodes array, hasChildNodes, and childCount

As explained previously, each node in a DOM Document has a list of references to the nodes contained directly beneath it in the tree: its Child Nodes.

In DOMIT!, the child nodes exist as a standard PHP array named childNodes.

To grab a reference to the childNodes array of a node, use the following syntax:

//get a reference to the childNodes collection of the document element
$myChildNodes =& $cdCollection->documentElement->childNodes;

Note: When returning a reference to the childNodes array in PHP4, always remember to use the reference (&) operator, or you will be returned a shallow copy.
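
The effect of omitting the ampersand can be demonstrated with an ordinary PHP array. In the sketch below, plain strings stand in for nodes, so it shows only the array-level behavior: an array assigned without the ampersand is a separate copy, while an array assigned with it is the same array under a second name.

```php
<?php
//a stand-in for a childNodes array (plain strings instead of nodes)
$childNodes = array('first cd', 'second cd');

//plain assignment copies the array; changes to the copy are not seen by the original
$copy = $childNodes;
$copy[] = 'added to the copy';

//reference assignment creates a second name for the same array
$reference =& $childNodes;
$reference[] = 'added through the reference';

//the original now ends with the item added through the reference
echo $childNodes[2] . "\n";

//the copy ends with its own item and knows nothing of the other
echo $copy[2] . "\n";
```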

It is good practice, prior to grabbing a reference to the childNodes array, to use the hasChildNodes method to check if any child nodes exist:

//ensure that there are childNodes before bothering to work with the childNodes array
if ($cdCollection->documentElement->hasChildNodes()) {
  $myChildNodes =& $cdCollection->documentElement->childNodes;
}

The number of child nodes is stored in the childCount property. You can use this value to traverse the childNodes array and access its individual nodes:

//ensure that there are childNodes before bothering to work with the childNodes array
if ($cdCollection->documentElement->hasChildNodes()) {

  //get a reference to the childNodes collection of the document element
  $myChildNodes =& $cdCollection->documentElement->childNodes;

  //get the total number of childNodes for the document element
  $numChildren =& $cdCollection->documentElement->childCount;

  //iterate through the collection
  for ($i = 0; $i < $numChildren; $i++) {

    //get a reference to the childNode at index i
    $currentNode =& $myChildNodes[$i];
    
    //echo out the node to browser
    echo ("Node $i contents are: \n<br />" . 
      $currentNode->toNormalizedString(true) . "\n<br />\n<br />");
  }
}

The above example will return:

Node 0 contents are:
<cd discid="bb0c3c0c">
  <name>Robbie Fulks</name>
  <title>Couples in Trouble</title>
</cd>

Node 1 contents are:
<cd discid="9b0ce70c">
  <name>Richard Thompson</name>
  <title>Mock Tudor</title>
</cd>

Node 2 contents are:
<cd discid="cf11720f">
  <name>Keller Williams</name>
  <title>Laugh</title>
</cd>

4.2. firstChild

The childNodes array is not the only means of accessing the children of a node.

The firstChild property of a node returns a reference to a node's first child node:

if ($cdCollection->documentElement->hasChildNodes()) {

  //get reference to first child node of document element
  $firstChildNode =& $cdCollection->documentElement->firstChild;

  //echo out the node to browser
  echo ("The contents of the first child node are: \n<br />" . 
      $firstChildNode->toNormalizedString(true));
}

The above example will return:

The contents of the first child node are:
<cd discid="bb0c3c0c">
  <name>Robbie Fulks</name>
  <title>Couples in Trouble</title>
</cd>

Note: If there are no child nodes present, a value of null is returned.

4.3. lastChild

The lastChild property of a node returns a reference to a node's last child node:

if ($cdCollection->documentElement->hasChildNodes()) {

  //get reference to last child node
  $lastChildNode =& $cdCollection->documentElement->lastChild;

  //echo out the node to browser
  echo ("The contents of the last child node are: \n<br />" . 
      $lastChildNode->toNormalizedString(true));
}

The above example will return:

The contents of the last child node are:
<cd discid="cf11720f">
  <name>Keller Williams</name>
  <title>Laugh</title>
</cd>

If there are no child nodes present, a value of null is returned.

4.4. nextSibling

Nodes that occupy the same level of a DOM tree are called siblings. The DOM conceives of these nodes as being chained in a sequence, with each node aware of the node immediately preceding and immediately following it.

The nextSibling property of a node returns a reference to the node immediately following it in the sibling chain.

In the cdlibrary example, the next sibling of the Robbie Fulks <cd> node is the Richard Thompson <cd> node. One would access it like this:

if ($cdCollection->documentElement->hasChildNodes()) {

  //get reference to first cd node (the Robbie Fulks cd)
  $firstChildNode =& $cdCollection->documentElement->firstChild;

  //get a reference to the next sibling (the Richard Thompson cd)
  $nextSiblingNode =& $firstChildNode->nextSibling;

  //echo out the node to browser
  echo ("The contents of the next sibling are: \n<br />" . 
      $nextSiblingNode->toNormalizedString(true));
}

The above example will return:

The contents of the next sibling are:
<cd discid="9b0ce70c">
  <name>Richard Thompson</name>
  <title>Mock Tudor</title>
</cd>

If there are no next sibling nodes present, a value of null is returned.

4.5. previousSibling

The previousSibling property of a node returns a reference to the node immediately preceding it in the sibling chain.

In the cdlibrary example, the previous sibling of the Keller Williams <cd> node is the Richard Thompson <cd> node. One would access it like this:

if ($cdCollection->documentElement->hasChildNodes()) {

  //get reference to last cd node (the Keller Williams cd)
  $lastChildNode =& $cdCollection->documentElement->lastChild;

  //get a reference to the previous sibling (the Richard Thompson cd)
  $previousSiblingNode =& $lastChildNode->previousSibling;

  //echo out the node to browser
  echo ("The contents of the previous sibling are: \n<br />" . 
      $previousSiblingNode->toNormalizedString(true));
}

The above example will return:

The contents of the previous sibling are:
<cd discid="9b0ce70c">
  <name>Richard Thompson</name>
  <title>Mock Tudor</title>
</cd>

If there are no previous sibling nodes present, a value of null is returned.
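
To show how a sibling chain can be maintained, the following sketch defines a minimal node class whose appendChild method links each new child to the one before it. The class SketchNode is hypothetical and written in PHP5 syntax; it is not DOMIT!'s implementation, only an illustration of the relationships described in the last two sections.

```php
<?php
//hypothetical minimal node class (not DOMIT!'s implementation)
class SketchNode {
  public $name;
  public $parentNode = null;
  public $previousSibling = null;
  public $nextSibling = null;
  public $childNodes = array();

  public function __construct($name) {
    $this->name = $name;
  }

  //appends a child and links it into the sibling chain
  public function appendChild($child) {
    $count = count($this->childNodes);
    if ($count > 0) {
      $last = $this->childNodes[$count - 1];
      $last->nextSibling = $child;
      $child->previousSibling = $last;
    }
    $child->parentNode = $this;
    $this->childNodes[] = $child;
    return $child;
  }
}

$library = new SketchNode('cdlibrary');
$fulks = $library->appendChild(new SketchNode('Robbie Fulks cd'));
$thompson = $library->appendChild(new SketchNode('Richard Thompson cd'));
$williams = $library->appendChild(new SketchNode('Keller Williams cd'));

//the next sibling of the first cd and the previous sibling of the last cd
//are both the Richard Thompson cd
echo $fulks->nextSibling->name . "\n";
echo $williams->previousSibling->name . "\n";

//the ends of the chain have no sibling, so null is stored there
var_dump($fulks->previousSibling);
var_dump($williams->nextSibling);
```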

4.6. parentNode

As the name implies, the parentNode property of a node returns a reference to the node one level above it in the DOM tree.

In the cdlibrary example, the parent node of the Robbie Fulks <cd> node is the document element <cdlibrary> node. One would access it like this:

if ($cdCollection->documentElement->hasChildNodes()) {

  //get reference to first cd node (the Robbie Fulks cd)
  $firstChildNode =& $cdCollection->documentElement->firstChild;

  //get a reference to the parent (cdlibrary) node
  $myParentNode =& $firstChildNode->parentNode;

  //echo out the node to browser
  echo ("The contents of the parent node of the Robbie Fulks cd node are: \n<br />" . 
      $myParentNode->toNormalizedString(true));
}

The above example will return:

The contents of the parent node of the Robbie Fulks cd node are:
<cdlibrary>
  <cd discid="bb0c3c0c">
    <name>Robbie Fulks</name>
    <title>Couples in Trouble</title>
  </cd>
  <cd discid="9b0ce70c">
    <name>Richard Thompson</name>
    <title>Mock Tudor</title>
  </cd>
  <cd discid="cf11720f">
    <name>Keller Williams</name>
    <title>Laugh</title>
  </cd>
</cdlibrary>

If there is no parent node present, a value of null is returned. Note that only the document element node will have no parent.

4.7. ownerDocument

Each node in a DOM document -- with the exception of attribute nodes -- is considered to be "owned" by that document.

Use the ownerDocument property of a node to obtain a reference to the DOMIT! document:

if ($cdCollection->documentElement->hasChildNodes()) {

  //get reference to first cd node (the Robbie Fulks cd)
  $firstChildNode =& $cdCollection->documentElement->firstChild;

  //get a reference to the DOMIT document
  $myOwnerDocument =& $firstChildNode->ownerDocument;
}

5. Extracting Character Data

Text nodes, CDATA Section nodes, and comment nodes belong to what is defined by the DOM as the CharacterData interface, which specifies a number of methods for obtaining the textual data. The following section describes some of these methods.

5.1. nodeValue

The easiest way of getting the data from a text node, CDATA Section node, or comment node is through its nodeValue property.

Note: A common error that many DOM newbies make is to confuse a text node with the element node that contains it. It is important to realize that a text node is always the child of the containing element.

$cdCollection =& new DOMIT_Document();
$success = $cdCollection->loadXML("/xml/cdcollection.xml");

if ($success) {
  //get a reference to the <name> element of the Robbie Fulks cd
  $nameElement =& $cdCollection->documentElement->childNodes[0]->firstChild;

  //get a reference to the text node
  //(this step has been broken into multiple steps to emphasize that
  //a text node must be distinguished from its containing element!)
  $nameTextNode =& $nameElement->firstChild;

  //echo out the data in the text node 
  echo $nameTextNode->nodeValue;
}

The above example returns:

Robbie Fulks

If you prefer, you can condense the above steps into a single line:

$myText = $cdCollection->documentElement->childNodes[0]->firstChild->firstChild->nodeValue;

5.2. getData

The getData method is a wrapper for the nodeValue keyword and functions in exactly the same way:

$cdCollection =& new DOMIT_Document();
$success = $cdCollection->loadXML("/xml/cdcollection.xml");

if ($success) {
  //get a reference to the <name> element of the Robbie Fulks cd
  $nameElement =& $cdCollection->documentElement->childNodes[0]->firstChild;

  //get a reference to the text node
  //(this step has been broken into multiple steps to emphasize that
  //a text node must be distinguished from its containing element!)
  $nameTextNode =& $nameElement->firstChild;

  //echo out the data in the text node 
  echo $nameTextNode->getData();
}

5.3. getText

In most cases, the getText method functions identically to nodeValue and getData. You can simply substitute the word getText for the word getData in the previous example and the results will be the same.

However, getText can also be called on an element. In this case, the concatenated text of all children beneath the element is returned. For instance:

$cdCollection =& new DOMIT_Document();
$success = $cdCollection->loadXML("/xml/cdcollection.xml");

if ($success) {
  //get a reference to the Robbie Fulks <cd> element
  $cdElement =& $cdCollection->documentElement->childNodes[0];

  //get ALL text beneath the cd element ("name" text + "title" text)
  $childText = $cdElement->getText();

  //echo out the concatenated data 
  echo $childText;
}

The above example returns:

Robbie FulksCouples in Trouble
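
The concatenation performed on an element can be pictured as a recursive walk over its descendants. The sketch below uses nested PHP arrays as a stand-in for a DOM tree (an element is an array with 'name' and 'children' keys, and a text node is a plain string). Both this representation and the collectText function are assumptions made for illustration, not DOMIT! code.

```php
<?php
//hypothetical stand-in for the Robbie Fulks <cd> element and its children
$cdElement = array(
  'name' => 'cd',
  'children' => array(
    array('name' => 'name', 'children' => array('Robbie Fulks')),
    array('name' => 'title', 'children' => array('Couples in Trouble'))
  )
);

//concatenates the text of a node and of everything beneath it
function collectText($node) {
  //a text node contributes its own text
  if (is_string($node)) {
    return $node;
  }
  //an element contributes the text of each of its children, in order
  $text = '';
  foreach ($node['children'] as $child) {
    $text .= collectText($child);
  }
  return $text;
}

$childText = collectText($cdElement);
echo $childText;
```

Note that no separator is inserted between the text of adjacent children, which is why the two values run together.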

5.4. getLength

The getLength method indicates how many characters exist in a character data node:

$numCharacters = $myTextNode->getLength();

5.5. substringData

The substringData method returns a specified subset of characters from a character data node.

It takes two parameters:

  • offset: an integer specifying the starting character of the substring

  • count: an integer specifying how many characters from the offset should be included in the substring

To extract the first name from the "Robbie Fulks" text node, for example, one would do this:

$firstName = $rfTextNode->substringData(0,6);
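
The offset and count parameters behave much like those of PHP's built-in substr function, with the offset counted from 0. The sketch below applies strlen and substr to a plain string rather than to a text node, as an illustration of the arithmetic only. Keep in mind that strlen and substr count bytes, so multibyte characters would need separate handling.

```php
<?php
$text = 'Robbie Fulks';

//number of characters, comparable to the result of getLength
$numCharacters = strlen($text);

//offset 0, count 6: the first name
$firstName = substr($text, 0, 6);

//offset 7, count 5: the last name (offset 6 is the space)
$lastName = substr($text, 7, 5);

echo $numCharacters . ' ' . $firstName . ' ' . $lastName;
```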

6. Accessing Attributes

In a DOM document, attributes are accessed, by name, from their containing element. The DOMIT! methods hasAttribute and getAttribute can be used to extract attribute data.

6.1. hasAttribute

To determine whether an element contains a particular attribute, you can use the hasAttribute method. It takes a single string parameter -- the name of the attribute -- and returns either true or false:

if ($cdCollection->documentElement->hasChildNodes()) {

  //get reference to first cd node (the Robbie Fulks cd)
  $firstChildNode =& $cdCollection->documentElement->firstChild;

  //determine whether it has an attribute named "discid" 
  if ($firstChildNode->hasAttribute("discid")) {
    echo ("I DO have a discid attribute");
  }
  else {
    echo ("I DO NOT have a discid attribute");
  }
}

6.2. hasAttributes

The hasAttributes method returns true if an element contains at least one attribute.

if ($someNode->hasAttributes()) {
  echo ("I have at least one attribute");
}
else {
  echo ("I have no attributes");
}

6.3. getAttribute

To obtain the value of a named attribute, use the getAttribute method. As with the hasAttribute method, you pass in the attribute name:

if ($cdCollection->documentElement->hasChildNodes()) {

  //get reference to first cd node (the Robbie Fulks cd)
  $firstChildNode =& $cdCollection->documentElement->firstChild;

  //determine whether it has an attribute named "discid" 
  if ($firstChildNode->hasAttribute("discid")) {

    //obtain the value of the discid attribute
    $attrValue = $firstChildNode->getAttribute("discid");

    //echo the value out to the browser
    echo ("Attribute value: " . $attrValue);
  }
  else {
    echo ("I DO NOT have a discid attribute");
  }
}

The above example returns:

Attribute value: bb0c3c0c

Note: If the attribute does not exist, an empty string (i.e., "") is returned.

6.4. getAttributeNode and getValue

The getAttribute method returns the value of an attribute node. If you would like to obtain a reference to the node itself, use the getAttributeNode method.

To obtain the value of an attribute node, use the getValue method:

if ($cdCollection->documentElement->hasChildNodes()) {

  //get reference to first cd node (the Robbie Fulks cd)
  $firstChildNode =& $cdCollection->documentElement->firstChild;

  //determine whether it has an attribute named "discid" 
  if ($firstChildNode->hasAttribute("discid")) {

    //obtain a reference to the discid attribute node (don't forget the ampersand!)
    $attrNode =& $firstChildNode->getAttributeNode("discid");

    //echo the value out to the browser
    echo ("The value of the discid attribute is: \n<br />" . 
      $attrNode->getValue());
  }
  else {
    echo ("I DO NOT have a discid attribute");
  }
}

The above example returns:

The value of the discid attribute is:
bb0c3c0c

6.5. The attributes Keyword and Named Node Maps

An attribute list is defined by the DOM specification as a Named Node Map. This is a type of node collection that allows you to access its members either by name or by index.

Although the attribute-specific methods are in most cases sufficient, there may be times when you do not know in advance the names of an element's attributes. Using the named node map methods, you can query the list to find out this data.

6.5.1. The attributes Keyword

To obtain a reference to the attributes list / named node map of an element, use the attributes keyword:

if ($cdCollection->documentElement->hasChildNodes()) {

  //get reference to first cd node (the Robbie Fulks cd)
  $firstChildNode =& $cdCollection->documentElement->firstChild;

  //get a reference to the attributes list / named node map (don't forget the ampersand!)
  $attrList =& $firstChildNode->attributes;
}

6.5.2. getLength, item, and getName

The getLength method of a named node map returns an integer indicating how many members belong to the attribute list.

The item method of a named node map allows you to access a member by its numerical index (which is 0-based). In combination with the getLength method, you can set up a loop through the members of an attribute list.

The getName method will tell you the name of the node.

if ($cdCollection->documentElement->hasChildNodes()) {

  //get reference to first cd node (the Robbie Fulks cd)
  $firstChildNode =& $cdCollection->documentElement->firstChild;

  //get a reference to the attributes list / named node map (don't forget the ampersand!)
  $attrList =& $firstChildNode->attributes;

  //determine the number of members in the attribute list
  $numAttributes = $attrList->getLength();

  //iterate through the list
  for ($i = 0; $i < $numAttributes; $i++) {
    //get a reference to the attribute node at index i (don't forget the ampersand!)
    $currAttr =& $attrList->item($i);

    //echo out the name and value of the attribute
    echo "The attribute at index " . $i . " is named: " . $currAttr->getName();
    echo "\n<br /> Its value is: " . $currAttr->getValue(); 
  }
}

The above example returns:

The attribute at index 0 is named: discid
Its value is: bb0c3c0c
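
The idea of a collection that can be read either by name or by index can be sketched with a small class built on a PHP associative array. The class SketchNamedNodeMap is hypothetical and written in PHP5 syntax. It stores plain values rather than attribute nodes, and it makes no claim about how DOMIT! implements its own named node map.

```php
<?php
//hypothetical sketch of a named node map (not DOMIT!'s implementation)
class SketchNamedNodeMap {
  private $items = array();

  //adds or overwrites the member with the given name
  public function setNamedItem($name, $value) {
    $this->items[$name] = $value;
  }

  //access by name; null if no such member exists
  public function getNamedItem($name) {
    return isset($this->items[$name]) ? $this->items[$name] : null;
  }

  //number of members in the map
  public function getLength() {
    return count($this->items);
  }

  //access by 0-based index; null if the index is out of range
  public function item($index) {
    $values = array_values($this->items);
    return isset($values[$index]) ? $values[$index] : null;
  }
}

$attrList = new SketchNamedNodeMap();
$attrList->setNamedItem('discid', 'bb0c3c0c');

//the same member is reachable by name and by index
echo $attrList->getNamedItem('discid') . "\n";
echo $attrList->item(0) . "\n";
echo $attrList->getLength() . "\n";
```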

7. Accessing the XML Prolog

The XML Prolog is a term referring to the XML Declaration and the Document Type Declaration.

7.1. getXMLDeclaration

The XML declaration can be acquired with the getXMLDeclaration method:

$myXMLDecl =& $xmldoc->getXMLDeclaration();

A reference to a processing instruction node is returned.

7.2. Accessing the Document Type Declaration

The Document Type Declaration can be acquired with the getDocType method:

$myDTD =& $xmldoc->getDocType();

A reference to a document type node is returned.

Chapter 6. Creating and Modifying a DOM Document

The major strength of the Document Object Model is the ease with which the data in an XML document can be modified. The following chapter delineates how to use DOMIT! for creating, appending, inserting, replacing, removing, and altering XML data.

1. Creating Nodes

Creating new XML nodes is accomplished using a set of DOMIT_Document factory methods. For the next subsections, we will assume that a new DOMIT_Document has already been created as follows:

//include DOMIT! codebase
require_once('xml_domit_include.php');

//instantiate a new DOMIT! document
$xmldoc =& new DOMIT_Document(); 

1.1. createElement

To create a new DOM element, use the createElement method.

The createElement method takes a single parameter -- the name of the element.

$newElement =& $xmldoc->createElement("cdlibrary");

Note: Don't forget to include the ampersand for backwards compatibility with PHP4!

1.2. createTextNode

To create a new DOM text node, use the createTextNode method.

The createTextNode method takes a single parameter -- the text of the node.

$myText = 'Here is some dummy text';

$newTextNode =& $xmldoc->createTextNode($myText);

1.3. createCDATASection

To create a new DOM CDATA Section, use the createCDATASection method.

The createCDATASection method takes a single parameter -- the text of the CDATA Section.

$myText = 'Here are some illegal XML characters: & <';

$newCDATASection =& $xmldoc->createCDATASection($myText);

1.4. createAttribute

To create a new DOM attribute, use the createAttribute method.

The createAttribute method takes two parameters:

  • the name of the attribute

  • the value of the attribute

$newAttribute =& $xmldoc->createAttribute("discid", "bb0c3c0c");

1.5. createComment

To create a new DOM comment, use the createComment method.

The createComment method takes a single parameter -- the text of the comment.

$myCommentText = 'This is a comment';

$newCommentNode =& $xmldoc->createComment($myCommentText);

1.6. createProcessingInstruction

To create a new DOM processing instruction, use the createProcessingInstruction method.

The createProcessingInstruction method takes two parameters -- the text of the target and the text of the data.

//create target and data
$myTarget = 'xml';
$myData = 'version="1.0"';

//create processing instruction
$newProcessingInstructionNode =& $xmldoc->createProcessingInstruction($myTarget, $myData);

2. Appending Nodes

Appending a node in the DOM means adding a new child node to the end of a node's child nodes list.

You can use the appendChild method to append a node (and its children, if any exist) to a DOM Document or an element node.

The following example creates a <cdlibrary> element and appends it to a new DOMIT_Document:

//include DOMIT! codebase
require_once('xml_domit_include.php');

//instantiate a new DOMIT! document
$xmldoc =& new DOMIT_Document(); 

//create cdlibrary node
$newNode =& $xmldoc->createElement('cdlibrary');

//append cdlibrary node to new DOMIT_Document
$xmldoc->appendChild($newNode);

//echo to browser
echo $xmldoc->toNormalizedString(true);

The result is:

<cdlibrary></cdlibrary>

3. Setting the Document Element

In the previous section, when the <cdlibrary> element was appended to the empty DOM document, it became the document element.

The setDocumentElement method is another way of achieving the same result. For example:

//include DOMIT! codebase
require_once('xml_domit_include.php');

//instantiate a new DOMIT! document
$xmldoc =& new DOMIT_Document(); 

//create cdlibrary node
$newNode =& $xmldoc->createElement('cdlibrary');

//set cdlibrary node as the document element of the new DOMIT_Document
$xmldoc->setDocumentElement($newNode);

//echo to browser
echo $xmldoc->toNormalizedString(true);

The result is:

<cdlibrary></cdlibrary>

setDocumentElement will overwrite an existing document element.
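A minimal sketch of that overwrite behavior follows. It assumes the same xml_domit_include.php include path used above; the bookshelf element name is made up purely for illustration.

```php
//include DOMIT! codebase
require_once('xml_domit_include.php');

//instantiate a new DOMIT! document
$xmldoc =& new DOMIT_Document();

//set cdlibrary as the document element
$firstRoot =& $xmldoc->createElement('cdlibrary');
$xmldoc->setDocumentElement($firstRoot);

//set a second element (hypothetical name) as the document element
$secondRoot =& $xmldoc->createElement('bookshelf');
$xmldoc->setDocumentElement($secondRoot);

//echo to browser
echo $xmldoc->toNormalizedString(true);
```

If setDocumentElement overwrites as described, only the <bookshelf> element should appear in the output.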

4. Setting Attributes

The setAttribute and setAttributeNode methods are used to either add an attribute to an element, or change the value of an existing attribute. They are methods of element nodes only.

4.1. setAttribute

The setAttribute method takes two parameters:

  • the name of the attribute to be added or changed

  • the value to assign to the attribute

The following example adds a discid attribute to a <cd> element:

//create cd element
$newNode =& $xmldoc->createElement('cd');

//add a discid attribute
$newNode->setAttribute('discid', 'bb0c3c0c');

//echo to browser
echo $newNode->toNormalizedString(true);

The result is:

<cd discid="bb0c3c0c"></cd>
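Because setAttribute also changes the value of an existing attribute, calling it a second time with the same name should replace the value rather than add a duplicate. A minimal sketch, assuming the include path used earlier in this chapter:

```php
//include DOMIT! codebase
require_once('xml_domit_include.php');

//instantiate a new DOMIT! document
$xmldoc =& new DOMIT_Document();

//create cd element and add a discid attribute
$newNode =& $xmldoc->createElement('cd');
$newNode->setAttribute('discid', 'bb0c3c0c');

//call setAttribute again with the same name to change the value
$newNode->setAttribute('discid', '9b0ce70c');

//echo the current attribute value to browser
echo $newNode->getAttribute('discid');
```

The echoed value should be the second one passed to setAttribute.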

4.2. setAttributeNode

The setAttributeNode method also adds an attribute to an element. It takes a single parameter -- an attribute node:

//create cd element
$newNode =& $xmldoc->createElement('cd');

//create a discid attribute node
$newAttr =& $xmldoc->createAttribute('discid', 'bb0c3c0c');

//add the attribute node to the element
$newNode->setAttributeNode($newAttr);

//echo to browser
echo $newNode->toNormalizedString(true);

The result is:

<cd discid="bb0c3c0c"></cd>

5. Creating the cdlibrary XML Using DOMIT!

We now have sufficient tools to create the cdlibrary example from scratch, using only DOMIT!

//include DOMIT! codebase
require_once('xml_domit_include.php');

//instantiate a new DOMIT! document
$xmldoc =& new DOMIT_Document(); 

//create XML declaration
$xmlDecl =& $xmldoc->createProcessingInstruction('xml', 'version="1.0"');

//append XML declaration to new DOMIT_Document
$xmldoc->appendChild($xmlDecl);

//create cdlibrary node
$rootElement =& $xmldoc->createElement('cdlibrary');

//append cdlibrary node to new DOMIT_Document
$xmldoc->appendChild($rootElement);

//CREATE FIRST CD ELEMENT AND CHILDREN
//create cd element
$cdElement_1 =& $xmldoc->createElement('cd');

//add discid attribute
$cdElement_1->setAttribute('discid', 'bb0c3c0c');

//create name element
$nameElement =& $xmldoc->createElement('name');

//create and append text node to name element
$nameElement->appendChild($xmldoc->createTextNode('Robbie Fulks'));

//append name element to cd element
$cdElement_1->appendChild($nameElement);

//create title element
$titleElement =& $xmldoc->createElement('title');

//create and append text node to title element
$titleElement->appendChild($xmldoc->createTextNode('Couples in Trouble'));

//append title element to cd element
$cdElement_1->appendChild($titleElement);


//CREATE SECOND CD ELEMENT AND CHILDREN
//create cd element
$cdElement_2 =& $xmldoc->createElement('cd');

//add discid attribute
$cdElement_2->setAttribute('discid', '9b0ce70c');

//create name element
$nameElement =& $xmldoc->createElement('name');

//create and append text node to name element
$nameElement->appendChild($xmldoc->createTextNode('Richard Thompson'));

//append name element to cd element
$cdElement_2->appendChild($nameElement);

//create title element
$titleElement =& $xmldoc->createElement('title');

//create and append text node to title element
$titleElement->appendChild($xmldoc->createTextNode('Mock Tudor'));

//append title element to cd element
$cdElement_2->appendChild($titleElement);


//CREATE THIRD CD ELEMENT AND CHILDREN
//create cd element
$cdElement_3 =& $xmldoc->createElement('cd');

//add discid attribute
$cdElement_3->setAttribute('discid', 'cf11720f');

//create name element
$nameElement =& $xmldoc->createElement('name');

//create and append text node to name element
$nameElement->appendChild($xmldoc->createTextNode('Keller Williams'));

//append name element to cd element
$cdElement_3->appendChild($nameElement);

//create title element
$titleElement =& $xmldoc->createElement('title');

//create and append text node to title element
$titleElement->appendChild($xmldoc->createTextNode('Laugh'));

//append title element to cd element
$cdElement_3->appendChild($titleElement);

//APPEND CD ELEMENTS TO CDLIBRARY ELEMENT
$rootElement->appendChild($cdElement_1);
$rootElement->appendChild($cdElement_2);
$rootElement->appendChild($cdElement_3);

//echo to browser
echo $xmldoc->toNormalizedString(true);

The result is:

<?xml version="1.0"?>
<cdlibrary>
  <cd discid="bb0c3c0c">
    <name>Robbie Fulks</name>
    <title>Couples in Trouble</title>
  </cd>
  <cd discid="9b0ce70c">
    <name>Richard Thompson</name>
    <title>Mock Tudor</title>
  </cd>
  <cd discid="cf11720f">
    <name>Keller Williams</name>
    <title>Laugh</title>
  </cd>
</cdlibrary>
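The listing above repeats the same steps for each cd. As a sketch only, the repetition could be folded into a helper function. createCdElement is a hypothetical name and is not part of DOMIT!; it uses only the createElement, setAttribute, createTextNode, and appendChild calls already shown.

```php
//include DOMIT! codebase
require_once('xml_domit_include.php');

//hypothetical helper (not part of DOMIT!): builds one cd element with a
//discid attribute and name and title child elements
function &createCdElement(&$xmldoc, $discid, $name, $title) {
  $cdElement =& $xmldoc->createElement('cd');
  $cdElement->setAttribute('discid', $discid);

  $nameElement =& $xmldoc->createElement('name');
  $nameElement->appendChild($xmldoc->createTextNode($name));
  $cdElement->appendChild($nameElement);

  $titleElement =& $xmldoc->createElement('title');
  $titleElement->appendChild($xmldoc->createTextNode($title));
  $cdElement->appendChild($titleElement);

  return $cdElement;
}

//instantiate document, then append XML declaration and cdlibrary element
$xmldoc =& new DOMIT_Document();
$xmlDecl =& $xmldoc->createProcessingInstruction('xml', 'version="1.0"');
$xmldoc->appendChild($xmlDecl);
$rootElement =& $xmldoc->createElement('cdlibrary');
$xmldoc->appendChild($rootElement);

//discid, name and title for each cd
$cds = array(
  array('bb0c3c0c', 'Robbie Fulks', 'Couples in Trouble'),
  array('9b0ce70c', 'Richard Thompson', 'Mock Tudor'),
  array('cf11720f', 'Keller Williams', 'Laugh')
);

//create each cd element and append it to the cdlibrary element
foreach ($cds as $cd) {
  $cdElement =& createCdElement($xmldoc, $cd[0], $cd[1], $cd[2]);
  $rootElement->appendChild($cdElement);
}

//echo to browser
echo $xmldoc->toNormalizedString(true);
```

The output should match the cdlibrary XML produced by the longer listing.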

6. Inserting Nodes

If you need to add a child node somewhere other than the end of the child nodes list, you can use the insertBefore method.

insertBefore takes two parameters:

  • a reference to the node that is to be added

  • a reference to an existing child node, before which the insertion will occur

If, continuing with the cdlibrary document from the previous example, we wished to insert a comment as the first child node of the <cdlibrary> element, insertBefore could be used:

//create a comment
$myComment =& $xmldoc->createComment('Not many cds left after I got robbed');

//insert the comment as the first child of the cdlibrary element
$rootElement->insertBefore($myComment, $rootElement->childNodes[0]);

//echo to browser
echo $xmldoc->toNormalizedString(true);

The result is:

<?xml version="1.0"?>
<cdlibrary>
  <!--Not many cds left after I got robbed-->
  <cd discid="bb0c3c0c">
    <name>Robbie Fulks</name>
    <title>Couples in Trouble</title>
  </cd>
  <cd discid="9b0ce70c">
    <name>Richard Thompson</name>
    <title>Mock Tudor</title>
  </cd>
  <cd discid="cf11720f">
    <name>Keller Williams</name>
    <title>Laugh</title>
  </cd>
</cdlibrary>

7. Replacing Nodes

Let's say that I traded my Robbie Fulks cd for a Charlie Hunter cd named "Songs From the Analog Playground", and I want to replace the old XML with a new cd node.

The replaceChild method can be used to do this. It takes two parameters:

  • a reference to the new node to be added

  • a reference to the node that is to be replaced

//CREATE NEW CHARLIE HUNTER CD ELEMENT AND CHILDREN
//create cd element
$cdElement_new =& $xmldoc->createElement('cd');

//add discid attribute
$cdElement_new->setAttribute('discid', 'a30e4c0d');

//create name element
$nameElement =& $xmldoc->createElement('name');

//create and append text node to name element
$nameElement->appendChild($xmldoc->createTextNode('Charlie Hunter'));

//append name element to cd element
$cdElement_new->appendChild($nameElement);

//create title element
$titleElement =& $xmldoc->createElement('title');

//create and append text node to title element
$titleElement->appendChild($xmldoc->createTextNode('Songs From the Analog Playground'));

//append title element to cd element
$cdElement_new->appendChild($titleElement);

//REPLACE ROBBIE FULKS CD NODE WITH CHARLIE HUNTER CD NODE
//(remember a comment has been added, so Robbie is the second child node)
$rootElement->replaceChild($cdElement_new, $rootElement->childNodes[1]);

//echo to browser
echo $xmldoc->toNormalizedString(true);

The result is:

<?xml version="1.0"?>
<cdlibrary>
  <!--Not many cds left after I got robbed-->
  <cd discid="a30e4c0d">
    <name>Charlie Hunter</name>
    <title>Songs From the Analog Playground</title>
  </cd>
  <cd discid="9b0ce70c">
    <name>Richard Thompson</name>
    <title>Mock Tudor</title>
  </cd>
  <cd discid="cf11720f">
    <name>Keller Williams</name>
    <title>Laugh</title>
  </cd>
</cdlibrary>

8. Removing Nodes

The removeChild method allows you to delete a node (and its children) from a DOM document. It takes a single parameter -- a reference to the node to be removed.

The following example removes the comment from the cdlibrary XML:

$rootElement->removeChild($rootElement->firstChild);

//echo to browser
echo $xmldoc->toNormalizedString(true);

The result is:

<?xml version="1.0"?>
<cdlibrary>
  <cd discid="a30e4c0d">
    <name>Charlie Hunter</name>
    <title>Songs From the Analog Playground</title>
  </cd>
  <cd discid="9b0ce70c">
    <name>Richard Thompson</name>
    <title>Mock Tudor</title>
  </cd>
  <cd discid="cf11720f">
    <name>Keller Williams</name>
    <title>Laugh</title>
  </cd>
</cdlibrary>

9. Removing Attributes

An attribute can be deleted with either the removeAttribute or removeAttributeNode method.

9.1. removeAttribute

An attribute can be removed with the removeAttribute method. It takes a single parameter -- the name of the attribute to be removed.

The following example removes the discid attribute from the Charlie Hunter <cd> element:

$rootElement->firstChild->removeAttribute('discid');

//echo to browser
echo $rootElement->toNormalizedString(true);

The result is:

<cdlibrary>
  <cd>
    <name>Charlie Hunter</name>
    <title>Songs From the Analog Playground</title>
  </cd>
  <cd discid="9b0ce70c">
    <name>Richard Thompson</name>
    <title>Mock Tudor</title>
  </cd>
  <cd discid="cf11720f">
    <name>Keller Williams</name>
    <title>Laugh</title>
  </cd>
</cdlibrary>

9.2. removeAttributeNode

An attribute can also be removed with the removeAttributeNode method. It takes a single parameter -- a reference to the attribute to be removed.

The following example removes the discid attribute from the Charlie Hunter <cd> element:

//get reference to attribute to be removed
$attrToRemove =& $rootElement->firstChild->getAttributeNode('discid');

//remove attribute
$rootElement->firstChild->removeAttributeNode($attrToRemove);

//echo to browser
echo $rootElement->toNormalizedString(true);

The result is:

<cdlibrary>
  <cd>
    <name>Charlie Hunter</name>
    <title>Songs From the Analog Playground</title>
  </cd>
  <cd discid="9b0ce70c">
    <name>Richard Thompson</name>
    <title>Mock Tudor</title>
  </cd>
  <cd discid="cf11720f">
    <name>Keller Williams</name>
    <title>Laugh</title>
  </cd>
</cdlibrary>

10. Setting Character Data

There are a variety of methods available for working with character data nodes.

10.1. setText

The setText method allows you to modify the text of an existing text node, CDATA Section, or comment.

To change the title of the Keller Williams cd, for example, you would do this:

//get reference to title text node of Keller Williams cd 
$titleTextNode =& $rootElement->childNodes[2]->childNodes[1]->firstChild;

//modify title
$titleTextNode->setText('Loop');

//echo to browser
echo $xmldoc->toNormalizedString(true);

The result is:

<?xml version="1.0"?>
<cdlibrary>
  <cd discid="a30e4c0d">
    <name>Charlie Hunter</name>
    <title>Songs From the Analog Playground</title>
  </cd>
  <cd discid="9b0ce70c">
    <name>Richard Thompson</name>
    <title>Mock Tudor</title>
  </cd>
  <cd discid="cf11720f">
    <name>Keller Williams</name>
    <title>Loop</title>
  </cd>
</cdlibrary>

10.1.1. setText When Called from an Element

If setText is called from an element instead of a text node, DOMIT! will check if the element has a child text node.

If the element has a child text node, the text of that node will be set to the value specified in the setText parameter.

If the element does not have a child text node, a new text node will be created, appended to the element, and its node value set to the value specified in the setText parameter. For instance:

//create a new element
$someElement =& $xmldoc->createElement('someElement');

//call setText on the element 
//(note that no child text node exists at this point, but one will be created)
$someElement->setText('Some sample text');

//echo to browser
echo $someElement->toNormalizedString(true);

The result is:

<someElement>Some sample text</someElement>

10.2. splitText

The splitText method is accessible only from a text node. It allows you to split a text node into two text nodes, at a specified offset point. Both text nodes will be retained in the DOM tree as siblings.

splitText takes a single integer parameter -- the character index at which the node is to be split.

//create a new element
$someElement =& $xmldoc->createElement('someElement');

//add a text node to the element 
$someElement->setText('Some sample text');

//echo childCount to browser
echo '$someElement has ' . $someElement->childCount . ' child nodes.';

//split the text node at character index 5
$someElement->firstChild->splitText(5);

//echo childCount to browser
echo "\n<br />";
echo 'After calling splitText, $someElement now has ' . $someElement->childCount . ' child nodes.';

The result is:

$someElement has 1 child nodes.
After calling splitText, $someElement now has 2 child nodes.

10.3. normalize

The normalize method performs the opposite of the splitText method: it collapses adjacent text nodes into a single text node.

normalize can be called from any element or the DOM document itself, and is called recursively on all nodes below the calling node.

The following example splits a text node using splitText, then uses normalize to reverse the operation:

//create a new element
$someElement =& $xmldoc->createElement('someElement');

//add a text node to the element 
$someElement->setText('Some sample text');

//echo childCount to browser
echo '$someElement has ' . $someElement->childCount . ' child nodes.';

//split the text node at character index 5
$someElement->firstChild->splitText(5);

//echo childCount to browser
echo "\n<br />";
echo 'After calling splitText, $someElement now has ' . $someElement->childCount . ' child nodes.';

//call normalize on element to reverse splitText
$someElement->normalize();

//echo childCount to browser
echo "\n<br />";
echo 'After calling normalize, $someElement now has ' . $someElement->childCount . ' child nodes.';

The result is:

$someElement has 1 child nodes.
After calling splitText, $someElement now has 2 child nodes.
After calling normalize, $someElement now has 1 child nodes.

10.4. appendData

The appendData method allows you to append text to a text node, CDATA Section, or comment node. For example:

//create a new element
$someElement =& $xmldoc->createElement('someElement');

//add a text node to the element 
$someElement->setText('Some sample text');

//append more text
$someElement->firstChild->appendData(' plus more text.');

//echo to browser
echo $someElement->toNormalizedString(true);

The result is:

<someElement>Some sample text plus more text.</someElement>

10.5. insertData

The insertData method allows you to insert text into a text node, CDATA Section, or comment node, at a specified offset.

It takes two parameters: an integer indicating the insertion point, and a string comprising the text to be inserted. For example:

//create a new element
$someElement =& $xmldoc->createElement('someElement');

//add a text node to the element 
$someElement->setText('Some sample text');

//insert some text at offset 4 (immediately after 'Some')
$someElement->firstChild->insertData(4, ' more');

//echo to browser
echo $someElement->toNormalizedString(true);

The result is:

<someElement>Some more sample text</someElement>

10.6. replaceData

The replaceData method allows you to overwrite a substring of text in a text node, CDATA Section, or comment node.

It takes three parameters: an integer indicating the offset at which the replacement starts, an integer specifying the number of characters to overwrite from that offset, and a string comprising the replacement text. For example:

//create a new element
$someElement =& $xmldoc->createElement('someElement');

//add a text node to the element 
$someElement->setText('Some sample text');

//replace some text
$someElement->firstChild->replaceData(0, 4, 'A bit of');

//echo to browser
echo $someElement->toNormalizedString(true);

The result is:

<someElement>A bit of sample text</someElement>

10.7. deleteData

The deleteData method allows you to delete a substring of text in a text node, CDATA Section, or comment node.

It takes two parameters: an integer indicating the offset at which the deletion starts, and an integer specifying the number of characters to delete from that offset. For example:

//create a new element
$someElement =& $xmldoc->createElement('someElement');

//add a text node to the element 
$someElement->setText('Some sample text');

//delete 7 characters starting at offset 5 ('sample ')
$someElement->firstChild->deleteData(5, 7);

//echo to browser
echo $someElement->toNormalizedString(true);

The result is:

<someElement>Some text</someElement>
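The offsets taken by splitText, insertData, replaceData, and deleteData can be tried out on an ordinary string first. The following sketch uses only PHP's built-in substr and substr_replace functions, not DOMIT! itself, and assumes that the offsets are zero-based character positions, as in the DOM specification:

```php
//plain PHP string functions only; no DOMIT! calls are made here
$text = 'Some sample text';

//splitting at offset 5: the characters before offset 5, and the remainder
$splitBefore = substr($text, 0, 5);
$splitAfter = substr($text, 5);

//inserting ' more' at offset 4 (zero characters are overwritten)
$inserted = substr_replace($text, ' more', 4, 0);

//replacing 4 characters, starting at offset 0, with 'A bit of'
$replaced = substr_replace($text, 'A bit of', 0, 4);

//deleting 7 characters, starting at offset 5
$deleted = substr_replace($text, '', 5, 7);

//echo each result on its own line
echo $splitBefore . '|' . $splitAfter . "\n";
echo $inserted . "\n";
echo $replaced . "\n";
echo $deleted . "\n";
```

The result is:

Some |sample text
Some more sample text
A bit of sample text
Some text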

Chapter 7. Saving a DOM Document

After modifying an XML document, you generally need to save it to the filesystem. This can be achieved using the saveXML method.

saveXML takes two parameters:

  • the file path to save the document

  • a boolean specifying whether toNormalizedString formatting should be applied to the saved XML

To save the cdlibrary XML you would do this:

$xmldoc->saveXML('/xml/cdcollection.xml', true);

Chapter 8. Miscellaneous DOM Features

There are a number of additional DOM methods and constructs that have not yet been covered in this tutorial. This chapter illustrates them.

1. cloneNode

The cloneNode method allows you to make a copy of a node and its children. All data in the cloned node will be identical to its source node, but the nodes are considered separate objects.

cloneNode takes a single parameter -- a boolean that if set to true will also clone all children of the node. The default value is true.

Any type of node in a DOM document can be cloned.

The following example clones the first <cd> element in the cdlibrary document and prints to the browser:

//get reference to first cd node
$firstCDNode =& $cdCollection->documentElement->childNodes[0];

//echo to browser
echo $firstCDNode->toNormalizedString(true);

//clone first cd node
$clonedCDNode =& $firstCDNode->cloneNode(true);

//echo to browser
echo "\n<br />\n<br />" . $clonedCDNode->toNormalizedString(true);

The result is:

<cd discid="bb0c3c0c">
  <name>Robbie Fulks</name>
  <title>Couples in Trouble</title>
</cd>

<cd discid="bb0c3c0c">
  <name>Robbie Fulks</name>
  <title>Couples in Trouble</title>
</cd>
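Since the clone is a separate object, changing it should leave the source node untouched. A minimal sketch, assuming the include path used elsewhere in this manual; the replacement discid value is arbitrary:

```php
//include DOMIT! codebase
require_once('xml_domit_include.php');

//instantiate a new DOMIT! document and create a cd element
$xmldoc =& new DOMIT_Document();
$cdElement =& $xmldoc->createElement('cd');
$cdElement->setAttribute('discid', 'bb0c3c0c');

//clone the cd element
$clonedElement =& $cdElement->cloneNode(true);

//change the discid on the clone only (arbitrary illustrative value)
$clonedElement->setAttribute('discid', '00000000');

//echo both discid values to browser
echo $cdElement->getAttribute('discid');
echo "\n<br />";
echo $clonedElement->getAttribute('discid');
```

The first value echoed should still be the original discid, and the second the value set on the clone.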

2. getElementByID

The getElementByID method searches for elements with attributes of type ID, and returns an element with the specified value if one exists.

The DOM specification explains that by default, the search does not match on elements with an attribute named "ID"; rather, it is an attribute type that the method is looking for. The attribute type must either be:

  • defined as type ID in the document type declaration, i.e.,

    <!ATTLIST bar
        id ID #IMPLIED
      >

  • supplied as an attribute named id that is prefixed with the xml namespace, i.e.,

    <someElement xml:id="12345" />

DOMIT! is a non-validating parser, so the first option is not available. DOMIT! does, however, recognize the second option. With the following xml document, for example...

<testDocument>
  <someElement xml:id="12345">The containing element is properly formatted for getElementByID</someElement>
  <anotherElement id="12345">The containing element is NOT properly formatted for getElementByID</anotherElement>
</testDocument>

... the getElementByID method will match only on the first child node:

//instantiate and load XML
$xmldoc =& new DOMIT_Document();
$success = $xmldoc->loadXML("testDocument.xml", true);

if ($success) {
  //search for element with an ID of "12345"
  $matchingNode =& $xmldoc->getElementByID("12345");

  //echo matching node to browser if one exists
  if ($matchingNode != null) {
    echo $matchingNode->toNormalizedString(true);
  }
}

The result is:

<someElement xml:id="12345">The containing element is properly formatted for getElementByID</someElement>

The getElementByID method returns null if no matching element is found.

2.1. getElementByID and Strict vs. Tolerant mode

Some may argue that the DOM specification for getElementByID is too rigid for practical purposes. When parsing XHTML in particular, it is common to match on ID attributes that are not defined in such a way that DOMIT! or other non-validating parsers can effectively match on elements.

Given this, DOMIT! allows you to specify a tolerant mode for getElementByID searches. By passing in a second parameter of false, DOMIT! will match on elements with attributes of "ID" and "id".

Take the following document, for example:

<testDocument>
  <anotherElement id="12345">The containing element is NOT properly formatted for getElementByID</anotherElement>
</testDocument>

If getElementByID is called in tolerant mode:

//instantiate and load XML
$xmldoc =& new DOMIT_Document();
$success = $xmldoc->loadXML("testDocument.xml", true);

if ($success) {
  //search for element with an ID of "12345", passing false for tolerant mode
  $matchingNode =& $xmldoc->getElementByID("12345", false);

  //echo matching node to browser if one exists
  if ($matchingNode != null) {
    echo $matchingNode->toNormalizedString(true);
  }
}

The result is:

<anotherElement id="12345">The containing element is NOT properly formatted for getElementByID</anotherElement>

3. getElementsByTagName

The getElementsByTagName method is similar to getElementByID, in that it is a method for searching a DOM document for elements which match certain criteria.

In the case of getElementsByTagName, the name of the element is matched on, and there can consequently be multiple matching elements.

getElementsByTagName takes a single parameter -- the tag name of the elements to match. The search is performed recursively through the entire subtree of the calling element.

If one searched the cdlibrary XML for elements named "cd", for example, three elements would be returned:

$cdCollection =& new DOMIT_Document();
$success = $cdCollection->loadXML("/xml/cdcollection.xml");

if ($success) {
  //use getElementsByTagName to gather all elements named "cd"
  $matchingNodes =& $cdCollection->getElementsByTagName("cd");

  //if any matching nodes are found, echo to browser
  if ($matchingNodes != null) {
    echo $matchingNodes->toNormalizedString(true);
  }
}

The result is a printout of the three matched cd elements:

<cd discid="bb0c3c0c">
  <name>Robbie Fulks</name>
  <title>Couples in Trouble</title>
</cd>
<cd discid="9b0ce70c">
  <name>Richard Thompson</name>
  <title>Mock Tudor</title>
</cd>
<cd discid="cf11720f">
  <name>Keller Williams</name>
  <title>Laugh</title>
</cd>

4. Using NodeLists

In the previous section, the getElementsByTagName method returned a collection of matching nodes. This collection is described by the DOM specification as a Node List.

A node list is a collection of nodes accessible by numerical index. A number of methods are defined to access its members. Many of these are identical to those found in the previously discussed named node map.

4.1. getLength and item

The getLength and item methods for a node list are identical to those for a named node map. You can use them to iterate through the node list using a for loop.

Take the previous getElementsByTagName example, which returned three nodes. You can, for instance, loop through the node list and print out the discid of each CD:

//use getElementsByTagName to gather all elements named "cd"
$matchingNodes =& $cdCollection->getElementsByTagName("cd");

//if any matching nodes are found, loop through them and print out disc id
if ($matchingNodes != null) {
  
  //get total number of nodes in the list
  $total = $matchingNodes->getLength();

  //loop through node list 
  for ($i = 0; $i < $total; $i++) {

    //get current node on list
    $currNode =& $matchingNodes->item($i);

    //echo out discid
    echo $currNode->getAttribute('discid') . "\n<br />";
  }
}

The result is:

bb0c3c0c
9b0ce70c
cf11720f

4.2. appendNode and removeNode

The appendNode method allows you to add a node to the end of the node list. The removeNode method allows you to remove a node from the node list.

Both methods take a single parameter -- a reference to the node being appended or removed.

To append a node to the cd node list from the above example, you could do this:

//use getElementsByTagName to gather all elements named "cd"
$matchingNodes =& $cdCollection->getElementsByTagName("cd");

//create a new node
$newNode =& $cdCollection->createElement("someElement");

//append the node to the node list
$matchingNodes->appendNode($newNode);

//echo to browser
echo $matchingNodes->toNormalizedString(true);

The result is:

<cd discid="bb0c3c0c">
  <name>Robbie Fulks</name>
  <title>Couples in Trouble</title>
</cd>
<cd discid="9b0ce70c">
  <name>Richard Thompson</name>
  <title>Mock Tudor</title>
</cd>
<cd discid="cf11720f">
  <name>Keller Williams</name>
  <title>Laugh</title>
</cd>
<someElement />

To remove a node from the cd node list from the above example, you could do this:

//use getElementsByTagName to gather all elements named "cd"
$matchingNodes =& $cdCollection->getElementsByTagName("cd");

//remove the first node from the node list
$matchingNodes->removeNode($matchingNodes->item(0));

//echo to browser
echo $matchingNodes->toNormalizedString(true);

The result is:

<cd discid="9b0ce70c">
  <name>Richard Thompson</name>
  <title>Mock Tudor</title>
</cd>
<cd discid="cf11720f">
  <name>Keller Williams</name>
  <title>Laugh</title>
</cd>

4.3. childNodesAsNodeList

According to the DOM specification, the child nodes of an element should be kept in a node list.

However, contrary to the specification, DOMIT! uses an array rather than a node list. This is to get around a deficiency in PHP 4, in which method calls cannot be chained together as one would normally expect with an object oriented programming language.

You cannot, for instance, do this in PHP4 (although you can in PHP5)...

$myText = $xmldoc->documentElement->getChildNodes()->item(2)->getText();

...although by using an array, it is possible to burrow deeply down into a document structure without splitting your code into multiple lines:

$myText = $xmldoc->documentElement->childNodes[2]->getText();

For those who are using PHP5 and would like child nodes to be returned in node list format, the childNodesAsNodeList method can be used:

$myText = $xmldoc->documentElement->childNodesAsNodeList()->item(2)->getText();

5. importNode

The importNode method allows you to properly import a node into a DOM document which originated from another DOM document.

It takes two parameters:

  • the node to be imported

  • a boolean that, if true, will also import all of the node's children (this is the default behavior)

Let's say we had two XML documents. The first is the cd collection that we have been using throughout this tutorial. The second document contains a single cd that looks like this:

<cd discid="a30e4c0d">
  <name>Charlie Hunter</name>
  <title>Songs From the Analog Playground</title>
</cd>

If we instantiated these two XML documents, and wanted to add the contents of the cd document to the cdlibrary, we would first have to use importNode:

//instantiate and load first XML Document
$xmldoc1 =& new DOMIT_Document();
$success1 = $xmldoc1->loadXML("cdCollection.xml", true);

//instantiate and load second XML Document
$xmldoc2 =& new DOMIT_Document();
$success2 = $xmldoc2->loadXML("cd.xml", true);

//import contents of xmldoc2 into xmldoc1
$importedData =& $xmldoc1->importNode($xmldoc2->documentElement);

//append contents of xmldoc2 to the cdCollection node
$xmldoc1->documentElement->appendChild($importedData);

//echo to browser
echo $xmldoc1->toNormalizedString(true);

The result is:

<?xml version="1.0"?>
<cdlibrary>
  <cd discid="bb0c3c0c">
    <name>Robbie Fulks</name>
    <title>Couples in Trouble</title>
  </cd>
  <cd discid="9b0ce70c">
    <name>Richard Thompson</name>
    <title>Mock Tudor</title>
  </cd>
  <cd discid="cf11720f">
    <name>Keller Williams</name>
    <title>Laugh</title>
  </cd>
  <cd discid="a30e4c0d">
    <name>Charlie Hunter</name>
    <title>Songs From the Analog Playground</title>
  </cd>
</cdlibrary>

Chapter 9. Custom DOMIT! Methods

DOMIT! includes a number of non-DOM methods for XML processing.

1. getVersion

The getVersion method returns the version number of the current install of DOMIT!

$myVersion = $xmldoc->getVersion();

2. Searching for Nodes

Although the getElementByID and getElementsByTagName methods are useful, often you need more sophisticated search options to simplify your XML code.

2.1. getElementsByPath

The getElementsByPath method allows you to search for elements in a document that match a "path"-like pattern that you provide.

The syntax is similar to an XPath query, although the range of patterns allowed by getElementsByPath is far less sophisticated than the XPath specification permits.

The pattern takes the basic form of elementName/elementName, where the forward slash represents a parent-child relationship. Either a node list, a single node, or null is returned.

getElementsByPath can be called by any node. There are three basic ways that you can form a pattern:

  • An absolute path search can be performed by prefixing your pattern with the / character. This type of search will start at the level of the document element node.

  • A relative path search can be performed by omitting the / prefix from your pattern. This type of search will start at the level of the node which called getElementsByPath.

  • A variable path search can be performed by prefixing your pattern with // characters. This type of search will find all matching elements, regardless of their position in the node hierarchy.

Let's try an example of each with our cdlibrary XML:

<?xml version="1.0"?>
<cdlibrary>
  <cd discid="bb0c3c0c">
    <name>Robbie Fulks</name>
    <title>Couples in Trouble</title>
  </cd>
  <cd discid="9b0ce70c">
    <name>Richard Thompson</name>
    <title>Mock Tudor</title>
  </cd>
  <cd discid="cf11720f">
    <name>Keller Williams</name>
    <title>Laugh</title>
  </cd>
</cdlibrary>

2.1.1. Absolute Path Search

The pattern for an absolute path search begins with a forward slash, meaning that the search will begin at the level of the document element node -- no matter at what level the calling node resides.

To perform an absolute search for all <title> elements, one would do this:

//use getElementsByPath to retrieve all title elements
$myNodeList =& $cdCollection->getElementsByPath("/cdlibrary/cd/title");

//echo to browser
echo $myNodeList->toNormalizedString(true);

The result is a listing of the three found <title> nodes:

<title>Couples in Trouble</title>
<title>Mock Tudor</title>
<title>Laugh</title>

2.1.2. Relative Path Search

The pattern for a relative path search does not contain a beginning forward slash. The search will begin at the level of the calling node.

To perform a relative search for all <name> elements which are children of <cd> elements which are children of the <cdlibrary> element, one would do this:

//use getElementsByPath to retrieve all name elements which are children of 
//cd elements which are children of the cdlibrary element
$myNodeList =& $cdCollection->documentElement->getElementsByPath("cd/name");

//echo to browser
echo $myNodeList->toNormalizedString(true);

The result is a listing of the three found <name> nodes:

<name>Robbie Fulks</name>
<name>Richard Thompson</name>
<name>Keller Williams</name>

2.1.3. Variable Path Search

The pattern for a variable path search begins with a double forward slash. Each element in the document is considered a starting point for the search.

To perform a variable search for all <title> elements in the document, one would do this:

//use getElementsByPath to retrieve all title elements in cdlibrary
$myNodeList =& $cdCollection->getElementsByPath("//title");

//echo to browser
echo $myNodeList->toNormalizedString(true);

The result is a listing of the three found <title> nodes:

<title>Couples in Trouble</title>
<title>Mock Tudor</title>
<title>Laugh</title>

2.1.4. Returning a Single Node Instead of a Node List

If you would like a single node to be returned by getElementsByPath, rather than the entire node list of matching elements, you can specify the index of the requested node by passing an integer as the second parameter of getElementsByPath.

In accordance with the XPath specification, the index that you specify is 1-based.

To return the first <cd> node of the cdlibrary example, you could do this:

//use getElementsByPath to retrieve the first cd element in cdlibrary
$myElement =& $cdCollection->getElementsByPath("/cdlibrary/cd", 1);

//echo to browser
if ($myElement != null) {
  echo $myElement->toNormalizedString(true);
}

The result is:

<cd discid="bb0c3c0c">
  <name>Robbie Fulks</name>
  <title>Couples in Trouble</title>
</cd>

2.2. getElementsByAttribute

The getElementsByAttribute method allows you to retrieve a node list of elements, each of which contains an attribute that matches the specified name and value. This is a useful improvement over the getElementByID method, since it does not bind you to a narrow definition of attribute type or name.

To obtain a node list of all elements containing an attribute named 'myAttr' and a value of '3', for example, you would do this:

//get node list of elements containing myAttr="3"
$myNodeList =& $xmldoc->getElementsByAttribute('myAttr', '3');

There is a third parameter available for getElementsByAttribute, a boolean which if set to true will return the first matching element rather than an entire node list of elements:

//get first matching element containing myAttr="3"
$myElement =& $xmldoc->getElementsByAttribute('myAttr', '3', true);

2.3. getNodesByNodeType

The getNodesByNodeType method allows you to search the document tree for nodes of a specific node type.

You can specify a node type using one of the following DOMIT! constants:

  • DOMIT_ELEMENT_NODE (an integer value of 1)

  • DOMIT_TEXT_NODE (an integer value of 3)

  • DOMIT_CDATA_SECTION_NODE (an integer value of 4)

  • DOMIT_PROCESSING_INSTRUCTION_NODE (an integer value of 7)

  • DOMIT_COMMENT_NODE (an integer value of 8)

  • DOMIT_DOCUMENT_NODE (an integer value of 9)

You must also pass in as the second parameter a context node - a node from which the search should start.

The following example returns a node list of all text nodes in the cdlibrary example:

//find all text nodes in cdlibrary
$myTextNodeList =& $cdCollection->getNodesByNodeType(DOMIT_TEXT_NODE, $cdCollection);

//echo to browser
echo $myTextNodeList->toNormalizedString(true);

The result is:

Robbie Fulks
Couples in Trouble
Richard Thompson
Mock Tudor
Keller Williams
Laugh

2.4. getNodesByNodeValue

The getNodesByNodeValue method allows you to search the document tree for nodes of a specific node value.

This is especially useful for finding text or CDATA Section nodes containing a certain text value.

You must pass in the node value that you are searching for as well as a context node - a node from which the search should start.

The following example returns a node list of all nodes in the current document with a node value of "Robbie Fulks":

//find all text nodes with a value of "Robbie Fulks" in cdlibrary
$myTextNodeList =& $cdCollection->getNodesByNodeValue("Robbie Fulks", $cdCollection);

//get first match
$firstItem =& $myTextNodeList->item(0);

//echo parent node to browser
echo $firstItem->parentNode->toNormalizedString(true);

The result is:

<name>Robbie Fulks</name>

3. XML to and from Arrays

Sometimes it is useful to convert an XML document into a PHP array, or to import a PHP array as an XML document.

DOMIT! provides two methods to accomplish this: toArray, and DOMIT_Utilities::fromArray.

Note: It may be faster to use the PHP/Expat method xml_parse_into_struct instead of the DOMIT! array methods when converting XML to arrays.
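
For comparison, here is a minimal sketch of the xml_parse_into_struct approach mentioned in the note. It uses only PHP's bundled XML (Expat) extension and does not involve DOMIT!. The sample string is a shortened version of the cdlibrary data, and the variable names are chosen purely for illustration.

```php
<?php
// Sample XML: one <cd> entry from the cdlibrary example
$xml = '<cd discid="bb0c3c0c"><name>Robbie Fulks</name><title>Couples in Trouble</title></cd>';

// Create an Expat parser and keep tag names in their original case
$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
xml_parser_set_option($parser, XML_OPTION_SKIP_WHITE, 1);

// $values receives one entry per tag event,
// $index maps each tag name to its positions in $values
$values = array();
$index = array();
xml_parse_into_struct($parser, $xml, $values, $index);

//echo to browser
print "<pre>";
print_r($values);
print "</pre>";
```

Note that the resulting structure is a flat list of tag events rather than a nested array, so it is laid out differently from the output of toArray.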

3.1. toArray

The toArray method converts an XML node and its children to an array.

To convert the first <cd> element of the cdlibrary example to an array, you would do this:

//convert first <cd> element to array
$myArray =& $cdCollection->documentElement->firstChild->toArray();

//echo to browser
print "<pre>";
print_r($myArray);
print "</pre>";

The result is:

Array
(
    [cd] => Array
        (
            [attributes] => Array
                (
                    [discid] => bb0c3c0c
                )

            [0] => Array
                (
                    [name] => Array
                        (
                            [attributes] => Array
                                (
                                )

                            [0] => Robbie Fulks
                        )

                )

            [1] => Array
                (
                    [title] => Array
                        (
                            [attributes] => Array
                                (
                                )

                            [0] => Couples in Trouble
                        )

                )

        )

)

3.2. DOMIT_Utilities::fromArray

The DOMIT_Utilities::fromArray method generates a node tree from an array and appends it to the specified document or node.

The convention follows that of the fromArray method in the minixml library:

//Create an array to represent a person Bob
$bob = array(
  'name' => array( 
    'first' => 'Bob',
    'last' => 'Roberts'
  ),
  'age' => 35,
  'email' => 'bob@example.com',
  'location' => array(
    'streetaddr' => '123 Skid Row',
    'city' => 'Dark City',
    'state' => 'DN',
    'country' => 'XNE',
  ),
);


//Create another array to represent a person Mary
$mary = array(
  'name' => array( 
    'first' => 'Mary',
    'last' => 'Zlipsakis'
  ),
  'age' => 94,
  'location' => array(
    'streetaddr'=> '54343 Park Ave',
    'city' => 'SmallVille',
    'state' => 'DN',
    'country' => 'XNE',
  ),
  'icecream' => 'vanilla',
);


//Create a big array that contains all our people
$xmlArray = array();
$xmlArray["people"]["person"] = array();

array_push($xmlArray["people"]["person"], $mary);
array_push($xmlArray["people"]["person"], $bob);

//instantiate a DOMIT! document
require_once('xml_domit_include.php');
$xmldoc =& new DOMIT_Document();

//require DOMIT_Utilities file
require_once('xml_domit_utilities.php');

//use fromArray to populate document
DOMIT_Utilities::fromArray($xmldoc, $xmlArray);

//echo to browser
echo $xmldoc->toNormalizedString(true);

The result is:

<people>
    <person>
        <name>
            <first>Mary</first>
            <last>Zlipsakis</last>
        </name>
        <age>94</age>
        <location>
            <streetaddr>54343 Park Ave</streetaddr>
            <city>SmallVille</city>
            <state>DN</state>
            <country>XNE</country>
        </location>
        <icecream>vanilla</icecream>
    </person>
    <person>
        <name>
            <first>Bob</first>
            <last>Roberts</last>
        </name>
        <age>35</age>
        <email>bob@example.com</email>
        <location>
            <streetaddr>123 Skid Row</streetaddr>
            <city>Dark City</city>
            <state>DN</state>
            <country>XNE</country>
        </location>
    </person>
</people>
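
The convention visible in this output is that associative keys become element names, while a numerically indexed list (such as the two entries under "person") repeats the element named by its parent key. The following plain PHP sketch mimics that convention on a smaller array. The functions sketchIsList and sketchArrayToXml are hypothetical and are not how DOMIT_Utilities::fromArray is implemented; they build a string rather than a node tree.

```php
<?php
// Hypothetical helper: true for a non-empty array with keys 0..n-1
function sketchIsList(array $value) {
    return $value !== array() && array_keys($value) === range(0, count($value) - 1);
}

// Hypothetical helper: renders a nested array as indented XML text
function sketchArrayToXml(array $data, $depth = 0) {
    $pad = str_repeat('  ', $depth);
    $xml = '';
    foreach ($data as $tag => $value) {
        // a list repeats the same tag once per entry
        $items = (is_array($value) && sketchIsList($value)) ? $value : array($value);
        foreach ($items as $item) {
            if (is_array($item)) {
                $xml .= $pad . "<$tag>\n" . sketchArrayToXml($item, $depth + 1) . $pad . "</$tag>\n";
            } else {
                $xml .= $pad . "<$tag>" . htmlspecialchars((string) $item) . "</$tag>\n";
            }
        }
    }
    return $xml;
}

$people = array('people' => array('person' => array(
    array('name' => 'Mary', 'age' => 94),
    array('name' => 'Bob', 'age' => 35),
)));

$result = sketchArrayToXml($people);
echo $result;
```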

4. The nodetools Library

The nodetools library is a set of helper utilities for processing nodes.

4.1. nodetools::parseAttributes

The nodetools::parseAttributes method parses an attribute string into an array of key / value pairs.

For example:

//require the nodetools library
require_once('xml_domit_nodetools.php');

//build a sample attribute string
$myAttrString = 'x="27" y="12"';

//parse into an array
$myArray = nodetools::parseAttributes($myAttrString);

//echo to browser
echo "<pre>";
print_r($myArray);
echo "</pre>";

The result is:

Array
(
    [x] => 27
    [y] => 12
)
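
To make the transformation concrete, here is a rough equivalent written with a regular expression. The function sketchParseAttributes is a hypothetical stand-in, not the nodetools code, and it assumes simple name="value" pairs quoted with either single or double quotes.

```php
<?php
// Hypothetical stand-in for illustration only (not the nodetools source)
function sketchParseAttributes($attrString) {
    $attrs = array();
    // name, optional spaces, =, optional spaces, quoted value
    $pattern = '/([\w:.-]+)\s*=\s*(["\'])(.*?)\2/s';
    preg_match_all($pattern, $attrString, $matches, PREG_SET_ORDER);
    foreach ($matches as $match) {
        $attrs[$match[1]] = $match[3];
    }
    return $attrs;
}

$parsed = sketchParseAttributes('x="27" y="12"');

//echo to browser
echo "<pre>";
print_r($parsed);
echo "</pre>";
```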

4.2. nodetools::moveUp

The nodetools::moveUp method moves a node to the previous index in the childNodes array.

It takes a single argument -- a reference to the node to be moved.

The following example moves the last <cd> element to the second last position:

//require the nodetools library
require_once('xml_domit_nodetools.php');

//move the node up
nodetools::moveUp($cdCollection->documentElement->lastChild);

//echo to browser
echo $cdCollection->toNormalizedString(true);

The result is:

<?xml version="1.0"?>
<cdlibrary>
  <cd discid="bb0c3c0c">
    <name>Robbie Fulks</name>
    <title>Couples in Trouble</title>
  </cd>
  <cd discid="cf11720f">
    <name>Keller Williams</name>
    <title>Laugh</title>
  </cd>
  <cd discid="9b0ce70c">
    <name>Richard Thompson</name>
    <title>Mock Tudor</title>
  </cd>
</cdlibrary>

4.3. nodetools::moveDown

The nodetools::moveDown method moves a node to the next index in the childNodes array.

It takes a single argument -- a reference to the node to be moved.

The following example moves the first <cd> element to the second position:

//require the nodetools library
require_once('xml_domit_nodetools.php');

//move the node down
nodetools::moveDown($cdCollection->documentElement->firstChild);

//echo to browser
echo $cdCollection->toNormalizedString(true);

The result is:

<?xml version="1.0"?>
<cdlibrary>
  <cd discid="9b0ce70c">
    <name>Richard Thompson</name>
    <title>Mock Tudor</title>
  </cd>
  <cd discid="bb0c3c0c">
    <name>Robbie Fulks</name>
    <title>Couples in Trouble</title>
  </cd>
  <cd discid="cf11720f">
    <name>Keller Williams</name>
    <title>Laugh</title>
  </cd>
</cdlibrary>
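
Both operations amount to swapping a node with one of its siblings. The sketch below shows the same idea using PHP's bundled DOM extension (DOMDocument) instead of DOMIT!. The functions sketchMoveUp, sketchMoveDown and sketchDiscIds are hypothetical helpers, and the <cd> elements are left empty to keep the example short.

```php
<?php
// Hypothetical helpers built on PHP's bundled DOM extension, not on DOMIT!

// Swap a node with its previous sibling, if it has one
function sketchMoveUp(DOMNode $node) {
    $previous = $node->previousSibling;
    if ($previous !== null) {
        // insertBefore relocates a node that is already in the tree
        $node->parentNode->insertBefore($node, $previous);
    }
}

// Swap a node with its next sibling, if it has one
function sketchMoveDown(DOMNode $node) {
    $next = $node->nextSibling;
    if ($next !== null) {
        $node->parentNode->insertBefore($next, $node);
    }
}

// Collect the discid attributes of the child elements in document order
function sketchDiscIds(DOMDocument $doc) {
    $ids = array();
    foreach ($doc->documentElement->childNodes as $cd) {
        $ids[] = $cd->getAttribute('discid');
    }
    return $ids;
}

$doc = new DOMDocument();
$doc->loadXML('<cdlibrary><cd discid="bb0c3c0c"/><cd discid="9b0ce70c"/><cd discid="cf11720f"/></cdlibrary>');

//move the last <cd> up one position
sketchMoveUp($doc->documentElement->lastChild);
$afterMoveUp = sketchDiscIds($doc);

//then move the first <cd> down one position
sketchMoveDown($doc->documentElement->firstChild);
$afterMoveDown = sketchDiscIds($doc);

echo implode(', ', $afterMoveUp) . "\n";
echo implode(', ', $afterMoveDown) . "\n";
```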

4.4. nodetools::nodeExists

The nodetools::nodeExists method checks whether a node exists on a given path. The path expression must conform to the getElementsByPath syntax.

The method takes two parameters -- a reference to the calling node (the node at which the search begins) and the path expression.

To check whether any child <cd> element of the <cdlibrary> element contains a <title> element, you can do this:

//require the nodetools library
require_once('xml_domit_nodetools.php');

//check if node exists
if (nodetools::nodeExists($cdCollection, '/cdlibrary/cd/title')) {
  echo "Node exists!";
}
else {
  echo "Node does NOT exist";
}

The result is:

Node exists!

4.5. nodetools::fromPath

The nodetools::fromPath method generates a hierarchy of elements based on a path expression.

It takes three parameters:

  • a reference to the DOMIT_Document that will create the elements

  • the path expression

  • the node value of a text node to be appended to the last element (if required)

For example:

//require the nodetools library
require_once('xml_domit_nodetools.php');

//build node tree
$myNodes =& nodetools::fromPath($xmldoc, '/someElement/childElement', "Sample text");

//echo to browser
echo $myNodes->toNormalizedString(true);

The result is:

<someElement>
  <childElement>Sample text</childElement>
</someElement>

Chapter 10. XML Namespaces

The following chapter deals with XML namespaces. The XML Namespaces specification defines a simple method for distinguishing XML element and attribute names, by associating them with URI references (namespaces).

1. Introduction to XML Namespaces

With the widespread adoption of XML, we have increasingly seen the coexistence and integration of different XML standards.

So what happens when you want to combine two XML documents, and each document contains an element named <title>, but the <title> element has a different meaning in each document? For example:

XML DOCUMENT #1:
<?xml version="1.0"?>
<individual gender="m">
  <name>George Henry III</name>
  <title>Duke of Fredericton</title>
  <books></books>
</individual>


XML DOCUMENT #2:
<?xml version="1.0"?>
<book>
  <title>Transcendence Through XML</title>
</book>


COMBINED DOCUMENT:
<?xml version="1.0"?>
<individual gender="m">
  <name>George Henry III</name>
  <title>Duke of Fredericton</title>
  <books>
    <book>
      <title>Transcendence Through XML</title>
    </book>
  </books>
</individual>

The potential problems here should be obvious. If, for instance, the getElementsByTagName method was used to obtain a node list of elements with a node name of "title", how would one differentiate between a person's title and the title of a book?

Such naming collisions can cause confusion and error, and it is essential to have some means of differentiating between identically named, but contextually different, nodes.

1.1. URIs, Namespace Prefixes, and Namespace Declarations

The XML Namespace specification proposes a mechanism whereby elements and attributes can be assigned namespaces: unique identifiers that allow one to differentiate between identical tag names.

The unique identifier comes in the form of a URI (Uniform Resource Identifier), which is a convention for identifying resources on the web (a URL, or Uniform Resource Locator, is a type of URI).

For example, the namespace URI for the Dublin Core -- an XML specification for interoperable online metadata standards -- is:

http://purl.org/dc/elements/1.1/

Since a URI tends to be too long to repeat throughout your document, you can specify an abbreviation for it, known as the Namespace Prefix.

The namespace URI and namespace prefix are defined within your XML document, often at the level of the document element (but not necessarily so), using the keyword xmlns.

In our <individual> example from above, we might use namespaces to do the following:

<?xml version="1.0"?>
<person:individual person:gender="m"
        xmlns:person="http://www.engageinteractive.com/person/"
        xmlns:book="http://www.engageinteractive.com/book/">
  <person:name>George Henry III</person:name>
  <person:title>Duke of Fredericton</person:title>
  <person:books>
    <book:book>
      <book:title>Transcendence Through XML</book:title>
    </book:book>
  </person:books>
</person:individual>

You'll notice that in the document element node, two items have been added which appear to be attributes but which are actually Namespace Declarations. A namespace declaration:

  • begins with the prefix xmlns:

  • is followed by the namespace prefix: e.g., person

  • is followed by an equal sign

  • concludes with the URI in quotation marks: e.g. "http://www.engageinteractive.com/person/"

The namespace declaration says basically that:

There are elements and/or attributes in the following XML that will be assigned the URI http://www.engageinteractive.com/person/ and these elements and/or attributes are different from elements and/or attributes that are assigned the URI http://www.engageinteractive.com/book/

It also says that:

We will use the prefix "person" as shorthand for the URI http://www.engageinteractive.com/person/, and we will use the abbreviation "book" as shorthand for the URI http://www.engageinteractive.com/book/

It is then a simple task of placing the prefixes person: and book: before all corresponding elements and attributes. A namespace aware XML parser will be able to parse and differentiate between the elements named <person:title> and <book:title>.

1.2. Default Namespace

If an XML document does not contain a namespace declaration, then it is assumed that all elements in the document belong to the default namespace. The default namespace is null unless defined by the user.

If you would like to specify a user-defined default namespace, omit the namespace prefix in your xmlns declaration:

xmlns="http://www.engageinteractive.com/this.is.a.default.namespace"

Note: Default namespaces do not apply to attributes.
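
The following sketch illustrates that note using PHP's bundled DOM extension (DOMDocument) rather than DOMIT!. The <library> and <shelf> elements and the floor attribute are made-up names for this example. The unprefixed child element picks up the default namespace, while the unprefixed attribute does not.

```php
<?php
// Uses PHP's bundled DOM extension, not DOMIT!; element and attribute
// names are invented for this illustration
$doc = new DOMDocument();
$doc->loadXML('<library xmlns="http://www.engageinteractive.com/this.is.a.default.namespace" floor="2"><shelf/></library>');

$root = $doc->documentElement;

//namespace URI of the unprefixed child element
$shelfNamespace = $root->firstChild->namespaceURI;

//namespace URI of the unprefixed attribute
$attrNamespace = $root->getAttributeNode('floor')->namespaceURI;

//echo to browser
var_dump($shelfNamespace);
var_dump($attrNamespace);
```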

1.3. Local Name

When an XML document uses namespaces, the tag name of an element or attribute (i.e., the part following the namespace prefix) is referred to as its Local Name.

The local name of the <person:individual> element, for instance, is "individual".

1.4. Qualified Name

The concatenated namespace prefix and local name are referred to as the Qualified Name, or qname.

The qualified name of the <person:individual> element, for instance, is "person:individual".

1.5. DOM and XML Namespaces

The DOM specifies a number of namespace aware methods, such as getElementsByTagNameNS, which allows you to specify the namespace URI and local name of the element that you are searching for.
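
As an illustration of the terms introduced so far, the sketch below loads a shortened version of the <person:individual> document with PHP's bundled DOM extension (DOMDocument), not DOMIT!, and reads the qualified name, local name, prefix and namespace URI of the matched element. The DOMIT! equivalents are covered in the next section.

```php
<?php
// Uses PHP's bundled DOM extension for illustration, not DOMIT!
$xml = <<<XML
<person:individual person:gender="m"
        xmlns:person="http://www.engageinteractive.com/person/"
        xmlns:book="http://www.engageinteractive.com/book/">
  <person:title>Duke of Fredericton</person:title>
  <person:books>
    <book:book>
      <book:title>Transcendence Through XML</book:title>
    </book:book>
  </person:books>
</person:individual>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

//only <title> elements in the book namespace are matched
$bookTitles = $doc->getElementsByTagNameNS('http://www.engageinteractive.com/book/', 'title');
$first = $bookTitles->item(0);

//echo to browser
echo $first->nodeName . "\n";      // qualified name
echo $first->localName . "\n";     // local name
echo $first->prefix . "\n";        // namespace prefix
echo $first->namespaceURI . "\n";  // namespace URI
echo $first->textContent . "\n";
```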

2. DOMIT! and XML Namespaces

DOMIT! (although not DOMIT! Lite) is compliant with the XML Namespace specification. It implements the following methods:

2.1. setNamespaceAwareness

To enable DOMIT! to process namespace data, invoke the setNamespaceAwareness method before populating your XML document.

$xmldoc->setNamespaceAwareness(true);

2.2. declareNamespace

The declareNamespace method allows you to make a namespace declaration at the level of the calling element.

You must specify two parameters:

  • a namespace prefix

  • a namespace URI

The following creates a namespace declaration with a prefix of "domit" at the document element level:

$xmldoc->documentElement->declareNamespace('domit', 'http://www.engageinteractive.com/domit/');

The resulting namespace declaration would look like this:

xmlns:domit="http://www.engageinteractive.com/domit/"

2.3. declareDefaultNamespace

The declareDefaultNamespace method allows you to make a default namespace declaration at the level of the calling element.

$xmldoc->documentElement->declareDefaultNamespace('http://www.foo.com/a.default.namespace');

The resulting default namespace declaration would look like this:

xmlns="http://www.foo.com/a.default.namespace"

To reset the default namespace back to its original null value, pass in an empty string to the declareDefaultNamespace method:

$xmldoc->documentElement->declareDefaultNamespace("");

2.4. getNamespaceDeclarationsInScope

The getNamespaceDeclarationsInScope method returns an associative array of all namespace declarations that are in scope for the calling element.

//acquire array of namespace declarations in scope
$nsMap = $xmldoc->documentElement->firstChild->getNamespaceDeclarationsInScope();

//echo to browser
print "<pre>";
print_r($nsMap);
print "</pre>";

The result is:

Array
(
    [http://www.engageinteractive.com/person/] => person
    [http://www.engageinteractive.com/book/] => book
)

2.5. getDefaultNamespaceDeclaration

The getDefaultNamespaceDeclaration method returns a string containing the default namespace declaration in scope for the calling element.

echo $xmldoc->documentElement->childNodes[2]->firstChild->getDefaultNamespaceDeclaration();

The result is an empty string, since the example document does not declare a default namespace.

2.6. copyNamespaceDeclarationsLocally

A common problem with namespaces occurs when an element is moved to another location in a document, or copied to another DOM document.

If the node being copied is not the document element, and the namespace declarations in scope for that element are declared higher up in the DOM tree (for example, in the document element), then the namespace declarations can be lost.

In the following XML, for instance, if the <book:book> element were to be copied to another DOM document, the namespace declarations in the document element might not accompany the element:

<?xml version="1.0"?>
<person:individual person:gender="m"
        xmlns:person="http://www.engageinteractive.com/person/"
        xmlns:book="http://www.engageinteractive.com/book/">
  <person:name>George Henry III</person:name>
  <person:title>Duke of Fredericton</person:title>
  <person:books>
    <book:book>
      <book:title>Transcendence Through XML</book:title>
    </book:book>
  </person:books>
</person:individual>

The copyNamespaceDeclarationsLocally method addresses this problem by forcing all namespace declarations that are in scope for the element to be explicitly duplicated on the element itself.

//get reference to book:book node
$bookNode =& $xmldoc->documentElement->childNodes[2]->firstChild;

//copy namespace declarations
$bookNode->copyNamespaceDeclarationsLocally();

//echo to browser
echo $bookNode->toNormalizedString(true);

The result is:

<book:book xmlns:person="http://www.engageinteractive.com/person/" xmlns:book="http://www.engageinteractive.com/book/">
  <book:title>Transcendence Through XML</book:title>
</book:book>

2.7. createElementNS

The createElementNS method is used to create a namespace compliant element.

createElementNS takes two parameters:

  • the namespace URI of the element

  • its qualified name

The following example will create the <book:title> element:

//create namespace compliant element
$myElement =& $xmldoc->createElementNS('http://www.engageinteractive.com/book/', 'book:title');

//echo to browser
echo $myElement->toNormalizedString(true);

The result is:

<book:title />

Note that using the createElement method will not create an element properly when namespace awareness is enabled.

2.8. getElementsByTagNameNS

The getElementsByTagNameNS method is a namespace compliant version of getElementsByTagName. It allows you to search for elements in an XML document by specifying:

  • the namespace URI of the element

  • the local name

The following example matches the <book:title> element:

//find book:title element
$myNodeList =& $xmldoc->getElementsByTagNameNS('http://www.engageinteractive.com/book/', 'title');

//echo to browser
echo $myNodeList->toNormalizedString(true);

The result is:

<book:title>Transcendence Through XML</book:title>

2.9. createAttributeNS

The createAttributeNS method is the namespace equivalent of createAttribute. It enables you to create a new, namespace compliant, attribute node.

createAttributeNS takes two parameters:

  • the namespace URI of the attribute

  • the local name

The following example creates a new attribute named "book:language", with a value of "en":

//create namespace compliant attribute
$myAttr =& $xmldoc->createAttributeNS('http://www.engageinteractive.com/book/', 'language', 'en');

//echo to browser
echo $myAttr->toNormalizedString(true);

The result is:

book:language='en'

2.10. hasAttributeNS and getAttributeNS

The hasAttributeNS and getAttributeNS methods are namespace compliant versions of hasAttribute and getAttribute. Both methods take as parameters:

  • the namespace URI of the attribute

  • the local name

The following example checks if an attribute named 'gender' with a namespace URI of 'http://www.engageinteractive.com/person/' exists in the document element, and echoes the value to the browser:

//set variables for namespace URI and local name
$URI = 'http://www.engageinteractive.com/person/';
$localName = 'gender';

//determine if attribute exists
if ($xmldoc->documentElement->hasAttributeNS($URI, $localName)) {
  
  //echo to browser
  echo $xmldoc->documentElement->getAttributeNS($URI, $localName);
}

The result is:

m

2.11. setAttributeNS

The setAttributeNS method is a namespace compliant version of setAttribute. It creates a new namespace compliant attribute for the calling element, or overwrites the value of the attribute if one already exists.

setAttributeNS takes three parameters:

  • the namespace URI of the attribute

  • the qualified name

  • the value of the attribute

The following example sets a new attribute on the <book:title> element.

//find book:title element
$myNodeList =& $xmldoc->getElementsByTagNameNS('http://www.engageinteractive.com/book/', 'title');

//get first match
$myElement =& $myNodeList->item(0);

//add attribute named "book:language" to the element
$myElement->setAttributeNS('http://www.engageinteractive.com/book/', 'book:language', 'en');

//echo to browser
echo $myElement->toNormalizedString(true);

The result is:

<book:title book:language="en">Transcendence Through XML</book:title>

2.12. getAttributeNodeNS and setAttributeNodeNS

The getAttributeNodeNS and setAttributeNodeNS methods are namespace compliant versions of getAttributeNode and setAttributeNode.

getAttributeNodeNS takes two parameters:

  • the namespace URI of the attribute

  • the local name

setAttributeNodeNS takes a single parameter -- a reference to the node to be added/set.

The following example echoes the value of the "gender" attribute and then changes it to "f":

//get the attribute node named "gender" in the document element
$attrNode =& $xmldoc->documentElement->getAttributeNodeNS('http://www.engageinteractive.com/person/', 'gender');

//echo value of attr node
echo "original value of gender node is: " . $attrNode->getValue();

//create a new attribute node
$myAttr =& $xmldoc->createAttributeNS('http://www.engageinteractive.com/person/', 'gender', 'f');

//overwrite existing attr with new one
$xmldoc->documentElement->setAttributeNodeNS($myAttr);

//echo to browser
echo $xmldoc->documentElement->toNormalizedString(true);

The result is:

<person:individual person:gender="f"
        xmlns:person="http://www.engageinteractive.com/person/"
        xmlns:book="http://www.engageinteractive.com/book/">
  <person:name>George Henry III</person:name>
  <person:title>Duke of Fredericton</person:title>
  <person:books>
    <book:book>
      <book:title>Transcendence Through XML</book:title>
    </book:book>
  </person:books>
</person:individual>

2.13. removeAttributeNS

The removeAttributeNS method is the namespace counterpart to removeAttribute. It enables you to remove an attribute from an element.

removeAttributeNS takes two parameters:

  • the namespace URI of the attribute

  • the local name

The following example removes the "gender" attribute from the document element:

//remove "gender" attribute from document element
$xmldoc->documentElement->removeAttributeNS('http://www.engageinteractive.com/person/', 'gender');

//echo to browser
echo $xmldoc->documentElement->toNormalizedString(true);

The result is:

<person:individual
        xmlns:person="http://www.engageinteractive.com/person/"
        xmlns:book="http://www.engageinteractive.com/book/">
  <person:name>George Henry III</person:name>
  <person:title>Duke of Fredericton</person:title>
  <person:books>
    <book:book>
      <book:title>Transcendence Through XML</book:title>
    </book:book>
  </person:books>
</person:individual>

Chapter 11. XPath

DOMIT! now has experimental XPath support.

XPath is a syntax for locating nodes in an XML tree using "path"-like expressions. A good introductory tutorial on XPath can be found at: http://www.w3schools.com/xpath/

1. XPath Overview

Add content here!

2. selectNodes

DOMIT! implements XPath calls through the selectNodes method. Not all of the specification is supported currently.

selectNodes can be called from any XML document or element node. It converts an XPath expression into a node list or single node that matches the specified pattern. For example:

$nodeList =& $xmldoc->selectNodes("/book/chapter[@id='1234']");

The above example will return a node list containing every element in the XML document that:

  • is named 'chapter'

  • is a child of the document element, which must be named 'book'

  • has an 'id' attribute with a value of '1234'

If you would like a single node to be returned by selectNodes, rather than the entire node list of matching elements, you can specify the index of the requested node by passing an integer as the second parameter of selectNodes.

The index is 1-based. The following example will return the first node matching the XPath expression:

$node =& $xmldoc->selectNodes("/book/chapter[@id='1234']", 1);
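
For readers who want to try the same expression outside DOMIT!, the sketch below evaluates it with PHP's bundled DOM extension (DOMDocument and DOMXPath). The sample <book> document is invented for this example. Note that the positional predicate [1] inside an XPath expression is 1-based, while the item method of the returned node list is 0-based.

```php
<?php
// Uses PHP's bundled DOM extension, not DOMIT!; the sample document
// is invented for this illustration
$doc = new DOMDocument();
$doc->loadXML('<book><chapter id="1234">One</chapter><chapter id="5678">Two</chapter></book>');

$xpath = new DOMXPath($doc);

//all <chapter> children of <book> whose id attribute is 1234
$matches = $xpath->query("/book/chapter[@id='1234']");

//the first <chapter> child of <book> (XPath positions are 1-based)
$firstChapter = $xpath->query("/book/chapter[1]")->item(0);

//echo to browser
echo $matches->length . "\n";
echo $firstChapter->getAttribute('id') . "\n";
```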

Add content here!

Chapter 12. DOMIT! Roadmap

Some of the plans for DOMIT! include:

  • UTF-8 support

  • fuller XPath support

  • OneDOM: a generic wrapper for DOMIT! and the PHP DOM_XML library

Chapter 13. Contributing to DOMIT!

DOMIT! has only been made possible through the suggestions, bug reports, and code submissions of others.

If you would like to contribute to DOMIT! or join the DOMIT! team, please email