Table of Contents
useHTTPClient: Forcing loadXML to use an HTTP Client
setConnection: Manually specifying HTTP connection parameters
setAuthorization: Using basic HTTP authorization with your connection
setProxyConnection: Retrieving XML data through a proxy server
setProxyAuthorization: Using basic HTTP authorization with your proxy
preserveWhiteSpace
appendEntityTranslationTable
setNamespaceAwareness
declareNamespace
declareDefaultNamespace
getNamespaceDeclarationsInScope
getDefaultNamespaceDeclaration
copyNamespaceDeclarationsLocally
createElementNS
getElementsByTagNameNS
createAttributeNS
hasAttributeNS
and getAttributeNS
setAttributeNS
getAttributeNodeNS
and setAttributeNodeNS
removeAttributeNS
XML (Extensible Markup Language) is a standard for encapsulating textual data. XML is strictly structured, but also human readable.
Having a strictly defined format makes it easier for computer programs (i.e., XML parsers) to build, extract, manipulate, and exchange the data. Since XML is written in human readable text and not binary format, it is much more convenient for people to work with on a daily basis.
This simple balance of structure and readability is one of the primary reasons that XML has seen such widespread adoption over recent years.
The following description of a person's cd music collection is one possible example of XML formatted text:
<?xml version="1.0"?>
<cdlibrary>
  <cd discid="bb0c3c0c">
    <name>Robbie Fulks</name>
    <title>Couples in Trouble</title>
  </cd>
  <cd discid="9b0ce70c">
    <name>Richard Thompson</name>
    <title>Mock Tudor</title>
  </cd>
  <cd discid="cf11720f">
    <name>Keller Williams</name>
    <title>Laugh</title>
  </cd>
</cdlibrary>
As should be apparent from the example, XML has a tree-like structure. This is referred to as a Document.
There are a variety of ways of demarcating content in an XML Document. The following sections present a brief overview of some of them.
An Element in XML is a type of content whose primary purpose is to contain other content. Elements are like bookends, and therefore consist of two parts: a start tag and an end tag.
Take a look at the first tag after the XML declaration in the cd library example: <cdlibrary>.
This is an example of a start tag. A start tag always:
begins with a left angle bracket: <
ends with a right angle bracket: >
has a name: cdlibrary
At the bottom of the XML document is an end tag, which has a slightly different format: </cdlibrary>.
An end tag always:
begins with a left angle bracket and a forward slash: </
ends with a right angle bracket: >
has a name identical to its matching start tag: cdlibrary
An XML element can contain other types of XML content, including other elements. For example, the <person> element below contains a single <name> element:
<person> <name>John Heinstein</name> </person>
It is possible to have an element containing no XML content. This is referred to as an empty element. There is a shorthand notation for representing an empty element:
<someEmptyElement/>
The longhand equivalent of this is:
<someEmptyElement></someEmptyElement>
Take a look at the first cd element in cdlibrary:
<cd discid="bb0c3c0c">
  <name>Robbie Fulks</name>
  <title>Couples in Trouble</title>
</cd>
Some additional information is present in the start tag: discid="bb0c3c0c". This is a type of XML content referred to as an Attribute. An attribute is used to store short, simple units of text.
An attribute always:
contains a unique named key, such as: discid
followed by an equal sign: =
followed by a value contained in either single or double quotes: "bb0c3c0c"
There can be multiple attributes in any start tag, as long as the attribute names are unique. For example:
<point x='10' y='35'/>
Textual XML content not stored in attributes is referred to as Character Data. Character data is always contained within elements, as we can see in the following example from the cdlibrary document:
<name>Robbie Fulks</name>
There are two reserved characters which cannot be present in valid XML character data. These are the ampersand character (&) and the left angle bracket (<).
If either of these characters needs to be present in XML text, it must be escaped. This is done by substituting the entity equivalent of the character.
The entity equivalent of the ampersand (&) is the string &amp;
The entity equivalent of the left angle bracket (<) is the string &lt;
To represent the string x <= y + 1 as character data, for example, one must escape the left angle bracket:
<relationship>x &lt;= y + 1</relationship>
Note: To allow attribute values to contain both single and double quotes, the apostrophe or single-quote character (') may be represented as &apos; and the double-quote character (") as &quot;
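In PHP, this escaping does not have to be done by hand. The following is a minimal sketch that uses PHP's built-in htmlspecialchars function, which is not part of DOMIT!. The helper name escapeForXml is hypothetical, and the ENT_XML1 flag assumes PHP 5.4 or later.

```php
<?php
// Hypothetical helper (not part of DOMIT!): escape text so that it can be
// used as XML character data or inside an attribute value.
// ENT_QUOTES escapes both quote characters as well as & and <.
function escapeForXml($text) {
    return htmlspecialchars($text, ENT_QUOTES | ENT_XML1, 'UTF-8');
}

$xml = '<relationship>' . escapeForXml('x <= y + 1') . '</relationship>';
echo $xml, "\n";
```

The same helper can be applied to attribute values before they are placed between quotes.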
Sometimes your XML data contains many illegal characters that must be escaped -- such as when you need to store HTML content within XML content:
<htmlcode>&lt;img src="http://www.someurl.com/pic.jpg" /&gt;</htmlcode>
It is not only work intensive to escape many illegal characters, but the readability of your document suffers.
There is a special construct called a CDATA Section that is reserved for demarcating text data that can be written in its literal form. The following example rewrites the above <htmlcode> example as a CDATA Section:
<htmlcode><![CDATA[<img src="http://www.someurl.com/pic.jpg" />]]></htmlcode>
As you can see, there is no need to escape the left angle bracket beginning the <img tag.
A CDATA Section always:
begins with the string <![CDATA[
ends with the string ]]>
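Note: the string ]]> cannot appear within a CDATA Section, because it would be confused with the terminating CDATA Section string. Entities such as &gt; are not expanded inside a CDATA Section, so the usual workaround is to split the text across two adjacent CDATA Sections, ending the first one between the ]] and the >.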
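A common way to keep a literal ]]> in CDATA content is to close the section between the ]] and the > and immediately open a new one. The sketch below illustrates this in PHP; the helper name wrapInCdata is hypothetical, and the round trip uses PHP's built-in DOM extension (PHP 5 or later) rather than DOMIT!.

```php
<?php
// Hypothetical helper (not part of DOMIT!): wrap text in a CDATA Section,
// splitting any embedded "]]>" so it is never read as the terminator.
function wrapInCdata($text) {
    $safe = str_replace(']]>', ']]]]><![CDATA[>', $text);
    return '<![CDATA[' . $safe . ']]>';
}

$original = 'if (a[b[0]]> 1) { }';
$xml = '<code>' . wrapInCdata($original) . '</code>';

// Parse the result again to confirm that the original text is recovered.
$doc = new DOMDocument();
$doc->loadXML($xml);
$recovered = $doc->documentElement->textContent;
echo $recovered, "\n";
```

The element ends up with two adjacent CDATA Section nodes whose combined text equals the original string.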
An XML Comment is a construct for adding remarks to your XML. It is similar to an HTML comment in that it:
begins with the string <!--
ends with the string -->
A comment could be added to the cdlibrary example like this:
<?xml version="1.0"?>
<cdlibrary>
  <!-- Not many cds left after I got robbed -->
  <cd discid="bb0c3c0c">
    <name>Robbie Fulks</name>
    <title>Couples in Trouble</title>
  </cd>
  <cd discid="9b0ce70c">
    <name>Richard Thompson</name>
    <title>Mock Tudor</title>
  </cd>
  <cd discid="cf11720f">
    <name>Keller Williams</name>
    <title>Laugh</title>
  </cd>
</cdlibrary>
An XML Processing Instruction indicates to an application that it must perform some processing operation.
An XML document should begin with a special type of processing instruction known as an XML Declaration (in practice, the XML declaration is often omitted):
<?xml version="1.0"?>
A processing instruction:
begins with the string <?
followed by the target of the operation (the application to which the operation is to be directed): e.g. xml
followed by the data for the application to process: e.g. version="1.0"
ending with the string ?>
Another example of a processing instruction is found in the declaration of PHP code within an HTML page:
<?php //code here ?>
The target "php" informs a web server to process the subsequent data with a PHP interpreter rather than as HTML code.
A Document Type Declaration is a mechanism for defining what is an acceptable structure for an XML document. A validating XML parser can compare an XML document to its DTD and determine whether it is valid or not.
A DTD follows the XML declaration and comes before any actual XML data.
The following is an example of a DTD for an XML document containing a single element named "foo":
<!DOCTYPE foo [ <!ELEMENT foo (#PCDATA)> ]>
When one speaks of the DOM (Document Object Model), one is not referring to XML per se. Rather, the DOM is one of a number of different approaches to conceptualizing, parsing, and interacting with XML content.
The DOM processes XML by creating an object called a Node out of each unit of content in an XML document. Nodes are assembled into a hierarchical collection called a DOM Document.
The entire XML document is held in memory at once, which allows the collection of nodes to be traversed easily. The DOM approach can, however, be memory intensive for larger XML documents.
The DOM also describes a number of methods and properties that allow the user to interact programmatically with the nodes of a DOM Document.
We will be examining some of these methods and properties in the following tutorial.
The DOM specification delineates a number of different kinds of nodes, each of which corresponds to a different kind of XML content. A set of three node properties is used to distinguish one kind of node from another:
node type: an integer from 1 to 12 specifying the type of node
node name: the name of the node, can have various values depending on node type
node value: the value of the node, can have various values depending on node type
A Document Node represents the DOM document itself -- the entire collection of nodes. It has:
a node type of 9
a node name of #document
a node value of null
An Element Node represents an XML Element. Take for instance the following <fullname> element:
<fullname>John Heinstein</fullname>
This element has:
a node type of 1
a node name of fullname
a node value of null
An Attribute Node represents an XML Attribute. Take for instance the following serial attribute:
<item serial="123456"/>
This attribute has:
a node type of 2
a node name of serial
a node value of 123456
A Text Node represents XML Character Data that is not specified as a CDATA Section. Take for instance the text content bounded by the <fullname> element:
<fullname>John Heinstein</fullname>
This text node has:
a node type of 3
a node name of #text
a node value of John Heinstein
A CDATA Section Node represents XML Character Data that is specified as a CDATA Section. Take the following CDATA Section:
<htmlcode><![CDATA[<img src="http://www.someurl.com/pic.jpg" />]]></htmlcode>
This CDATA Section has:
a node type of 4
a node name of #cdata-section
a node value of <img src="http://www.someurl.com/pic.jpg" />
A Comment Node represents an XML comment. Take the following XML comment:
<!-- Not many cds left after I got robbed -->
This comment node has:
a node type of 8
a node name of #comment
a node value of Not many cds left after I got robbed
A Processing Instruction Node represents an XML processing instruction. The most common processing instruction that you will find in a DOM Document is the XML Declaration:
<?xml version="1.0"?>
This processing instruction node has:
a node type of 7
a node name of xml
a node value of version="1.0"
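The node types and node names listed above can be inspected with any DOM implementation. The following is a minimal sketch using PHP's built-in DOM extension (PHP 5 or later) rather than DOMIT!; the sample XML is made up for illustration. Only the type and name are printed, since PHP's extension is documented to report the text content, rather than null, as the node value of an element.

```php
<?php
// Illustrative document containing an attribute, a comment, an element
// with text, and a CDATA Section.
$doc = new DOMDocument();
$doc->loadXML(
    '<item serial="123456"><!-- in stock -->'
    . '<fullname>John Heinstein</fullname>'
    . '<![CDATA[<b>sale</b>]]></item>'
);

$item     = $doc->documentElement;
$comment  = $item->childNodes->item(0);
$fullname = $item->childNodes->item(1);
$cdata    = $item->childNodes->item(2);

$nodes = array(
    $doc,                               // document node
    $item,                              // element node
    $item->getAttributeNode('serial'),  // attribute node
    $fullname->firstChild,              // text node
    $cdata,                             // CDATA Section node
    $comment,                           // comment node
);

// Print the node type and node name of each kind of node.
foreach ($nodes as $node) {
    echo $node->nodeType, ' ', $node->nodeName, "\n";
}
```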
The nodes of a DOM Document are structured as a tree of branching nodes. The terminology to describe the relationship of these nodes is similar to how we would describe the relationship between individuals in a family tree.
Let us use the cdlibrary XML to illustrate this:
<?xml version="1.0"?>
<cdlibrary>
  <cd discid="bb0c3c0c">
    <name>Robbie Fulks</name>
    <title>Couples in Trouble</title>
  </cd>
  <cd discid="9b0ce70c">
    <name>Richard Thompson</name>
    <title>Mock Tudor</title>
  </cd>
  <cd discid="cf11720f">
    <name>Keller Williams</name>
    <title>Laugh</title>
  </cd>
</cdlibrary>
All nodes that are direct descendants of a node are referred to as its Child Nodes.
Only nodes of type element are permitted to contain a child nodes collection. Children themselves, however, can be of various node types, including element nodes, text and CDATA Section nodes, and comment nodes.
In the cdlibrary example:
the <cdlibrary> element contains three child nodes of type element (the three cd nodes)
<?xml version="1.0"?>
<cdlibrary>
  <cd discid="bb0c3c0c">
    <name>Robbie Fulks</name>
    <title>Couples in Trouble</title>
  </cd>
  <cd discid="9b0ce70c">
    <name>Richard Thompson</name>
    <title>Mock Tudor</title>
  </cd>
  <cd discid="cf11720f">
    <name>Keller Williams</name>
    <title>Laugh</title>
  </cd>
</cdlibrary>
each <cd> node contains two child nodes of type element (a <name> node and a <title> node)
<cd discid="bb0c3c0c">
<name>Robbie Fulks</name>
<title>Couples in Trouble</title>
</cd>
each <name> node contains one child node of type text
<name>Richard Thompson</name>
each <title> node contains one child node of type text
<title>Laugh</title>
A child node is referred to by its numerical index in the child nodes collection. The DOM specification numbers the collection from 0, so the first child node has an index of 0, the second an index of 1, and so on.
If an element contains no children, it will still have a child nodes collection (that is empty).
Note: Attribute nodes are not included in the child nodes collection. These are in a separate collection reserved specifically for attributes.
The First Child is a DOM property that refers to the first child node in a child nodes collection.
In our cdlibrary example, the first child of each of the <cd> nodes is the element node <name>.
<cd discid="cf11720f">
<name>Keller Williams</name>
<title>Laugh</title>
</cd>
If an element contains no child nodes, the first child is null.
The Last Child is a DOM property that refers to the last child node in the child nodes collection.
In our cdlibrary example, the last child of each of the <cd> nodes is the element node <title>.
<cd discid="cf11720f">
<name>Keller Williams</name>
<title>Laugh</title>
</cd>
If an element contains no child nodes, the last child is null.
In the same way that one can travel down the hierarchy of a DOM document via the child nodes collection, the DOM specifies a way to travel up the hierarchy. The immediate ancestor of any node is referred to as its Parent Node.
In our cdlibrary example:
the parent node of each <cd> element is the <cdlibrary> element
the parent node of each <name> element is its containing <cd> element
the parent node of each <title> element is its containing <cd> element
the parent node of the text contained in the <name> element is the <name> element
the parent node of the text contained in the <title> element is the <title> element
Note: Attributes do not contain a reference to parent nodes.
The DOM specifies an explicit relationship between nodes that occupy the same level of a DOM tree. These nodes are referred to as Sibling Nodes.
One might think of the relationship between sibling nodes as the links in a chain. Each node knows about the node immediately preceding it and the node immediately following it.
The node immediately preceding any node in a sibling chain is referred to as its Previous Sibling.
In our cdlibrary example:
the previous sibling of each <title> element is the <name> element
the previous sibling of each <name> element is null
Note: If a node has no previous sibling, there still exists a previous sibling reference, but it is null.
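The child, parent, and sibling relationships described above can be walked in code. The following is a minimal sketch using PHP's built-in DOM extension (PHP 5 or later), not DOMIT!; it uses a shortened copy of the cdlibrary document, written without whitespace between tags so that no whitespace-only text nodes appear among the children.

```php
<?php
$xml = '<cdlibrary>'
     . '<cd discid="bb0c3c0c"><name>Robbie Fulks</name><title>Couples in Trouble</title></cd>'
     . '<cd discid="9b0ce70c"><name>Richard Thompson</name><title>Mock Tudor</title></cd>'
     . '</cdlibrary>';

$doc = new DOMDocument();
$doc->loadXML($xml);

$library = $doc->documentElement;
$firstCd = $library->childNodes->item(0);  // this extension indexes from 0
$name    = $firstCd->firstChild;           // first child of <cd>
$title   = $firstCd->lastChild;            // last child of <cd>

echo $library->childNodes->length, "\n";             // number of <cd> children
echo $name->nodeName, ' / ', $title->nodeName, "\n"; // first and last child
echo $title->previousSibling->nodeName, "\n";        // sibling before <title>
echo $name->parentNode->getAttribute('discid'), "\n"; // parent <cd> of <name>
var_dump($name->previousSibling === null);           // <name> has no previous sibling
```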
Each element node contains an Attributes list, or a reference to the collection of attributes assigned to it. In the following example, the <item> element contains a list of five attributes:
<item desc="post" material="steel" length="120" diameter="5" price="0.75"/>
These attributes can be accessed either by name or numerical index.
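As an illustration of access by name and by index, the following sketch reads the attributes of the <item> element using PHP's built-in DOM extension (PHP 5 or later) rather than DOMIT!. The numerical index starts at 0 in this extension.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML(
    '<item desc="post" material="steel" length="120" diameter="5" price="0.75"/>'
);

// The attributes collection of the <item> element.
$attributes = $doc->documentElement->attributes;

echo $attributes->length, "\n";                              // number of attributes

// Access by name.
echo $attributes->getNamedItem('material')->nodeValue, "\n";

// Access by numerical index.
$first = $attributes->item(0);
echo $first->nodeName, '=', $first->nodeValue, "\n";
```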
Each node in a DOM Document -- with the exclusion of attribute nodes -- contains a reference to the DOM Document that contains it. This is referred to as the Owner Document property of a node.
DOMIT! is an XML parser that is mostly consistent with the Document Object Model (DOM) Level 2 specification.
DOMIT! is not an extension; it is written purely in PHP and should work in any PHP 4 or 5 environment, regardless of restrictions put in place by your web hosting provider.
It has been designed for speed and ease of use. However, because DOMIT! is composed of interpreted rather than compiled code, you may see sluggish performance with large XML files on a low-memory server.
DOMIT! must be used in conjunction with a SAX parser. By default, you have the option of using either the Expat parser (available with most later distributions of PHP) or the SAXY parser - another purely PHP-based parser developed by Engage Interactive.
As of version 0.9, DOMIT! now includes a lightweight version named DOMIT! Lite, which is slightly faster, especially for larger documents. However, it does not handle parsing of the xml prolog, processing instructions, comments, and certain other functionality.
As of version 0.96, DOMIT! has support for XML namespaces.
Version 0.98 brings PHP 5 compatibility.
Since DOMIT! is not an extension, it requires no special setup on your web server. You will, however, need to have the following files present on your server filesystem:
xml_domit_include.php - include file for DOMIT!, ensures that include paths are resolved properly.
xml_domit_shared.php - shared code for DOMIT! and DOMIT! Lite
xml_domit_parser.php - the main DOMIT! php file.
xml_domit_utilities.php - required if you want to render your XML as a normalized (whitespace formatted) string or if you want to use the parseXML method of DOMIT_Document.
xml_domit_getelementsbypath.php - required if you would like to search for elements in your DOMIT_Document using a path-based syntax.
xml_domit_nodemaps.php - data structures that contain collections of nodes
xml_domit_nodetools.php - a collection of tools to assist in XML processing
xml_domit_cache.php - simple caching class for DOMIT! and DOMIT! Lite documents
xml_saxy_parser.php - required if you would like to use the SAXY parser with DOMIT! instead of the Expat parser.
xml_domit_doctor.php - class for repairing malformed xml
xml_domit_xpath.php - experimental support for XPath queries
php_file_utilities.php - generic file input / output utilities
php_http_client_generic.php - generic http client class
php_http_client_include.php - include file for http client class
php_http_connector.php - helper class for php_http_client
php_http_exceptions.php - http exceptions class
php_http_proxy.php - http proxy class
If you wish to use DOMIT! Lite, a leaner and somewhat faster (although fewer-featured) version of DOMIT!, you will require the following files:
xml_domit_lite_include.php - include file for DOMIT! Lite, ensures that include paths are resolved properly.
xml_domit_shared.php - shared code for DOMIT! and DOMIT! Lite.
xml_domit_lite_parser.php - the main DOMIT! Lite php file.
xml_domit_utilities.php - required if you want to render your XML as a normalized (whitespace formatted) string or if you want to use the parseXML method of DOMIT_Lite_Document.
xml_domit_getelementsbypath.php - required if you would like to search for elements in your DOMIT_Lite_Document using a path-based syntax.
xml_domit_nodemaps.php - data structures that contain collections of nodes
xml_domit_cache.php - simple caching class for DOMIT! and DOMIT! Lite documents
xml_saxy_lite_parser.php - required if you would like to use the SAXY Lite parser with DOMIT! Lite instead of the Expat parser.
xml_domit_doctor.php - class for repairing malformed xml
php_file_utilities.php - generic file input / output utilities
php_http_client.php - generic http client class
php_http_client_include.php - include file for http client class
php_http_connector.php - helper class for php_http_client
php_http_exceptions.php - http exceptions class
php_http_proxy.php - http proxy class
To implement DOMIT! in your PHP scripts, include the file xml_domit_include.php:
require_once('somepath/xml_domit_include.php');
To implement DOMIT! Lite in your PHP scripts, include the file xml_domit_lite_include.php:
require_once('somepath/xml_domit_lite_include.php');
In DOMIT!, a DOM Document is represented by the DOMIT_Document class.
You create an instance of the DOMIT_Document class in the same way as any other PHP class:
$cdCollection =& new DOMIT_Document();
A DOMIT! Lite document is instantiated like this:
$cdCollection =& new DOMIT_Lite_Document();
Once a document has been instantiated, it is ready to be populated with XML data.
Note: to ensure PHP 4 backwards compatibility, it is necessary to include an ampersand (&) symbol after the equal sign when returning a reference to a DOMIT_Document or any other DOMIT! object.
If we wanted to create a DOMIT_Document out of a PHP string, we would use the parseXML method. Take, for instance, the cd collection XML described in the previous section:
$cdCollection =& new DOMIT_Document(); //instantiate document
//create string variable with XML text
$cdCollectionString = "<?xml version=\"1.0\"?><cdlibrary><cd discid=\"bb0c3c0c\"> <name>Robbie Fulks</name> <title>Couples in Trouble</title></cd> <cd discid=\"9b0ce70c\"><name>Richard Thompson</name> <title>Mock Tudor</title></cd><cd discid=\"cf11720f\"> <name>Keller Williams</name> <title>Laugh</title></cd> </cdlibrary>";
//use parseXML method to populate document
$success = $cdCollection->parseXML($cdCollectionString, true); //parse document
parseXML returns true if the parsing is successful.
The loadXML method of DOMIT_Document is used to load an XML string from a file or URL. It uses an identical syntax to the parseXML method:
$cdCollection =& new DOMIT_Document();
$success = $cdCollection->loadXML("/xml/cdcollection.xml");
The above example parses a file from the file system.
To parse a URL, you would specify the full HTTP address as the first parameter:
$cdCollection =& new DOMIT_Document();
$success = $cdCollection->loadXML("http://www.engageinteractive.com/rssfeed.xml");
DOMIT! relies on an underlying SAX parser to parse XML data. You have the choice of one of two SAX parsers:
Expat is a C-based SAX parser written by James Clark that comes bundled with most later distributions of PHP.
SAXY is a pure PHP SAX parser written by Engage Interactive that comes bundled with DOMIT!
The second parameter of both parseXML and loadXML allows you to specify a SAX parser. The useSAXY parameter is a boolean whose default value is true. Specifying false will check whether Expat is available and use it to parse and pass the XML data to DOMIT!:
$cdCollection =& new DOMIT_Document();
$success = $cdCollection->loadXML("/xml/cdcollection.xml", false); //Expat specified
If Expat cannot be detected, DOMIT! will revert to SAXY for its parsing.
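To give a feel for what a SAX parser hands to a DOM builder, the following sketch collects the events fired by PHP's built-in XML Parser extension (the xml_* functions, which expose an Expat-style interface). It is an illustration only, not DOMIT!'s own code: the availability check via function_exists and the $events array are assumptions made for this example, and the closures require PHP 5.3 or later.

```php
<?php
// Collect SAX-style events as the parser walks through the XML text.
$events = array();

if (function_exists('xml_parser_create')) {
    $parser = xml_parser_create();
    // Keep element names in their original case.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    // Called for every start tag and end tag.
    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attributes) use (&$events) {
            $events[] = 'start:' . $name;
        },
        function ($parser, $name) use (&$events) {
            $events[] = 'end:' . $name;
        }
    );
    // Called for character data between tags.
    xml_set_character_data_handler(
        $parser,
        function ($parser, $text) use (&$events) {
            $events[] = 'text:' . $text;
        }
    );

    $ok = xml_parse($parser, '<cd><name>Robbie Fulks</name></cd>', true);
    echo $ok ? implode("\n", $events) : 'parse error', "\n";
} else {
    // Without the extension, a pure PHP parser would have to be used instead.
    echo "The XML Parser extension is not available.\n";
}
```

A DOM parser built on top of such events creates an element node on each start event, a text node on each text event, and moves back up to the parent on each end event.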
Sometimes the default DOMIT! mechanism for populating a DOMIT_Document is insufficient. This is particularly true when retrieving XML data from a remote location.
By default, DOMIT! uses the PHP function file_get_contents or standard PHP file input streams to retrieve the contents of an XML file. However, both of these approaches can fail when passed a remote URL as the location of the XML file to be parsed.
A number of additional options exist to deal with these possibilities.
As of version 1.0, DOMIT! comes bundled with the php_http_client library, written by Engage Interactive. With the useHTTPClient method, DOMIT! can be forced to establish a standard HTTP connection to the web server hosting the XML file:
$cdCollection =& new DOMIT_Document();
//specify that an HTTP client should be used to retrieve XML
$cdCollection->useHTTPClient(true);
//call loadXML method as usual
$success = $cdCollection->loadXML("http://www.engageinteractive.com/rssfeed.xml", false);
The HTTP connection will be attempted on port 80.
If you need to establish an HTTP connection to retrieve your XML data, but the useHTTPClient method does not provide enough flexibility, the setConnection method of DOMIT_Document can be used to manually set the parameters of the connection.
$cdCollection =& new DOMIT_Document();
$cdCollection->setConnection('http://www.engageinteractive.com', '/', '955');
//call loadXML method as usual
$success = $cdCollection->loadXML("http://www.engageinteractive.com/rssfeed.xml", false);
In the above example, an HTTP connection will be established on port 955 of host http://www.engageinteractive.com. You can also use a raw IP address for the host, such as http://198.162.0.10.
Note that you can also pass in a user name and password to the setConnection method, if you must use HTTP Authorization to establish your connection. For more about HTTP Authorization, please see the entry on the setAuthorization method.
The HTTP specification allows for a basic (i.e., not particularly secure) type of authorization called HTTP Authorization. If the XML file that you require is protected by this sort of authentication, you can use the setAuthorization method of DOMIT!.
setAuthorization is used in conjunction with the setConnection method, and requires that you provide a plain text username and password:
$cdCollection =& new DOMIT_Document();
$cdCollection->setConnection('http://www.engageinteractive.com', '/', '955');
$cdCollection->setAuthorization('johnheinstein', 'mypassword');
//call loadXML method as usual
$success = $cdCollection->loadXML("http://www.engageinteractive.com/rssfeed.xml", false);
An HTTP proxy is a server that acts as an intermediary between an HTTP client (such as a user's browser) and the Internet. It is used to enforce security, administrative control, and caching services. If you are behind a firewall, for instance, and must connect to a proxy server to access web based resources, then the setProxyConnection method will allow you to access such data.
The setProxyConnection method works in exactly the same way as setConnection:
$cdCollection =& new DOMIT_Document();
$cdCollection->setProxyConnection('http://www.myproxyconnection.com', '/', '1060');
//call loadXML method as usual
$success = $cdCollection->loadXML("http://www.engageinteractive.com/rssfeed.xml", false);
The setProxyAuthorization method is called in exactly the same way as setAuthorization. Just provide a valid user name and password:
$cdCollection =& new DOMIT_Document();
$cdCollection->setProxyConnection('http://www.myproxyconnection.com', '/', '1060');
$cdCollection->setProxyAuthorization('johnheinstein', 'mypassword');
//call loadXML method as usual
$success = $cdCollection->loadXML("http://www.engageinteractive.com/rssfeed.xml", false);
By default, when loading an XML document, DOMIT! removes what it considers insignificant whitespace -- such as the tabs between XML tags that are used for formatting purposes only.
Whitespace can be retained, however, if the following is called prior to loading or parsing:
$cdCollection->preserveWhitespace(true);
When DOMIT! parses or loads an XML Document, often entities are present which must be transformed into their corresponding character representations. Generally it is the responsibility of the DOCTYPE declaration to delineate these conversions.
However, DOMIT! is a non-validating parser, and is unaware of constraints placed on a document by the DOCTYPE.
The appendEntityTranslationTable method is an alternate way of specifying character equivalents of entities.
It takes a single parameter -- an associative array of entities mapped to their equivalent characters. For example, if one wanted to instruct DOMIT! to convert all &copy; entities into ©:
//create translation table
$myTranslationTable = array('&copy;' => '©');
//pass table to document
$cdCollection->appendEntityTranslationTable($myTranslationTable);
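The effect of such a translation table can be pictured with plain PHP string functions. The sketch below is not DOMIT!'s implementation; translateEntities is a hypothetical stand-in that applies an associative array of entities to a string with the built-in strtr function.

```php
<?php
// Hypothetical stand-in for a translation step (not DOMIT!'s own code):
// replace each entity found in the table with its character equivalent.
function translateEntities($text, array $table) {
    return strtr($text, $table);
}

$table = array('&copy;' => '©', '&nbsp;' => ' ');
$translated = translateEntities('Copyright &copy; Engage&nbsp;Interactive', $table);
echo $translated, "\n";
```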
When DOMIT! parses XML from a string or loads XML from a file, several methods can be used to handle non-conformant XML and retrieve error codes.
DOMIT! also allows you to set a custom error handler for runtime XML processing errors.
If the resolveErrors method is called, DOMIT! will attempt to locate and fix any problems with improperly formatted XML code. The method must be called before parsing begins; just pass it a value of true:
$cdCollection =& new DOMIT_Document();
$cdCollection->resolveErrors(true);
$success = $cdCollection->loadXML("/xml/cdcollection.xml");
Note that resolveErrors may have an impact on speed, and should be used judiciously.
Currently, resolveErrors only searches for and replaces ampersands that have not been encoded as &amp;
If loadXML or parseXML returns false, an error has occurred in processing. The methods getErrorCode and getErrorString can be used to diagnose where the problem lies.
getErrorCode returns a numerical description of the error, and getErrorString returns a textual description of the error. For example:
$cdCollection =& new DOMIT_Document();
$cdCollection->resolveErrors(true);
$success = $cdCollection->loadXML("/xml/cdcollection.xml");
if ($success) {
    //process XML
}
else {
    //an error has occurred; echo to browser
    echo "Error code: " . $cdCollection->getErrorCode();
    echo "\n<br />";
    echo "Error string: " . $cdCollection->getErrorString();
}
If you would like to set a custom error handler for DOMIT! to handle runtime XML processing errors, you can use a static method of the DOMIT_DOMException class: setErrorHandler.
It takes a single parameter -- the method to handle the error.
The custom error handler method must have the following method signature...
function myCustomErrorHandler($errorNum, $errorString)
...where $errorNum is an integer signifying the number of the error, and $errorString is a string giving a description of the error.
For example, if you wrote a function to handle your DOMIT! errors that looked like this:
function myErrorHandler($errorNum, $errorString) {
    echo "The error number is " . $errorNum . " and the error string is " . $errorString;
}
You could invoke it like this:
DOMIT_DOMException::setErrorHandler("myErrorHandler");
If the myErrorHandler function was a method of a class named ErrorHandlers rather than a standalone function, you could invoke setErrorHandler like this:
DOMIT_DOMException::setErrorHandler(array("ErrorHandlers", "myErrorHandler"));
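To show how a handler given either as a function name or as a class and method pair can be stored and invoked later, here is a minimal sketch in plain PHP. It is not DOMIT!'s implementation: the ErrorRegistry class and its raise method are hypothetical, and the handler method is declared static so that the array form is callable in current PHP versions.

```php
<?php
// Hypothetical registry (not part of DOMIT!): stores a handler and calls
// it with an error number and an error string.
class ErrorRegistry {
    private static $handler = null;

    public static function setErrorHandler($handler) {
        self::$handler = $handler;
    }

    public static function raise($errorNum, $errorString) {
        // Works for a function name or an array(class, method) pair.
        if (self::$handler !== null && is_callable(self::$handler)) {
            call_user_func(self::$handler, $errorNum, $errorString);
        }
    }
}

class ErrorHandlers {
    public static $log = array();

    // Same signature as the custom handler described above.
    public static function myErrorHandler($errorNum, $errorString) {
        self::$log[] = $errorNum . ': ' . $errorString;
    }
}

ErrorRegistry::setErrorHandler(array('ErrorHandlers', 'myErrorHandler'));
ErrorRegistry::raise(3, 'example error');
print_r(ErrorHandlers::$log);
```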
The DOMIT_DOMException::setErrorMode method allows you to define the behavior of DOMIT! when an exception occurs. It takes a single parameter -- an integer or integer constant representing the error mode:
DOMIT_ONERROR_CONTINUE (1) - specifies that DOMIT! should continue processing after an exception occurs. This is the default behavior.
DOMIT_ONERROR_DIE (2) - specifies that DOMIT! should die and display the error message after an exception occurs.
For example:
$cdCollection =& new DOMIT_Document();
//sets DOMIT! to die on an exception
DOMIT_DOMException::setErrorMode(DOMIT_ONERROR_DIE);
The DOMIT_DOMException::setErrorLog method allows you to specify a file to which error messages are logged and timestamped. This is a useful feature for debugging XML parsing problems.
It takes two parameters:
a boolean specifying whether logging should be turned on (true) or off (false)
a string containing the absolute or relative path of the error log file.
The following example specifies that errors are to be logged to the file 'errorLog.txt':
$cdCollection =& new DOMIT_Document();
//specifies that error logging is to be enabled and the error log filename
DOMIT_DOMException::setErrorLog(true, 'errorLog.txt');
Once a DOMIT_Document has been populated, you can use the standard DOM methods to extract and manipulate data in the XML tree. The following chapter illustrates how this can be done.
You can acquire a reference to the document element node -- the root element in a DOM document -- using the documentElement property.
$cdCollection =& new DOMIT_Document();
$success = $cdCollection->loadXML("/xml/cdcollection.xml");
if ($success) {
    //gets a reference to the root element of the cd collection
    $myDocumentElement =& $cdCollection->documentElement;
}
In the cd library example, the document element node is the node <cdlibrary>.
Note: Always remember to use the reference (&) operator in PHP4, or you will be returned a shallow copy of the childNodes array. Even if you are using PHP5, it is recommended for the sake of portability to other web servers that you use an ampersand anyway.
A text representation of a node and its contents can be displayed using the toString and toNormalizedString methods. The expandEmptyElementTags method can be used to further tweak your output.
Take the document element node of the cd library example above. Once a reference to the <cdlibrary> node has been obtained using the documentElement property, we can see what it contains:
$myDocumentElement =& $cdCollection->documentElement;
echo $myDocumentElement->toString(true);
The following string will be echoed to the browser window:
<cdlibrary><cd discid="bb0c3c0c"><name>Robbie Fulks</name><title>Couples in Trouble</title></cd><cd discid="9b0ce70c"><name>Richard Thompson</name><title>Mock Tudor</title></cd><cd discid="cf11720f"><name>Keller Williams</name><title>Laugh</title></cd></cdlibrary>
The first parameter of toString
, if set to true, converts special HTML characters into their encoded versions (e.g., & into &amp;) so that they will display properly in a browser.
If you would like unconverted raw text to be output (for instance, when echoing to a command line interface) substitute a value of false:
echo $myDocumentElement->toString(false);
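To make the difference between the two modes concrete, the following sketch applies PHP's built-in htmlspecialchars function to a short XML string. It is an approximation for illustration only: that DOMIT! performs an equivalent conversion internally is an assumption, and $rawXml is a made-up sample string.

```php
<?php
// Illustrative only: approximates the effect of toString(true) using PHP's
// built-in htmlspecialchars(); DOMIT!'s own conversion may differ in detail.
$rawXml = '<title>Couples in Trouble</title>';

// Raw form: suitable for a command line, but a browser would treat the
// tags as markup and hide them.
echo $rawXml, "\n";

// Encoded form: the angle brackets become entities, so a browser
// displays the tags as visible text.
$encoded = htmlspecialchars($rawXml);
echo $encoded, "\n";
// prints: &lt;title&gt;Couples in Trouble&lt;/title&gt;
```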
One drawback of the toString
output is that it is not particularly readable, since all text of the node is compressed into one line. The toNormalizedString
method will output text that is much more nicely formatted:
$myDocumentElement =& $cdCollection->documentElement; echo $myDocumentElement->toNormalizedString(true);
The following string will be echoed to the browser window:
<cdlibrary> <cd discid="bb0c3c0c"> <name>Robbie Fulks</name> <title>Couples in Trouble</title> </cd> <cd discid="9b0ce70c"> <name>Richard Thompson</name> <title>Mock Tudor</title> </cd> <cd discid="cf11720f"> <name>Keller Williams</name> <title>Laugh</title> </cd> </cdlibrary>
As with the toString
method, passing a value of false into toNormalizedString
outputs text that is not formatted for HTML display.
When outputting XML using toString
or toNormalizedString
, by default DOMIT! represents empty elements using the abbreviated convention:
<anEmptyElement />
If you prefer the tags to be expanded instead, use the expandEmptyElementTags method:
$xmldoc->expandEmptyElementTags(true);
When using DOMIT! to render XHTML documents, often it is necessary to leave some tags unexpanded, such as the <br />
tag. The expandEmptyElementTags
method allows you to pass in an array of exceptions to the expansion rule:
//create array of exceptions to the empty element expansion rule
$expansionExceptions = array('br', 'hr');
//invoke expansion rule, passing in array of exceptions as second parameter
$xmldoc->expandEmptyElementTags(true, $expansionExceptions);
This might result in output that looked like this:
<html> <body> <p>This is a test</p> <p></p> <br /> </body> </html>
In an earlier section, we learned that each node in a DOM document has three properties -- node type, node name, and node value -- that allow you to distinguish it from other nodes.
These properties are accessible in DOMIT! with the nodeType
, nodeName
, and nodeValue
keywords.
To echo out these properties for the document element of the cdlibrary example, for instance, you would do this:
$cdCollection =& new DOMIT_Document();
$success = $cdCollection->loadXML("/xml/cdcollection.xml");
if ($success) {
    //gets a reference to the root element of the cd collection
    $myDocumentElement =& $cdCollection->documentElement;
    //echo out node name
    echo "Node name: " . $myDocumentElement->nodeName;
    echo "\n<br />";
    //echo out node type
    echo "Node type: " . $myDocumentElement->nodeType;
    echo "\n<br />";
    //echo out node value
    echo "Node value: " . $myDocumentElement->nodeValue;
    echo "\n<br />";
}
The above example would display:
Node name: cdlibrary Node type: 1 Node value:
Note that nothing follows "Node value:" because the node value for an element is null.
You now know how to:
instantiate a DOMIT! document
populate a DOMIT! document using the loadXML
or parseXML
methods
obtain a reference to the document element
print the contents of a node, and
display the three basic node properties
We will now learn how to access other parts of a document using such DOM constructs as child nodes, parent nodes, and next and previous siblings.
As explained previously, each node in a DOM Document has a list of references to the nodes contained directly beneath it in the tree: its Child Nodes.
In DOMIT!, the child nodes exist as a standard PHP array named childNodes.
To grab a reference to the childNodes
array of a node, use the following syntax:
//get a reference to the childNodes collection of the document element
$myChildNodes =& $cdCollection->documentElement->childNodes;
Note: When obtaining a reference to the childNodes array in PHP4, always remember to use the reference (&) operator, or you will be returned a shallow copy.
It is good practice, prior to grabbing a reference to the childNodes
array, to use the hasChildNodes
method to check if any child nodes exist:
//ensure that there are childNodes before bothering to work with the childNodes array
if ($cdCollection->documentElement->hasChildNodes()) {
    $myChildNodes =& $cdCollection->documentElement->childNodes;
}
The number of child nodes is stored in the childCount
property. You can use this value to traverse the childNodes
array and access its individual nodes:
//ensure that there are childNodes before bothering to work with the childNodes array
if ($cdCollection->documentElement->hasChildNodes()) {
    //get a reference to the childNodes collection of the document element
    $myChildNodes =& $cdCollection->documentElement->childNodes;
    //get the total number of childNodes for the document element
    $numChildren = $cdCollection->documentElement->childCount;
    //iterate through the collection
    for ($i = 0; $i < $numChildren; $i++) {
        //get a reference to the childNode at index i
        $currentNode =& $myChildNodes[$i];
        //echo out the node to browser
        echo ("Node $i contents are: \n<br />" . $currentNode->toNormalizedString(true) . "\n<br />\n<br />");
    }
}
The above example will display the following (note that the index is 0-based):
Node 0 contents are: <cd discid="bb0c3c0c"> <name>Robbie Fulks</name> <title>Couples in Trouble</title> </cd> Node 1 contents are: <cd discid="9b0ce70c"> <name>Richard Thompson</name> <title>Mock Tudor</title> </cd> Node 2 contents are: <cd discid="cf11720f"> <name>Keller Williams</name> <title>Laugh</title> </cd>
The childNodes
array is not the only means of accessing the children of a node.
The firstChild
property of a node returns a reference to a node's first child node:
if ($cdCollection->documentElement->hasChildNodes()) {
    //get reference to first child node of document element
    $firstChildNode =& $cdCollection->documentElement->firstChild;
    //echo out the node to browser
    echo ("The contents of the first child node are: \n<br />" . $firstChildNode->toNormalizedString(true));
}
The above example will return:
The contents of the first child node are: <cd discid="bb0c3c0c"> <name>Robbie Fulks</name> <title>Couples in Trouble</title> </cd>
Note: If there are no child nodes present, a value of null is returned.
The lastChild
property of a node returns a reference to a node's last child node:
if ($cdCollection->documentElement->hasChildNodes()) {
    //get reference to last child node
    $lastChildNode =& $cdCollection->documentElement->lastChild;
    //echo out the node to browser
    echo ("The contents of the last child node are: \n<br />" . $lastChildNode->toNormalizedString(true));
}
The above example will return:
The contents of the last child node are: <cd discid="cf11720f"> <name>Keller Williams</name> <title>Laugh</title> </cd>
If there are no child nodes present, a value of null is returned.
Nodes that occupy the same level of a DOM tree are called siblings. The DOM conceives of these nodes as being chained in a sequence, with each node aware of the node immediately preceding and immediately following it.
The nextSibling
property of a node returns a reference to the node immediately after it in the sibling chain.
In the cdlibrary example, the next sibling of the Robbie Fulks <cd>
node is the Richard Thompson <cd>
node. One would access it like this:
if ($cdCollection->documentElement->hasChildNodes()) {
    //get reference to first cd node (the Robbie Fulks cd)
    $firstChildNode =& $cdCollection->documentElement->firstChild;
    //get a reference to the next sibling (the Richard Thompson cd)
    $nextSiblingNode =& $firstChildNode->nextSibling;
    //echo out the node to browser
    echo ("The contents of the next sibling are: \n<br />" . $nextSiblingNode->toNormalizedString(true));
}
The above example will return:
The contents of the next sibling are: <cd discid="9b0ce70c"> <name>Richard Thompson</name> <title>Mock Tudor</title> </cd>
If there are no next sibling nodes present, a value of null is returned.
The previousSibling
property of a node returns a reference to the node immediately before it in the sibling chain.
In the cdlibrary example, the previous sibling of the Keller Williams <cd>
node is the Richard Thompson <cd>
node. One would access it like this:
if ($cdCollection->documentElement->hasChildNodes()) {
    //get reference to last cd node (the Keller Williams cd)
    $lastChildNode =& $cdCollection->documentElement->lastChild;
    //get a reference to the previous sibling (the Richard Thompson cd)
    $previousSiblingNode =& $lastChildNode->previousSibling;
    //echo out the node to browser
    echo ("The contents of the previous sibling are: \n<br />" . $previousSiblingNode->toNormalizedString(true));
}
The above example will return:
The contents of the previous sibling are: <cd discid="9b0ce70c"> <name>Richard Thompson</name> <title>Mock Tudor</title> </cd>
If there are no previous sibling nodes present, a value of null is returned.
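Because the sibling chain ends in null, firstChild and nextSibling together can drive a loop over all the children of a node. The sketch below is written against the property names described in this chapter, but it is demonstrated with PHP 5's bundled DOM extension (DOMDocument) rather than DOMIT!, so that it runs without DOMIT! installed. The function name collectDiscIds and the inline XML string are illustrative assumptions.

```php
<?php
// Walks the sibling chain of $parent's children and collects the discid
// attribute of every element node. collectDiscIds is a hypothetical helper.
function collectDiscIds($parent) {
    $ids = array();
    // start at the first child and follow nextSibling until null is reached
    for ($node = $parent->firstChild; $node !== null; $node = $node->nextSibling) {
        if ($node->nodeType == 1) { // 1 = element node
            $ids[] = $node->getAttribute('discid');
        }
    }
    return $ids;
}

// Demonstration with the bundled DOM extension (not DOMIT!)
$doc = new DOMDocument();
$doc->loadXML('<cdlibrary><cd discid="bb0c3c0c"/><cd discid="9b0ce70c"/><cd discid="cf11720f"/></cdlibrary>');
$discIds = collectDiscIds($doc->documentElement);
echo implode(', ', $discIds), "\n";
// prints: bb0c3c0c, 9b0ce70c, cf11720f
```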
As the name implies, the parentNode
property of a node returns a reference to the node one level above it in the DOM tree.
In the cdlibrary example, the parent node of the Robbie Fulks <cd>
node is the document element <cdlibrary>
node. One would access it like this:
if ($cdCollection->documentElement->hasChildNodes()) {
    //get reference to first cd node (the Robbie Fulks cd)
    $firstChildNode =& $cdCollection->documentElement->firstChild;
    //get a reference to the parent (cdlibrary) node
    $myParentNode =& $firstChildNode->parentNode;
    //echo out the node to browser
    echo ("The contents of the parent node of the Robbie Fulks cd node are: \n<br />" . $myParentNode->toNormalizedString(true));
}
The above example will return:
The contents of the parent node of the Robbie Fulks cd node are: <cdlibrary> <cd discid="bb0c3c0c"> <name>Robbie Fulks</name> <title>Couples in Trouble</title> </cd> <cd discid="9b0ce70c"> <name>Richard Thompson</name> <title>Mock Tudor</title> </cd> <cd discid="cf11720f"> <name>Keller Williams</name> <title>Laugh</title> </cd> </cdlibrary>
If there is no parent node present, a value of null is returned. Note that only the document element node will have no parent.
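Following parentNode repeatedly climbs from any node toward the root of the tree. The sketch below collects the names of a node's element ancestors. It is demonstrated with PHP 5's bundled DOM extension (DOMDocument) rather than DOMIT!, and the helper name ancestorNames is an assumption made for illustration. The loop stops when it reaches either null or a non-element node, so it does not depend on what the document element's parent is in a given implementation.

```php
<?php
// Returns the names of all element ancestors of $node, nearest first.
// ancestorNames is a hypothetical helper, not part of DOMIT!.
function ancestorNames($node) {
    $names = array();
    $parent = $node->parentNode;
    // climb until there is no parent, or the parent is not an element
    while ($parent !== null && $parent->nodeType == 1) { // 1 = element node
        $names[] = $parent->nodeName;
        $parent = $parent->parentNode;
    }
    return $names;
}

// Demonstration with the bundled DOM extension (not DOMIT!)
$doc = new DOMDocument();
$doc->loadXML('<cdlibrary><cd discid="bb0c3c0c"><name>Robbie Fulks</name></cd></cdlibrary>');
$nameElement = $doc->documentElement->firstChild->firstChild;
$ancestors = ancestorNames($nameElement);
echo implode(' < ', $ancestors), "\n";
// prints: cd < cdlibrary
```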
Each node in a DOM document -- with the exception of attribute nodes -- is considered to be "owned" by that document.
Use the ownerDocument
property of a node to obtain a reference to the DOMIT! document:
if ($cdCollection->documentElement->hasChildNodes()) {
    //get reference to first cd node (the Robbie Fulks cd)
    $firstChildNode =& $cdCollection->documentElement->firstChild;
    //get a reference to the DOMIT document
    $myOwnerDocument =& $firstChildNode->ownerDocument;
}
Text nodes, CDATA Section nodes, and comment nodes belong to what is defined by the DOM as the CharacterData interface, which specifies a number of methods for obtaining the textual data. The following section describes some of these methods.
The easiest way of getting the data from a text node, CDATA Section node, or comment node is through its nodeValue
property.
Note: A common error that many DOM newbies make is to confuse a text node with the element node that contains it. It is important to realize that a text node is always the child of the containing element.
$cdCollection =& new DOMIT_Document();
$success = $cdCollection->loadXML("/xml/cdcollection.xml");
if ($success) {
    //get a reference to the <name> element of the Robbie Fulks cd
    $nameElement =& $cdCollection->documentElement->childNodes[0]->firstChild;
    //get a reference to the text node
    //(this step has been broken into multiple steps to emphasize that
    //a text node must be distinguished from its containing element!)
    $nameTextNode =& $nameElement->firstChild;
    //echo out the data in the text node
    echo $nameTextNode->nodeValue;
}
The above example returns:
Robbie Fulks
If you prefer, you can condense the above steps into a single line:
$myText = $cdCollection->documentElement->childNodes[0]->firstChild->firstChild->nodeValue;
The getData
method is a wrapper for the nodeValue
keyword and functions in exactly the same way:
$cdCollection =& new DOMIT_Document();
$success = $cdCollection->loadXML("/xml/cdcollection.xml");
if ($success) {
    //get a reference to the <name> element of the Robbie Fulks cd
    $nameElement =& $cdCollection->documentElement->childNodes[0]->firstChild;
    //get a reference to the text node
    //(this step has been broken into multiple steps to emphasize that
    //a text node must be distinguished from its containing element!)
    $nameTextNode =& $nameElement->firstChild;
    //echo out the data in the text node
    echo $nameTextNode->getData();
}
In most cases, the getText
method functions identically to nodeValue
and getData
. You can simply substitute the word getText for the word getData in the previous example and the results will be the same.
However, getText
can also be called on an element. In this case, the concatenated text of all children beneath the element is returned. For instance:
$cdCollection =& new DOMIT_Document();
$success = $cdCollection->loadXML("/xml/cdcollection.xml");
if ($success) {
    //get a reference to the Robbie Fulks <cd> element
    $cdElement =& $cdCollection->documentElement->childNodes[0];
    //get ALL text beneath the cd element ("name" text + "title" text)
    $childText = $cdElement->getText();
    //echo out the concatenated data
    echo $childText;
}
The above example returns:
Robbie FulksCouples in Trouble
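The concatenation can be pictured as a recursive walk that gathers the value of every text and CDATA Section node beneath the element. The following sketch illustrates that idea only; it is not DOMIT!'s actual implementation of getText. It is demonstrated with PHP 5's bundled DOM extension (DOMDocument) so that it runs without DOMIT!, and gatherText is a hypothetical name.

```php
<?php
// Recursively concatenates the text found beneath $node.
// A sketch of the idea behind getText, not DOMIT!'s own code.
function gatherText($node) {
    // 3 = text node, 4 = CDATA Section node: return the data directly
    if ($node->nodeType == 3 || $node->nodeType == 4) {
        return $node->nodeValue;
    }
    // otherwise, descend into each child in document order
    $text = '';
    for ($child = $node->firstChild; $child !== null; $child = $child->nextSibling) {
        $text .= gatherText($child);
    }
    return $text;
}

// Demonstration with the bundled DOM extension (not DOMIT!)
$doc = new DOMDocument();
$doc->loadXML('<cd discid="bb0c3c0c"><name>Robbie Fulks</name><title>Couples in Trouble</title></cd>');
$allText = gatherText($doc->documentElement);
echo $allText, "\n";
// prints: Robbie FulksCouples in Trouble
```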
The getLength
method indicates how many characters exist in a character data node:
$numCharacters = $myTextNode->getLength();
The substringData method returns a specified subset of characters from a character data node.
It takes two parameters:
offset: an integer specifying the starting character of the substring
count: an integer specifying how many characters from the offset should be included in the substring
To extract the first name from the "Robbie Fulks" text node, for example, one would do this:
$firstName = $rfTextNode->substringData(0,6);
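The offset and count can also be computed rather than hard-coded. The sketch below extracts the last name by locating the space and counting to the end of the data. It uses PHP 5's bundled DOM extension (DOMDocument) rather than DOMIT! so that it runs on its own; note that the bundled extension exposes the character count as a length property, where DOMIT! uses the getLength method described above.

```php
<?php
// Demonstration with the bundled DOM extension (not DOMIT!)
$doc = new DOMDocument();
$textNode = $doc->createTextNode('Robbie Fulks');

// the bundled extension uses a length property; in DOMIT! this is getLength()
$numCharacters = $textNode->length;

// offsets are 0-based, so the space in "Robbie Fulks" sits at offset 6
$spaceAt = strpos($textNode->nodeValue, ' ');

// first name: from offset 0, as many characters as precede the space
$firstName = $textNode->substringData(0, $spaceAt);

// last name: from just after the space to the end of the data
$lastName = $textNode->substringData($spaceAt + 1, $numCharacters - $spaceAt - 1);

echo $firstName, ' / ', $lastName, "\n";
// prints: Robbie / Fulks
```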
In a DOM document, attributes are accessed, by name, from their containing element. The DOMIT! methods hasAttribute
and getAttribute
can be used to extract attribute data.
To determine whether an element contains a particular attribute, you can use the hasAttribute
method. It takes a single string parameter -- the name of the attribute -- and returns either true or false:
if ($cdCollection->documentElement->hasChildNodes()) {
    //get reference to first cd node (the Robbie Fulks cd)
    $firstChildNode =& $cdCollection->documentElement->firstChild;
    //determine whether it has an attribute named "discid"
    if ($firstChildNode->hasAttribute("discid")) {
        echo ("I DO have a discid attribute");
    }
    else {
        echo ("I DO NOT have a discid attribute");
    }
}
The hasAttributes
method returns true if an element contains at least one attribute.
if ($someNode->hasAttributes()) { echo ("I have at least one attribute"); } else { echo ("I have no attributes"); }
To obtain the value of a named attribute, use the getAttribute
method. As with the hasAttribute
method, you pass in the attribute name:
if ($cdCollection->documentElement->hasChildNodes()) {
    //get reference to first cd node (the Robbie Fulks cd)
    $firstChildNode =& $cdCollection->documentElement->firstChild;
    //determine whether it has an attribute named "discid"
    if ($firstChildNode->hasAttribute("discid")) {
        //obtain the value of the discid attribute
        $attrValue = $firstChildNode->getAttribute("discid");
        //echo the value out to the browser
        echo ("Attribute value: " . $attrValue);
    }
    else {
        echo ("I DO NOT have a discid attribute");
    }
}
The above example returns:
Attribute value: bb0c3c0c
Note: If the attribute does not exist, an empty string (i.e., "") is returned.
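Since a missing attribute and an attribute whose value is empty both yield "", the two cases can only be told apart by calling hasAttribute first. The sketch below wraps that check in a small helper that falls back to a default value. The helper name getAttributeOrDefault is an assumption, and the demonstration uses PHP 5's bundled DOM extension (DOMDocument) rather than DOMIT! so that it runs on its own; the hasAttribute and getAttribute calls follow the descriptions given above.

```php
<?php
// Returns the value of the named attribute, or $default when the
// attribute is absent. getAttributeOrDefault is a hypothetical helper.
function getAttributeOrDefault($element, $name, $default) {
    if ($element->hasAttribute($name)) {
        return $element->getAttribute($name);
    }
    return $default;
}

// Demonstration with the bundled DOM extension (not DOMIT!)
$doc = new DOMDocument();
$doc->loadXML('<cd discid="bb0c3c0c"><name>Robbie Fulks</name></cd>');
$cd = $doc->documentElement;

echo getAttributeOrDefault($cd, 'discid', 'unknown'), "\n"; // prints: bb0c3c0c
echo getAttributeOrDefault($cd, 'genre', 'unknown'), "\n";  // prints: unknown
```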
The getAttribute method returns the value of an attribute node. If you would like to obtain a reference to the node itself, use the getAttributeNode
method.
To obtain the value of an attribute node, use the getValue
method:
if ($cdCollection->documentElement->hasChildNodes()) {
    //get reference to first cd node (the Robbie Fulks cd)
    $firstChildNode =& $cdCollection->documentElement->firstChild;
    //determine whether it has an attribute named "discid"
    if ($firstChildNode->hasAttribute("discid")) {
        //obtain a reference to the discid attribute node (don't forget the ampersand!)
        $attrNode =& $firstChildNode->getAttributeNode("discid");
        //echo the value out to the browser
        echo ("The value of the discid attribute is: \n<br />" . $attrNode->getValue());
    }
    else {
        echo ("I DO NOT have a discid attribute");
    }
}
The above example returns:
The value of the discid attribute is: bb0c3c0c
An attribute list is defined by the DOM specification as a Named Node Map. This is a type of node collection that allows you to access its members either by name or by index.
Although the attribute-specific methods are in most cases sufficient, there may be times when you do not know in advance the names of an element's attributes. Using the named node map methods, you can query the list to find out this data.
To obtain a reference to the attributes list / named node map of an element, use the attributes
keyword:
if ($cdCollection->documentElement->hasChildNodes()) {
    //get reference to first cd node (the Robbie Fulks cd)
    $firstChildNode =& $cdCollection->documentElement->firstChild;
    //get a reference to the attributes list / named node map (don't forget the ampersand!)
    $attrList =& $firstChildNode->attributes;
}
The getLength
method of a named node map returns an integer indicating how many members belong to the attribute list.
The item
method of a named node map allows you to access a member by its numerical index (which is 0-based). In combination with the getLength
method, you can set up a loop through the members of an attribute list.
The getName
method will tell you the name of the node.
if ($cdCollection->documentElement->hasChildNodes()) {
    //get reference to first cd node (the Robbie Fulks cd)
    $firstChildNode =& $cdCollection->documentElement->firstChild;
    //get a reference to the attributes list / named node map (don't forget the ampersand!)
    $attrList =& $firstChildNode->attributes;
    //determine the number of members in the attribute list
    $numAttributes = $attrList->getLength();
    //iterate through the list
    for ($i = 0; $i < $numAttributes; $i++) {
        //get a reference to the attribute node at index i (don't forget the ampersand!)
        $currAttr =& $attrList->item($i);
        //echo out the name and value of the attribute
        echo "The attribute at index " . $i . " is named: " . $currAttr->getName();
        echo "\n<br /> Its value is: " . $currAttr->getValue();
    }
}
The above example returns:
The attribute at index 0 is named: discid Its value is: bb0c3c0c
The XML Prolog is a term referring to the XML Declaration and the Document Type Declaration.
The XML declaration can be acquired with the getXMLDeclaration
method:
$myXMLDecl =& $xmldoc->getXMLDeclaration();
A reference to a processing instruction node is returned.
The major strength of the Document Object Model is the ease with which the data in an XML document can be modified. The following chapter delineates how to use DOMIT! for creating, appending, inserting, replacing, removing, and altering XML data.
Creating new XML nodes is accomplished using a set of DOMIT_Document factory methods. For the next subsections, we will assume that a new DOMIT_Document has already been created as follows:
//include DOMIT! codebase
require_once('xml_domit_include.php');
//instantiate a new DOMIT! document
$xmldoc =& new DOMIT_Document();
To create a new DOM element, use the createElement
method.
The createElement
method takes a single parameter -- the name of the element.
$newElement =& $xmldoc->createElement("cdlibrary");
Note: Don't forget to include the ampersand for backwards compatibility with PHP4!
To create a new DOM text node, use the createTextNode
method.
The createTextNode
method takes a single parameter -- the text of the node.
$myText = 'Here is some dummy text'; $newTextNode =& $xmldoc->createTextNode($myText);
To create a new DOM CDATA Section, use the createCDATASection
method.
The createCDATASection
method takes a single parameter -- the text of the CDATA Section.
$myText = 'Here are some illegal XML characters: & <'; $newCDATASection =& $xmldoc->createCDATASection($myText);
To create a new DOM attribute, use the createAttribute
method.
The createAttribute
method takes two parameters:
the name of the attribute
the value of the attribute
$newAttribute =& $xmldoc->createAttribute("discid", "bb0c3c0c");
To create a new DOM comment, use the createComment
method.
The createComment
method takes a single parameter -- the text of the comment.
$myCommentText = 'This is a comment'; $newCommentNode =& $xmldoc->createComment($myCommentText);
To create a new DOM processing instruction, use the createProcessingInstruction
method.
The createProcessingInstruction
method takes two parameters -- the text of the target and the text of the data.
//create target and data
$myTarget = 'xml';
$myData = 'version="1.0"';
//create processing instruction
$newProcessingInstructionNode =& $xmldoc->createProcessingInstruction($myTarget, $myData);
Appending a node in the DOM means adding a new child node to the end of a node's child nodes list.
You can use the appendChild
method to append a node (and its children, if any exist) to a DOM Document or an element node.
The following example creates a <cdlibrary>
element and appends it to a new DOMIT_Document:
//include DOMIT! codebase
require_once('xml_domit_include.php');
//instantiate a new DOMIT! document
$xmldoc =& new DOMIT_Document();
//create cdlibrary node
$newNode =& $xmldoc->createElement('cdlibrary');
//append cdlibrary node to new DOMIT_Document
$xmldoc->appendChild($newNode);
//echo to browser
echo $xmldoc->toNormalizedString(true);
The result is:
<cdlibrary></cdlibrary>
In the previous section, when the <cdlibrary> element was appended to the empty DOM document, it became the document element.
The setDocumentElement
method is another way of achieving the same result. For example:
//include DOMIT! codebase
require_once('xml_domit_include.php');
//instantiate a new DOMIT! document
$xmldoc =& new DOMIT_Document();
//create cdlibrary node
$newNode =& $xmldoc->createElement('cdlibrary');
//set cdlibrary node as the document element of the new DOMIT_Document
$xmldoc->setDocumentElement($newNode);
//echo to browser
echo $xmldoc->toNormalizedString(true);
The result is:
<cdlibrary></cdlibrary>
setDocumentElement
will overwrite an existing document element.
The setAttribute
and setAttributeNode
methods are used to either add an attribute to an element, or change the value of an existing attribute. They are methods of element nodes only.
The setAttribute
method takes two parameters:
the name of the attribute to be added
the value of the attribute to be added
The following example adds a discid
attribute to a <cd>
element:
//create cd element
$newNode =& $xmldoc->createElement('cd');
//add a discid attribute
$newNode->setAttribute('discid', 'bb0c3c0c');
//echo to browser
echo $newNode->toNormalizedString(true);
The result is:
<cd discid="bb0c3c0c"></cd>
The setAttributeNode
method also adds an attribute to an element. It takes a single parameter -- an attribute node:
//create cd element
$newNode =& $xmldoc->createElement('cd');
//create a discid attribute node
$newAttr =& $xmldoc->createAttribute('discid', 'bb0c3c0c');
//add the attribute node to the element
$newNode->setAttributeNode($newAttr);
//echo to browser
echo $newNode->toNormalizedString(true);
The result is:
<cd discid="bb0c3c0c"></cd>
We now have sufficient tools to create the cdlibrary example from scratch, using only DOMIT!
//include DOMIT! codebase
require_once('xml_domit_include.php');
//instantiate a new DOMIT! document
$xmldoc =& new DOMIT_Document();
//create XML declaration
$xmlDecl =& $xmldoc->createProcessingInstruction('xml', 'version="1.0"');
//append XML declaration to new DOMIT_Document
$xmldoc->appendChild($xmlDecl);
//create cdlibrary node
$rootElement =& $xmldoc->createElement('cdlibrary');
//append cdlibrary node to new DOMIT_Document
$xmldoc->appendChild($rootElement);

//CREATE FIRST CD ELEMENT AND CHILDREN
//create cd element
$cdElement_1 =& $xmldoc->createElement('cd');
//add discid attribute
$cdElement_1->setAttribute('discid', 'bb0c3c0c');
//create name element
$nameElement =& $xmldoc->createElement('name');
//create and append text node to name element
$nameElement->appendChild($xmldoc->createTextNode('Robbie Fulks'));
//append name element to cd element
$cdElement_1->appendChild($nameElement);
//create title element
$titleElement =& $xmldoc->createElement('title');
//create and append text node to title element
$titleElement->appendChild($xmldoc->createTextNode('Couples in Trouble'));
//append title element to cd element
$cdElement_1->appendChild($titleElement);

//CREATE SECOND CD ELEMENT AND CHILDREN
//create cd element
$cdElement_2 =& $xmldoc->createElement('cd');
//add discid attribute
$cdElement_2->setAttribute('discid', '9b0ce70c');
//create name element
$nameElement =& $xmldoc->createElement('name');
//create and append text node to name element
$nameElement->appendChild($xmldoc->createTextNode('Richard Thompson'));
//append name element to cd element
$cdElement_2->appendChild($nameElement);
//create title element
$titleElement =& $xmldoc->createElement('title');
//create and append text node to title element
$titleElement->appendChild($xmldoc->createTextNode('Mock Tudor'));
//append title element to cd element
$cdElement_2->appendChild($titleElement);

//CREATE THIRD CD ELEMENT AND CHILDREN
//create cd element
$cdElement_3 =& $xmldoc->createElement('cd');
//add discid attribute
$cdElement_3->setAttribute('discid', 'cf11720f');
//create name element
$nameElement =& $xmldoc->createElement('name');
//create and append text node to name element
$nameElement->appendChild($xmldoc->createTextNode('Keller Williams'));
//append name element to cd element
$cdElement_3->appendChild($nameElement);
//create title element
$titleElement =& $xmldoc->createElement('title');
//create and append text node to title element
$titleElement->appendChild($xmldoc->createTextNode('Laugh'));
//append title element to cd element
$cdElement_3->appendChild($titleElement);

//APPEND CD ELEMENTS TO CDLIBRARY ELEMENT
$rootElement->appendChild($cdElement_1);
$rootElement->appendChild($cdElement_2);
$rootElement->appendChild($cdElement_3);
//echo to browser
echo $xmldoc->toNormalizedString(true);
The result is:
<?xml version="1.0"?> <cdlibrary> <cd discid="bb0c3c0c"> <name>Robbie Fulks</name> <title>Couples in Trouble</title> </cd> <cd discid="9b0ce70c"> <name>Richard Thompson</name> <title>Mock Tudor</title> </cd> <cd discid="cf11720f"> <name>Keller Williams</name> <title>Laugh</title> </cd> </cdlibrary>
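The three cd blocks in the listing differ only in their data, so the repeated steps can be folded into a helper function. The sketch below shows one way of doing so. The function name buildCd and the $cds array are assumptions made for illustration, and the demonstration uses PHP 5's bundled DOM extension (DOMDocument) rather than DOMIT! so that it runs on its own. The createElement, createTextNode, setAttribute, and appendChild calls are used as described in the preceding sections.

```php
<?php
// Builds one <cd> element with a discid attribute and <name>/<title>
// children. buildCd is a hypothetical helper, not part of DOMIT!.
function buildCd($doc, $discid, $name, $title) {
    $cd = $doc->createElement('cd');
    $cd->setAttribute('discid', $discid);

    $nameElement = $doc->createElement('name');
    $nameElement->appendChild($doc->createTextNode($name));
    $cd->appendChild($nameElement);

    $titleElement = $doc->createElement('title');
    $titleElement->appendChild($doc->createTextNode($title));
    $cd->appendChild($titleElement);

    return $cd;
}

// Demonstration with the bundled DOM extension (not DOMIT!)
$doc = new DOMDocument('1.0');
$root = $doc->createElement('cdlibrary');
$doc->appendChild($root);

// each row: discid, artist name, album title
$cds = array(
    array('bb0c3c0c', 'Robbie Fulks', 'Couples in Trouble'),
    array('9b0ce70c', 'Richard Thompson', 'Mock Tudor'),
    array('cf11720f', 'Keller Williams', 'Laugh'),
);
foreach ($cds as $row) {
    $root->appendChild(buildCd($doc, $row[0], $row[1], $row[2]));
}

// serialize the finished document
echo $doc->saveXML();
```

Under PHP4, where objects are copied on assignment, a helper like this would also need to return its node by reference (function &buildCd) and be called with the =& operator, in keeping with the notes on the ampersand elsewhere in this tutorial.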
If you need to add a child node somewhere other than the end of the child nodes list, you can use the insertBefore
method.
insertBefore
takes two parameters:
a reference to the node that is to be added
a reference to an existing child node, before which the insertion will occur
If, continuing with the cdlibrary document from the previous example, we wished to insert a comment as the first child node of the <cdlibrary>
element, insertBefore
could be used:
//create a comment
$myComment =& $xmldoc->createComment('Not many cds left after I got robbed');
//insert the comment as the first child of the cdlibrary element
$rootElement->insertBefore($myComment, $rootElement->childNodes[0]);
//echo to browser
echo $xmldoc->toNormalizedString(true);
The result is:
<?xml version="1.0"?> <cdlibrary> <!--Not many cds left after I got robbed--> <cd discid="bb0c3c0c"> <name>Robbie Fulks</name> <title>Couples in Trouble</title> </cd> <cd discid="9b0ce70c"> <name>Richard Thompson</name> <title>Mock Tudor</title> </cd> <cd discid="cf11720f"> <name>Keller Williams</name> <title>Laugh</title> </cd> </cdlibrary>
Let's say that I traded my Robbie Fulks cd for a Charlie Hunter cd named "Songs From the Analog Playground", and I want to replace the old XML with a new cd node.
The replaceChild
method can be used to do this. It takes two parameters:
a reference to the new node to be added
a reference to the node that is to be replaced
//CREATE NEW CHARLIE HUNTER CD ELEMENT AND CHILDREN
//create cd element
$cdElement_new =& $xmldoc->createElement('cd');
//add discid attribute
$cdElement_new->setAttribute('discid', 'a30e4c0d');
//create name element
$nameElement =& $xmldoc->createElement('name');
//create and append text node to name element
$nameElement->appendChild($xmldoc->createTextNode('Charlie Hunter'));
//append name element to cd element
$cdElement_new->appendChild($nameElement);
//create title element
$titleElement =& $xmldoc->createElement('title');
//create and append text node to title element
$titleElement->appendChild($xmldoc->createTextNode('Songs From the Analog Playground'));
//append title element to cd element
$cdElement_new->appendChild($titleElement);

//REPLACE ROBBIE FULKS CD NODE WITH CHARLIE HUNTER CD NODE
//(remember a comment has been added, so Robbie is the second child node)
$rootElement->replaceChild($cdElement_new, $rootElement->childNodes[1]);
//echo to browser
echo $xmldoc->toNormalizedString(true);
The result is:
<?xml version="1.0"?> <cdlibrary> <!--Not many cds left after I got robbed--> <cd discid="a30e4c0d"> <name>Charlie Hunter</name> <title>Songs From the Analog Playground</title> </cd> <cd discid="9b0ce70c"> <name>Richard Thompson</name> <title>Mock Tudor</title> </cd> <cd discid="cf11720f"> <name>Keller Williams</name> <title>Laugh</title> </cd> </cdlibrary>
The removeChild
method allows you to delete a node (and its children) from a DOM document. It takes a single parameter -- a reference to the node to be removed.
The following example removes the comment from the cdlibrary XML:
$rootElement->removeChild($rootElement->firstChild);
//echo to browser
echo $xmldoc->toNormalizedString(true);
The result is:
<?xml version="1.0"?> <cdlibrary> <cd discid="a30e4c0d"> <name>Charlie Hunter</name> <title>Songs From the Analog Playground</title> </cd> <cd discid="9b0ce70c"> <name>Richard Thompson</name> <title>Mock Tudor</title> </cd> <cd discid="cf11720f"> <name>Keller Williams</name> <title>Laugh</title> </cd> </cdlibrary>
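The insert, replace, and remove operations can be exercised together in a few lines. The sketch below repeats the sequence on a shortened two-cd library, using PHP 5's bundled DOM extension (DOMDocument) rather than DOMIT! so that it runs on its own; the inline XML string is made up for the demonstration. Child nodes are reached through firstChild and nextSibling rather than by array index, since those properties are described identically in this tutorial.

```php
<?php
// Demonstration with the bundled DOM extension (not DOMIT!)
$doc = new DOMDocument();
$doc->loadXML('<cdlibrary><cd discid="bb0c3c0c"/><cd discid="9b0ce70c"/></cdlibrary>');
$root = $doc->documentElement;

// insertBefore: place a comment in front of the current first child
$comment = $doc->createComment('Not many cds left after I got robbed');
$root->insertBefore($comment, $root->firstChild);

// replaceChild: the first <cd> is now the second child; swap it for a new one
$newCd = $doc->createElement('cd');
$newCd->setAttribute('discid', 'a30e4c0d');
$root->replaceChild($newCd, $root->firstChild->nextSibling);

// removeChild: delete the comment, which is still the first child
$root->removeChild($root->firstChild);

// list the discid values that remain, in document order
$remaining = array();
for ($node = $root->firstChild; $node !== null; $node = $node->nextSibling) {
    $remaining[] = $node->getAttribute('discid');
}
echo implode(', ', $remaining), "\n";
// prints: a30e4c0d, 9b0ce70c
```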
An attribute can be deleted with either the removeAttribute
or removeAttributeNode
method.
An attribute can be removed with the removeAttribute
method. It takes a single parameter -- the name of the attribute to be removed.
The following example removes the discid
attribute from the Charlie Hunter <cd>
element:
$rootElement->firstChild->removeAttribute('discid');
//echo to browser
echo $rootElement->toNormalizedString(true);
The result is:
<cdlibrary> <cd> <name>Charlie Hunter</name> <title>Songs From the Analog Playground</title> </cd> <cd discid="9b0ce70c"> <name>Richard Thompson</name> <title>Mock Tudor</title> </cd> <cd discid="cf11720f"> <name>Keller Williams</name> <title>Laugh</title> </cd> </cdlibrary>
An attribute can also be removed with the removeAttributeNode
method. It takes a single parameter -- a reference to the attribute to be removed.
The following example removes the discid
attribute from the Charlie Hunter <cd>
element:
//get reference to attribute to be removed
$attrToRemove =& $rootElement->firstChild->getAttributeNode('discid');
//remove attribute
$rootElement->firstChild->removeAttributeNode($attrToRemove);
//echo to browser
echo $rootElement->toNormalizedString(true);
The result is:
<cdlibrary> <cd> <name>Charlie Hunter</name> <title>Songs From the Analog Playground</title> </cd> <cd discid="9b0ce70c"> <name>Richard Thompson</name> <title>Mock Tudor</title> </cd> <cd discid="cf11720f"> <name>Keller Williams</name> <title>Laugh</title> </cd> </cdlibrary>
There are a variety of methods available for working with character data nodes.
The setText
method allows you to modify the text of an existing text node, CDATA Section, or comment.
To change the title of the Keller Williams cd, for example, you would do this:
//get reference to title text node of Keller Williams cd
$titleTextNode =& $rootElement->childNodes[2]->childNodes[1]->firstChild;
//modify title
$titleTextNode->setText('Loop');
//echo to browser
echo $xmldoc->toNormalizedString(true);
The result is:
<?xml version="1.0"?> <cdlibrary> <cd discid="a30e4c0d"> <name>Charlie Hunter</name> <title>Songs From the Analog Playground</title> </cd> <cd discid="9b0ce70c"> <name>Richard Thompson</name> <title>Mock Tudor</title> </cd> <cd discid="cf11720f"> <name>Keller Williams</name> <title>Loop</title> </cd> </cdlibrary>
If setText
is called from an element instead of a text node, DOMIT! will check if the element has a child text node.
If the element has a child text node, the text of that node will be set to the value specified in the setText
parameter.
If the element does not have a child text node, a new text node will be created, appended to the element, and its node value set to the value specified in the setText
parameter. For instance:
//create a new element $someElement =& $xmldoc->createElement('someElement'); //call setText on the element //(note that no child text node exists at this point, but one will be created) $someElement->setText('Some sample text'); //echo to browser echo $someElement->toNormalizedString(true);
The result is:
<someElement>Some sample text</someElement>
The splitText
method is accessible only from a text node. It allows you to split a text node into two text nodes, at a specified offset point. Both text nodes will be retained in the DOM tree as siblings.
splitText
takes a single integer parameter -- the character index at which the node is to be split.
//create a new element
$someElement =& $xmldoc->createElement('someElement');
//add a text node to the element
$someElement->setText('Some sample text');
//echo childCount to browser
echo '$someElement has ' . $someElement->childCount . ' child nodes.';
//split the text node at offset 5
$someElement->firstChild->splitText(5);
//echo childCount to browser
echo "\n<br />";
echo 'After calling splitText, $someElement now has ' . $someElement->childCount . ' child nodes.';
The result is:
$someElement has 1 child nodes. After calling splitText, $someElement now has 2 child nodes.
The normalize
method performs the opposite of the splitText
method: it collapses adjacent text nodes into a single text node.
normalize
can be called from any element or the DOM document itself, and is called recursively on all nodes below the calling node.
The following example splits a text node using splitText
, then uses normalize
to reverse the operation:
//create a new element
$someElement =& $xmldoc->createElement('someElement');
//add a text node to the element
$someElement->setText('Some sample text');
//echo childCount to browser
echo '$someElement has ' . $someElement->childCount . ' child nodes.';
//split the text node at offset 5
$someElement->firstChild->splitText(5);
//echo childCount to browser
echo "\n<br />";
echo 'After calling splitText, $someElement now has ' . $someElement->childCount . ' child nodes.';
//call normalize on element to reverse splitText
$someElement->normalize();
//echo childCount to browser
echo "\n<br />";
echo 'After calling normalize, $someElement now has ' . $someElement->childCount . ' child nodes.';
The result is:
$someElement has 1 child nodes. After calling splitText, $someElement now has 2 child nodes. After calling normalize, $someElement now has 1 child nodes.
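The same split-and-merge behaviour can be reproduced without DOMIT! installed. The following is a minimal sketch that assumes PHP 7 or later and uses PHP's bundled DOM extension (DOMDocument) in place of DOMIT!; the method names splitText and normalize come from the DOM specification, but the child count is read through childNodes->length rather than DOMIT!'s childCount.

```php
<?php
// Sketch using PHP's bundled DOM extension, not DOMIT!.
$doc = new DOMDocument();
$someElement = $doc->createElement('someElement');
$doc->appendChild($someElement);
$someElement->appendChild($doc->createTextNode('Some sample text'));

// One text node to begin with
$before = $someElement->childNodes->length;

// Split at offset 5: "Some " stays in the first node,
// "sample text" moves into a new sibling text node
$someElement->firstChild->splitText(5);
$afterSplit = $someElement->childNodes->length;

// normalize() merges the adjacent text nodes back into one
$someElement->normalize();
$afterNormalize = $someElement->childNodes->length;

echo "$before, $afterSplit, $afterNormalize\n";
```

The text content of the element is the same before and after; only the number of text nodes holding it changes.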
The appendData
method allows you to append text to a text node, CDATA Section, or comment node. For example:
//create a new element $someElement =& $xmldoc->createElement('someElement'); //add a text node to the element $someElement->setText('Some sample text'); //append more text $someElement->firstChild->appendData(' plus more text.'); //echo to browser echo $someElement->toNormalizedString(true);
The result is:
<someElement>Some sample text plus more text.</someElement>
The insertData
method allows you to insert text into a text node, CDATA Section, or comment node, at a specified offset.
It takes two parameters: an integer indicating the insertion point, and a string comprising the text to be inserted. For example:
//create a new element
$someElement =& $xmldoc->createElement('someElement');
//add a text node to the element
$someElement->setText('Some sample text');
//insert some text after the fourth character ("Some")
$someElement->firstChild->insertData(4, ' more');
//echo to browser
echo $someElement->toNormalizedString(true);
The result is:
<someElement>Some more sample text</someElement>
The replaceData
method allows you to overwrite a substring of text in a text node, CDATA Section, or comment node.
It takes three parameters: an integer indicating the starting offset, an integer specifying the number of characters from that offset to overwrite, and a string comprising the replacement text. For example:
//create a new element $someElement =& $xmldoc->createElement('someElement'); //add a text node to the element $someElement->setText('Some sample text'); //replace some text $someElement->firstChild->replaceData(0, 4, 'A bit of'); //echo to browser echo $someElement->toNormalizedString(true);
The result is:
<someElement>A bit of sample text</someElement>
The deleteData
method allows you to delete a substring of text in a text node, CDATA Section, or comment node.
It takes two parameters: an integer indicating the starting offset, and an integer specifying the number of characters from that offset to delete. For example:
//create a new element
$someElement =& $xmldoc->createElement('someElement');
//add a text node to the element
$someElement->setText('Some sample text');
//delete the seven characters "sample " starting at offset 5
$someElement->firstChild->deleteData(5, 7);
//echo to browser
echo $someElement->toNormalizedString(true);
The result is:
<someElement>Some text</someElement>
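Because the offsets in insertData, replaceData, and deleteData are zero-based character positions, it is easy to be off by one. The following sketch assumes PHP 7 or later and uses the identically named methods of PHP's bundled DOM extension (not DOMIT! itself) to show which offsets produce the three results above; the helper sampleText is hypothetical and exists only for this illustration.

```php
<?php
// Sketch using PHP's bundled DOM extension, not DOMIT!.
$doc = new DOMDocument();

// Hypothetical helper: returns a fresh text node for each experiment
function sampleText(DOMDocument $doc): DOMText
{
    return $doc->createTextNode('Some sample text');
}

// Offsets:  S=0 o=1 m=2 e=3 (space)=4 s=5 ...
$inserted = sampleText($doc);
$inserted->insertData(4, ' more');         // insert right after "Some"

$replaced = sampleText($doc);
$replaced->replaceData(0, 4, 'A bit of');  // overwrite "Some"

$deleted = sampleText($doc);
$deleted->deleteData(5, 7);                // remove "sample " (7 characters)

echo $inserted->data . "\n";
echo $replaced->data . "\n";
echo $deleted->data . "\n";
```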
After modifying an XML document, you generally need to save it to the filesystem. This can be achieved using the saveXML
method.
saveXML takes two parameters:
the file path to save the document
a boolean specifying whether toNormalizedString
formatting should be applied to the saved XML
To save the cdlibrary XML you would do this:
$xmldoc->saveXML('/xml/cdcollection.xml', true);
There are a number of additional DOM methods and constructs that have not yet been covered in this tutorial. The following chapter illustrates these.
The cloneNode method allows you to make a copy of a node and its children. All data in the cloned node will be identical to its source node, but the nodes are considered separate objects.
cloneNode
takes a single parameter -- a boolean that if set to true will also clone all children of the node. The default value is true.
Any type of node in a DOM document can be cloned.
The following example clones the first <cd>
element in the cdlibrary document and prints to the browser:
//get reference to first cd node
$firstCDNode =& $cdCollection->documentElement->childNodes[0];
//echo to browser
echo $firstCDNode->toNormalizedString(true);
//clone first cd node
$clonedCDNode =& $firstCDNode->cloneNode(true);
//echo to browser
echo "\n<br />\n<br />" . $clonedCDNode->toNormalizedString(true);
The result is:
<cd discid="bb0c3c0c"> <name>Robbie Fulks</name> <title>Couples in Trouble</title> </cd> <cd discid="bb0c3c0c"> <name>Robbie Fulks</name> <title>Couples in Trouble</title> </cd>
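The difference between a deep and a shallow clone, and the fact that a clone is a separate object, can be sketched as follows. This example assumes PHP 7 or later and uses PHP's bundled DOM extension rather than DOMIT!. Note that in that extension the deep parameter defaults to false, unlike the DOMIT! default of true described above, so it is passed explicitly here.

```php
<?php
// Sketch using PHP's bundled DOM extension, not DOMIT!.
$doc = new DOMDocument();
$doc->loadXML(
    '<cdlibrary><cd discid="bb0c3c0c">'
    . '<name>Robbie Fulks</name><title>Couples in Trouble</title>'
    . '</cd></cdlibrary>'
);

$firstCd = $doc->documentElement->firstChild;

// Deep clone: the element, its attributes, and all of its children
$deepClone = $firstCd->cloneNode(true);

// Shallow clone: the element and its attributes only
$shallowClone = $firstCd->cloneNode(false);

// Changing the clone does not touch the source node
$deepClone->setAttribute('discid', 'changed');

echo $deepClone->childNodes->length . "\n";     // children were copied
echo $shallowClone->childNodes->length . "\n";  // no children copied
echo $firstCd->getAttribute('discid') . "\n";   // source is unchanged
```

A cloned node has no parent until it is explicitly appended somewhere in a document.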
The getElementByID
method searches for elements with attributes of type ID, and returns an element with the specified value if one exists.
The DOM specification explains that by default, the search does not match on elements with an attribute named "ID"; rather, it is an attribute type that the method is looking for. The attribute type must either be:
defined in the document type declaration, i.e.,
<!ATTLIST bar id ID #IMPLIED >
an attribute named id that is prefixed with the xml namespace, i.e.,
<someElement xml:id="12345" />
DOMIT! is a non-validating parser, so the first option is not available. DOMIT! does, however, recognize the second option. With the following xml document, for example...
<testDocument> <someElement xml:id="12345">The containing element is properly formatted for getElementByID</someElement> <anotherElement id="12345">The containing element is NOT properly formatted for getElementByID</anotherElement> </testDocument>
... the getElementByID
method will match only on the first child node:
//instantiate and load XML $xmldoc =& new DOMIT_Document(); $success = $xmldoc->loadXML("testDocument.xml", true); if ($success) { //search for element with an ID of "12345" $matchingNode =& $xmldoc->getElementByID("12345"); //echo matching node to browser if one exists if ($matchingNode != null) { echo $matchingNode->toNormalizedString(true); } }
The result is:
<someElement xml:id="12345">The containing element is properly formatted for getElementByID</someElement>
The getElementByID
method returns null if no matching element is found.
Some may argue that the DOM specification for getElementByID
is too rigid for practical purposes. When parsing XHTML in particular, it is common to match on ID attributes that are not defined in such a way that DOMIT! or other non-validating parsers can effectively match on elements.
Given this, DOMIT! allows you to specify a tolerant mode for getElementByID
searches. By passing in a second parameter of false, DOMIT! will match on elements with attributes of "ID" and "id".
Take the following document, for example:
<testDocument> <anotherElement id="12345">The containing element is NOT properly formatted for getElementByID</anotherElement> </testDocument>
If getElementByID
is called:
//instantiate and load XML
$xmldoc =& new DOMIT_Document();
$success = $xmldoc->loadXML("testDocument.xml", true);
if ($success) {
    //search for element with an ID of "12345", using tolerant mode
    $matchingNode =& $xmldoc->getElementByID("12345", false);
    //echo matching node to browser if one exists
    if ($matchingNode != null) {
        echo $matchingNode->toNormalizedString(true);
    }
}
The result is:
<anotherElement id="12345">The containing element is NOT properly formatted for getElementByID</anotherElement>
The getElementsByTagName
method is similar to getElementByID
, in that it is a method for searching a DOM document for elements which match certain criteria.
In the case of getElementsByTagName
, the name of the element is matched on, and there can consequently be multiple matching elements.
getElementsByTagName
takes a single parameter -- the tag name of the elements to match. The search is performed recursively through the entire subtree of the calling element.
If one searched the cdlibrary XML for elements named "cd", for example, three elements would be returned:
$cdCollection =& new DOMIT_Document(); $success = $cdCollection->loadXML("/xml/cdcollection.xml"); if ($success) { //use getElementsByTagName to gather all elements named "cd" $matchingNodes =& $cdCollection->getElementsByTagName("cd"); //if any matching nodes are found, echo to browser if ($matchingNodes != null) { echo $matchingNodes->toNormalizedString(true); } }
The result is a printout of the three matched cd elements:
<cd discid="bb0c3c0c"> <name>Robbie Fulks</name> <title>Couples in Trouble</title> </cd> <cd discid="9b0ce70c"> <name>Richard Thompson</name> <title>Mock Tudor</title> </cd> <cd discid="cf11720f"> <name>Keller Williams</name> <title>Laugh</title> </cd>
In the previous section, the getElementsByTagName
method returned a collection of matching nodes. This collection is described by the DOM specification as a Node List.
A node list is a collection of nodes accessible by numerical index. A number of methods are defined to access its members. Many of these are identical to those found in the previously discussed named node map.
The getLength
and item methods
for a node list are identical to those for a named node map. You can use them to iterate through the node list using a for loop.
Take the previous getElementsByTagName
example, which returned three nodes. You can, for instance, loop through the node list and print out the discid
of each CD:
//use getElementsByTagName to gather all elements named "cd" $matchingNodes =& $cdCollection->getElementsByTagName("cd"); //if any matching nodes are found, loop through them and print out disc id if ($matchingNodes != null) { //get total number of nodes in the list $total = $matchingNodes->getLength(); //loop through node list for ($i = 0; $i < $total; $i++) { //get current node on list $currNode =& $matchingNodes->item($i); //echo out discid echo $currNode->getAttribute('discid') . "\n<br />"; } }
The result is:
bb0c3c0c 9b0ce70c cf11720f
The appendNode
method allows you to add a node to the end of the node list. The removeNode
method allows you to remove a node from the node list.
Both methods take a single parameter -- a reference to the node being appended or removed.
To append a node to the cd node list from the above example, you could do this:
//use getElementsByTagName to gather all elements named "cd" $matchingNodes =& $cdCollection->getElementsByTagName("cd"); //create a new node $newNode =& $cdCollection->createElement("someElement"); //append the node to the node list $matchingNodes->appendNode($newNode); //echo to browser echo $matchingNodes->toNormalizedString(true);
The result is:
<cd discid="bb0c3c0c"> <name>Robbie Fulks</name> <title>Couples in Trouble</title> </cd> <cd discid="9b0ce70c"> <name>Richard Thompson</name> <title>Mock Tudor</title> </cd> <cd discid="cf11720f"> <name>Keller Williams</name> <title>Laugh</title> </cd> <someElement />
To remove a node from the cd node list from the above example, you could do this:
//use getElementsByTagName to gather all elements named "cd" $matchingNodes =& $cdCollection->getElementsByTagName("cd"); //remove the first node from the node list $matchingNodes->removeNode($matchingNodes->item(0)); //echo to browser echo $matchingNodes->toNormalizedString(true);
The result is:
<cd discid="9b0ce70c"> <name>Richard Thompson</name> <title>Mock Tudor</title> </cd> <cd discid="cf11720f"> <name>Keller Williams</name> <title>Laugh</title> </cd>
According to the DOM specification, the child nodes of an element should be kept in a node list.
However, contrary to the specification, DOMIT! uses an array rather than a node list. This is to get around a deficiency in PHP 4, in which method calls cannot be chained together as one would normally expect with an object oriented programming language.
You cannot, for instance, do this in PHP 4 (although you can in PHP 5)...
$myText = $xmldoc->documentElement->getChildNodes()->item(2)->getText();
...although by using an array, it is possible to burrow deeply down into a document structure without splitting your code into multiple lines:
$myText = $xmldoc->documentElement->childNodes[2]->getText();
For those who are using PHP 5 and would like child nodes to be returned in node list format, the childNodesAsNodeList
method can be used:
$myText = $documentElement->childNodesAsNodeList()->item(2)->getText();
The importNode
method allows you to properly import a node into a DOM document which originated from another DOM document.
It takes two parameters:
the node to be imported
a boolean that, if true, will also import all of the node's children (this is the default behavior)
Let's say we had two XML documents. The first is the cd collection that we have been using throughout this tutorial. The second document contains a single cd that looks like this:
<cd discid="a30e4c0d"> <name>Charlie Hunter</name> <title>Songs From the Analog Playground</title> </cd>
If we instantiated these two XML documents, and wanted to add the contents of the cd
document to the cdlibrary
, we would first have to use importNode
:
//instantiate and load first XML Document
$xmldoc1 =& new DOMIT_Document();
$success1 = $xmldoc1->loadXML("cdCollection.xml", true);
//instantiate and load second XML Document
$xmldoc2 =& new DOMIT_Document();
$success2 = $xmldoc2->loadXML("cd.xml", true);
//import contents of xmldoc2 into xmldoc1
$importedData =& $xmldoc1->importNode($xmldoc2->documentElement);
//append contents of xmldoc2 to the cdCollection node
$xmldoc1->documentElement->appendChild($importedData);
//echo to browser
echo $xmldoc1->toNormalizedString(true);
The result is:
<?xml version="1.0"?> <cdlibrary> <cd discid="bb0c3c0c"> <name>Robbie Fulks</name> <title>Couples in Trouble</title> </cd> <cd discid="9b0ce70c"> <name>Richard Thompson</name> <title>Mock Tudor</title> </cd> <cd discid="cf11720f"> <name>Keller Williams</name> <title>Laugh</title> </cd> <cd discid="a30e4c0d"> <name>Charlie Hunter</name> <title>Songs From the Analog Playground</title> </cd> </cdlibrary>
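The import-then-append sequence can also be tried without DOMIT!. The sketch below assumes PHP 7 or later and uses PHP's bundled DOM extension; the XML is loaded from inline strings rather than from the files used above, and the second argument to importNode is passed explicitly because that extension defaults to a shallow import.

```php
<?php
// Sketch using PHP's bundled DOM extension, not DOMIT!.
$library = new DOMDocument();
$library->loadXML('<cdlibrary><cd discid="bb0c3c0c"/></cdlibrary>');

$single = new DOMDocument();
$single->loadXML('<cd discid="a30e4c0d"><name>Charlie Hunter</name></cd>');

// importNode creates a copy owned by $library; the source document is untouched
$imported = $library->importNode($single->documentElement, true);

// The imported copy is not part of the tree until it is appended
$library->documentElement->appendChild($imported);

echo $library->saveXML();
```

The imported node belongs to the target document, which is why it can then be appended there.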
DOMIT! includes a number of non-DOM methods for XML processing.
The getVersion
method returns the version number of the current install of DOMIT!
$myVersion = $xmldoc->getVersion();
Although the getElementByID
and getElementsByTagName
methods are useful, often you need more sophisticated search options to simplify your XML code.
The getElementsByPath
method allows you to search for elements in a document that match a "path"-like pattern that you provide.
The syntax is similar to an XPath query, although the range of patterns allowed by getElementsByPath
is far less sophisticated than the XPath specification permits.
The pattern takes the basic form of elementName/elementName, where the forward slash represents a parent-child relationship. Either a node list, a single node, or null is returned.
getElementsByPath
can be called by any node. There are three basic ways that you can form a pattern:
An absolute path search can be performed by prefixing your pattern with the / character. This type of search will start at the level of the document element node.
A relative path search can be performed by omitting the / prefix from your pattern. This type of search will start at the level of the node which called getElementsByPath.
A variable path search can be performed by prefixing your pattern with // characters. This type of search will find all matching elements, regardless of their position in the node hierarchy.
Let's try an example of each with our cdlibrary XML:
<?xml version="1.0"?> <cdlibrary> <cd discid="bb0c3c0c"> <name>Robbie Fulks</name> <title>Couples in Trouble</title> </cd> <cd discid="9b0ce70c"> <name>Richard Thompson</name> <title>Mock Tudor</title> </cd> <cd discid="cf11720f"> <name>Keller Williams</name> <title>Laugh</title> </cd> </cdlibrary>
The pattern for an absolute path search begins with a forward slash, meaning that the search will begin at the level of the document element node, no matter at what level the calling node resides.
To perform an absolute search for all <title>
elements, one would do this:
//use getElementsByPath to retrieve all title elements $myNodeList =& $cdCollection->getElementsByPath("/cdlibrary/cd/title"); //echo to browser echo $myNodeList->toNormalizedString(true);
The result is a listing of the three found <title>
nodes:
<title>Couples in Trouble</title> <title>Mock Tudor</title> <title>Laugh</title>
The pattern for a relative path search does not contain a beginning forward slash. The search will begin at the level of the calling node.
To perform a relative search for all <name>
elements which are children of <cd>
elements which are children of the <cdlibrary>
element, one would do this:
//use getElementsByPath to retrieve all name elements which are children of //cd elements which are children of the cdlibrary element $myNodeList =& $cdCollection->documentElement->getElementsByPath("cd/name"); //echo to browser echo $myNodeList->toNormalizedString(true);
The result is a listing of the three found <name>
nodes:
<name>Robbie Fulks</name> <name>Richard Thompson</name> <name>Keller Williams</name>
The pattern for a variable path search begins with a double forward slash. Each element in the document is considered a starting point for the search.
To perform a variable search for all <title>
elements in the document, one would do this:
//use getElementsByPath to retrieve all title elements in cdlibrary $myNodeList =& $cdCollection->getElementsByPath("//title"); //echo to browser echo $myNodeList->toNormalizedString(true);
The result is a listing of the three found <title>
nodes:
<title>Couples in Trouble</title> <title>Mock Tudor</title> <title>Laugh</title>
If you would like a single node to be returned by getElementsByPath
, rather than the entire node list of matching elements, you can specify the index of the requested node by passing an integer as the second parameter of getElementsByPath
.
In accordance with the XPath specification, the index that you specify is 1-based.
To return the first <cd>
node of the cdlibrary example, you could do this:
//use getElementsByPath to retrieve the first cd element in cdlibrary $myElement =& $cdCollection->getElementsByPath("/cdlibrary/cd", 1); //echo to browser if ($myElement != null) { echo $myElement->toNormalizedString(true); }
The result is:
<cd discid="bb0c3c0c"> <name>Robbie Fulks</name> <title>Couples in Trouble</title> </cd>
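Since the getElementsByPath syntax is modelled on XPath, the three search styles and the 1-based index can be compared against a real XPath engine. The following sketch assumes PHP 7 or later and uses DOMXPath from PHP's bundled DOM extension, not DOMIT!; the index is written as an XPath predicate ([1]) rather than passed as a second parameter.

```php
<?php
// Sketch using DOMXPath from PHP's bundled DOM extension, not DOMIT!.
$xml = <<<'XML'
<?xml version="1.0"?>
<cdlibrary>
  <cd discid="bb0c3c0c"><name>Robbie Fulks</name><title>Couples in Trouble</title></cd>
  <cd discid="9b0ce70c"><name>Richard Thompson</name><title>Mock Tudor</title></cd>
  <cd discid="cf11720f"><name>Keller Williams</name><title>Laugh</title></cd>
</cdlibrary>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Absolute path: starts at the top of the document
$absolute = $xpath->query('/cdlibrary/cd/title');

// Relative path: starts at the supplied context node
$relative = $xpath->query('cd/name', $doc->documentElement);

// Variable path: matches at any depth
$variable = $xpath->query('//title');

// 1-based index: the first <cd> child of <cdlibrary>
$firstCd = $xpath->query('/cdlibrary/cd[1]')->item(0);

echo $absolute->length . ' ' . $relative->length . ' ' . $variable->length . "\n";
echo $firstCd->getAttribute('discid') . "\n";
```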
The getElementsByAttribute
method allows you to retrieve a node list of elements, each of which contains an attribute that matches the specified name and value. This is a useful improvement over the getElementByID
method, since it does not require you to be bound to a narrow definition of attribute type or name.
To obtain a node list of all elements containing an attribute named 'myAttr' and a value of '3', for example, you would do this:
//get node list of elements containing myAttr="3" $myNodeList =& $xmldoc->getElementsByAttribute('myAttr', '3');
There is a third parameter available for getElementsByAttribute
, a boolean which if set to true will return the first matching element rather than an entire node list of elements:
//get first matching element containing myAttr="3"
$myElement =& $xmldoc->getElementsByAttribute('myAttr', '3', true);
The getNodesByNodeType
method allows you to search the document tree for nodes of a specific node type.
You can specify a node type using one of the following DOMIT! constants:
DOMIT_ELEMENT_NODE
(an integer value of 1)
DOMIT_TEXT_NODE
(an integer value of 3)
DOMIT_CDATA_SECTION_NODE
(an integer value of 4)
DOMIT_PROCESSING_INSTRUCTION_NODE
(an integer value of 7)
DOMIT_COMMENT_NODE
(an integer value of 8)
DOMIT_DOCUMENT_NODE
(an integer value of 9)
You must also pass in as the second parameter a context node - a node from which the search should start.
The following example returns a node list of all text nodes in the cdlibrary example:
//find all text nodes in cdlibrary $myTextNodeList =& $cdCollection->getNodesByNodeType(DOMIT_TEXT_NODE, $cdCollection); //echo to browser echo $myTextNodeList->toNormalizedString(true);
The result is:
Robbie Fulks Couples in Trouble Richard Thompson Mock Tudor Keller Williams Laugh
The getNodesByNodeValue
method allows you to search the document tree for nodes of a specific node value.
This is especially useful for finding text or CDATA Section nodes containing a certain text value.
You must pass in the node value that you are searching for as well as a context node - a node from which the search should start.
The following example returns a node list of all nodes in the current document with a node value of "Robbie Fulks":
//find all text nodes with a value of "Robbie Fulks" in cdlibrary $myTextNodeList =& $cdCollection->getNodesByNodeValue("Robbie Fulks", $cdCollection); //get first match $firstItem =& $myTextNodeList->item(0); //echo parent node to browser echo $firstItem->parentNode->toNormalizedString(true);
The result is:
<name>Robbie Fulks</name>
Sometimes it is useful to convert an XML document into a PHP array, or to import a PHP array as an XML document.
DOMIT! provides two methods to accomplish this: toArray
, and DOMIT_Utilities::fromArray
.
Note: It may be faster to use the PHP/Expat method xml_parse_into_struct instead of the DOMIT! array methods when converting XML to arrays.
The toArray
method converts an xml node and its children to an array.
To convert the first <cd>
element of the cdlibrary example to an array, you would do this:
//convert first <cd> element to array $myArray =& $cdCollection->documentElement->firstChild->toArray(); //echo to browser print "<pre>"; print_r($myArray); print "</pre>";
The result is:
Array ( [cd] => Array ( [attributes] => Array ( [discid] => bb0c3c0c ) [0] => Array ( [name] => Array ( [attributes] => Array ( ) [0] => Robbie Fulks ) ) [1] => Array ( [title] => Array ( [attributes] => Array ( ) [0] => Couples in Trouble ) ) ) )
The DOMIT_Utilities::fromArray
method generates a node tree from an array and appends it to the specified document or node.
The convention follows that of the fromArray method in the minixml library:
//Create an array to represent a person Bob
$bob = array(
    'name' => array(
        'first' => 'Bob',
        'last' => 'Roberts'
    ),
    'age' => 35,
    'email' => 'bob@example.com',
    'location' => array(
        'streetaddr' => '123 Skid Row',
        'city' => 'Dark City',
        'state' => 'DN',
        'country' => 'XNE',
    ),
);
//Create another array to represent a person Mary
$mary = array(
    'name' => array(
        'first' => 'Mary',
        'last' => 'Zlipsakis'
    ),
    'age' => 94,
    'location' => array(
        'streetaddr' => '54343 Park Ave',
        'city' => 'SmallVille',
        'state' => 'DN',
        'country' => 'XNE',
    ),
    'icecream' => 'vanilla',
);
//Create a big array that contains all our people
$xmlArray = array();
$xmlArray["people"]["person"] = array();
array_push($xmlArray["people"]["person"], $mary);
array_push($xmlArray["people"]["person"], $bob);
//instantiate a DOMIT! document
require_once('xml_domit_include.php');
$xmldoc =& new DOMIT_Document();
//require DOMIT_Utilities file
require_once('xml_domit_utilities.php');
//use fromArray to populate document
DOMIT_Utilities::fromArray($xmldoc, $xmlArray);
//echo to browser
echo $xmldoc->toNormalizedString(true);
The result is:
<people> <person> <name> <first>Mary</first> <last>Zlipsakis</last> </name> <age>94</age> <location> <streetaddr>54343 Park Ave</streetaddr> <city>SmallVille</city> <state>DN</state> <country>XNE</country> </location> <icecream>vanilla</icecream> </person> <person> <name> <first>Bob</first> <last>Roberts</last> </name> <age>35</age> <email>bob@example.com</email> <location> <streetaddr>123 Skid Row</streetaddr> <city>Dark City</city> <state>DN</state> <country>XNE</country> </location> </person> </people>
The nodetools
library is a set of helper utilities for processing nodes.
The nodetools::parseattributes
method parses an attribute string into an array of key / value pairs.
For example:
//require the nodetools library require_once('xml_domit_nodetools.php'); //build a sample attribute string $myAttrString = 'x="27" y="12"'; //parse into an array $myArray = nodetools::parseattributes($myAttrString); //echo to browser echo "<pre>"; print_r($myArray); echo "</pre>";
The result is:
Array ( [x] => 27 [y] => 12 )
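To make the idea concrete, one way such a parser could work is sketched below. This is not the nodetools implementation; parseAttributeString is a hypothetical stand-alone function, written for PHP 7 or later, that uses a regular expression to pull name="value" pairs out of a string.

```php
<?php
// Hypothetical stand-alone function, not part of DOMIT! or nodetools.
function parseAttributeString(string $attrString): array
{
    $attributes = [];

    // Match name="value" or name='value'; the closing quote must
    // be the same character as the opening quote (backreference \2)
    $pattern = '/([\w:.-]+)\s*=\s*(["\'])(.*?)\2/';
    preg_match_all($pattern, $attrString, $matches, PREG_SET_ORDER);

    foreach ($matches as $match) {
        // $match[1] is the attribute name, $match[3] the unquoted value
        $attributes[$match[1]] = $match[3];
    }

    return $attributes;
}

$parsed = parseAttributeString('x="27" y="12"');
print_r($parsed);
```

The values come back as strings, so a numeric attribute such as x needs an explicit cast before arithmetic.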
The nodetools::moveUp
method moves a node to the previous index in the childNodes
array.
It takes a single argument -- a reference to the node to be moved.
The following example moves the last <cd>
element to the second-to-last position:
//require the nodetools library
require_once('xml_domit_nodetools.php');
//move the node up
nodetools::moveUp($cdCollection->documentElement->lastChild);
//echo to browser
echo $cdCollection->toNormalizedString(true);
The result is:
<?xml version="1.0"?> <cdlibrary> <cd discid="bb0c3c0c"> <name>Robbie Fulks</name> <title>Couples in Trouble</title> </cd> <cd discid="cf11720f"> <name>Keller Williams</name> <title>Laugh</title> </cd> <cd discid="9b0ce70c"> <name>Richard Thompson</name> <title>Mock Tudor</title> </cd> </cdlibrary>
The nodetools::moveDown
method moves a node to the next index in the childNodes
array.
It takes a single argument -- a reference to the node to be moved.
The following example moves the first <cd>
element to the second position:
//require the nodetools library
require_once('xml_domit_nodetools.php');
//move the node down
nodetools::moveDown($cdCollection->documentElement->firstChild);
//echo to browser
echo $cdCollection->toNormalizedString(true);
The result is:
<?xml version="1.0"?> <cdlibrary> <cd discid="9b0ce70c"> <name>Richard Thompson</name> <title>Mock Tudor</title> </cd> <cd discid="bb0c3c0c"> <name>Robbie Fulks</name> <title>Couples in Trouble</title> </cd> <cd discid="cf11720f"> <name>Keller Williams</name> <title>Laugh</title> </cd> </cdlibrary>
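Since child nodes are held in an array, moving a node up or down amounts to swapping two neighbouring array entries. The sketch below illustrates that idea on a plain PHP array of artist names; moveItemUp and moveItemDown are hypothetical helpers written for PHP 7 or later, and they are an assumption about the general technique rather than the nodetools source.

```php
<?php
// Hypothetical helpers operating on a plain array, not on DOMIT! nodes.

// Swap the item at $index with the one before it (no-op at the start)
function moveItemUp(array $items, int $index): array
{
    if ($index > 0 && $index < count($items)) {
        $previous = $items[$index - 1];
        $items[$index - 1] = $items[$index];
        $items[$index] = $previous;
    }
    return $items;
}

// Swap the item at $index with the one after it (no-op at the end)
function moveItemDown(array $items, int $index): array
{
    return moveItemUp($items, $index + 1);
}

$cds = ['Robbie Fulks', 'Richard Thompson', 'Keller Williams'];

$lastMovedUp = moveItemUp($cds, 2);
$firstMovedDown = moveItemDown($cds, 0);

print_r($lastMovedUp);
print_r($firstMovedDown);
```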
The nodetools::nodeExists
method checks whether a node exists on a given path. The path expression must conform to the getElementsByPath
syntax.
The method takes two parameters -- a reference to the calling node (the node at which the search begins) and the path expression.
To check if the first child <cd>
element of the <cdlibrary>
element contains a <title>
element, you can do this:
//require the nodetools library
require_once('xml_domit_nodetools.php');
//check if node exists
if (nodetools::nodeExists($cdCollection, '/cdlibrary/cd/title')) {
    echo "Node exists!";
}
else {
    echo "Node does NOT exist";
}
The result is:
Node exists!
The nodetools::fromPath
method generates a hierarchy of elements based on a path expression.
It takes three parameters:
a reference to the DOMIT_Document that will create the elements
the path expression
the node value of a text node to be appended to the last element (if required)
For example:
//require the nodetools library require_once('xml_domit_nodetools.php'); //build node tree $myNodes =& nodetools::fromPath($xmldoc, '/someElement/childElement', "Sample text"); //echo to browser echo $myNodes->toNormalizedString(true);
The result is:
<someElement> <childElement>Sample text</childElement> </someElement>
The following chapter deals with XML namespaces. The XML Namespaces specification defines a simple method for distinguishing XML element and attribute names by associating them with URI references (namespaces).
With the widespread adoption of XML, we have increasingly seen the coexistence and integration of different XML standards.
So what happens when you want to combine two XML documents, and each document contains an element named <title>
, but the <title>
element has a different meaning in each document? For example:
XML DOCUMENT #1: <?xml version="1.0"?> <individual gender="m"> <name>George Henry III</name> <title>Duke of Fredericton</title> <books></books> </individual> XML DOCUMENT #2: <?xml version="1.0"?> <book> <title>Transcendence Through XML</title> </book> COMBINED DOCUMENT: <?xml version="1.0"?> <individual gender="m"> <name>George Henry III</name> <title>Duke of Fredericton</title> <books> <book> <title>Transcendence Through XML</title> </book> </books> </individual>
The potential problems here should be obvious. If, for instance, the getElementsByTagName
method was used to obtain a node list of elements with a node name of "title", how would one differentiate between a person's title and the title of a book?
Such naming collisions can cause confusion and error, and it is essential to have some means of differentiating between identically named, but contextually different, nodes.
The XML Namespace specification proposes a mechanism whereby elements and attributes can be assigned namespaces -- or, unique identifiers that allow one to differentiate between similar tag names.
The unique identifier comes in the form of a URI (Uniform Resource Identifier), which is a convention for identifying resources on the web (a URL, or Uniform Resource Locator, is a type of URI).
For example, the namespace URI for the Dublin Core -- an XML specification for interoperable online metadata standards -- is:
http://purl.org/dc/elements/1.1/
Since the URI tends to be too long to repeat throughout your document, you can specify an abbreviation for the URI, known as the Namespace Prefix.
The namespace URI and namespace prefix are defined within your XML document, often at the level of the document element (but not necessarily so), using the keyword xmlns.
In our <individual>
example from above, we might use namespaces to do the following:
<?xml version="1.0"?> <person:individual person:gender="m" xmlns:person="http://www.engageinteractive.com/person/" xmlns:book="http://www.engageinteractive.com/book/"> <person:name>George Henry III</person:name> <person:title>Duke of Fredericton</person:title> <person:books> <book:book> <book:title>Transcendence Through XML</book:title> </book:book> </person:books> </person:individual>
You'll notice that in the document element node, two items have been added which appear to be attributes but which are actually Namespace Declarations. A namespace declaration:
begins with the prefix xmlns:
is followed by the namespace prefix: e.g., person
is followed by an equal sign
concludes with the URI in quotation marks: e.g. "http://www.engageinteractive.com/person/"
The namespace declaration says basically that:
There are elements and/or attributes in the following XML that will be assigned the URI http://www.engageinteractive.com/person/ and these elements and/or attributes are different from elements and/or attributes that are assigned the URI http://www.engageinteractive.com/book/
It also says that:
We will use the prefix "person" as shorthand for the URI http://www.engageinteractive.com/person/, and we will use the abbreviation "book" as shorthand for the URI http://www.engageinteractive.com/book/
It is then a simple task of placing the prefixes person: and book: before all corresponding elements and attributes. A namespace aware XML parser will be able to parse and differentiate between the elements named <person:title>
and <book:title>.
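To see what a namespace-aware parser makes of the document above, here is a minimal sketch. It assumes PHP's bundled DOM extension (DOMDocument) rather than DOMIT!, so the calls shown are not DOMIT! methods; the variable names are illustrative only.

```php
<?php
// Sketch using PHP's bundled DOM extension (an assumption; this is not DOMIT!).
// It parses the namespaced <individual> example and looks up each <title>
// by namespace URI plus local name.
$xml = <<<'EOT'
<?xml version="1.0"?>
<person:individual person:gender="m"
    xmlns:person="http://www.engageinteractive.com/person/"
    xmlns:book="http://www.engageinteractive.com/book/">
  <person:name>George Henry III</person:name>
  <person:title>Duke of Fredericton</person:title>
  <person:books>
    <book:book>
      <book:title>Transcendence Through XML</book:title>
    </book:book>
  </person:books>
</person:individual>
EOT;

$doc = new DOMDocument();
$doc->loadXML($xml);

$personNs = 'http://www.engageinteractive.com/person/';
$bookNs   = 'http://www.engageinteractive.com/book/';

// Same local name ("title"), different namespace URI, different element.
$personTitle = $doc->getElementsByTagNameNS($personNs, 'title')->item(0)->textContent;
$bookTitle   = $doc->getElementsByTagNameNS($bookNs, 'title')->item(0)->textContent;

echo "Person title: $personTitle\n";
echo "Book title: $bookTitle\n";
```

The lookup is keyed on the URI, not the prefix, so the same search would still work if the document had chosen different prefixes for the same two URIs.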
If an XML document does not contain a namespace declaration, then it is assumed that all elements in the document belong to the default namespace. The default namespace is null unless defined by the user.
If you would like to specify a user-defined default namespace, omit the namespace prefix in your xmlns declaration:
xmlns="http://www.engageinteractive.com/this.is.a.default.namespace"
Note: Default namespaces do not apply to attributes.
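The note about attributes can be checked with a short sketch. It assumes PHP's bundled DOM extension (DOMDocument) as a stand-in parser, not DOMIT!, and reuses the default namespace URI from the example above.

```php
<?php
// Sketch using PHP's bundled DOM extension (an assumption; this is not DOMIT!).
// A default namespace applies to unprefixed elements, but not to
// unprefixed attributes.
$defaultNs = 'http://www.engageinteractive.com/this.is.a.default.namespace';

$doc = new DOMDocument();
$doc->loadXML(
    '<individual gender="m" xmlns="' . $defaultNs . '">'
    . '<name>George Henry III</name>'
    . '</individual>'
);

$root = $doc->documentElement;
$attr = $root->getAttributeNode('gender');

$elementNs   = $root->namespaceURI;             // the default namespace URI
$childNs     = $root->firstChild->namespaceURI; // inherited by the <name> child
$attributeNs = $attr->namespaceURI;             // null: attributes are unaffected

var_dump($elementNs, $childNs, $attributeNs);
```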
When an XML document uses namespaces, the tag name of an element or attribute (i.e., the part following the namespace prefix) is referred to as its Local Name.
The local name of the <person:individual>
element, for instance, is "individual".
The concatenated namespace prefix and local name are referred to as the Qualified Name, or qname.
The qualified name of the <person:individual>
element, for instance, is "person:individual".
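The three parts of a name can be inspected directly. The sketch below assumes PHP's bundled DOM extension (DOMDocument), not DOMIT!, and uses that extension's property names (prefix, localName, nodeName), which may differ from what DOMIT! exposes.

```php
<?php
// Sketch using PHP's bundled DOM extension (an assumption; this is not DOMIT!).
$doc = new DOMDocument();
$doc->loadXML(
    '<person:individual xmlns:person="http://www.engageinteractive.com/person/"/>'
);
$root = $doc->documentElement;

$prefix        = $root->prefix;    // namespace prefix
$localName     = $root->localName; // local name
$qualifiedName = $root->nodeName;  // prefix + ":" + local name

echo "$prefix | $localName | $qualifiedName\n";
```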
DOMIT! (although not DOMIT! Lite) is compliant with the XML Namespace specification. It implements the following methods:
To enable DOMIT! to process namespace data, invoke the setNamespaceAwareness
method before populating your XML document.
$xmldoc->setNamespaceAwareness(true);
The declareNamespace
method allows you to make a namespace declaration at the level of the calling element.
You must specify two parameters:
a namespace prefix
a namespace URI
The following creates a namespace declaration with a prefix of "domit" at the document element level:
$xmldoc->documentElement->declareNamespace('domit', 'http://www.engageinteractive.com/domit/');
The resulting namespace declaration would look like this:
xmlns:domit="http://www.engageinteractive.com/domit/"
The declareDefaultNamespace
method allows you to make a default namespace declaration at the level of the calling element.
$xmldoc->documentElement->declareDefaultNamespace('http://www.foo.com/a.default.namespace');
The resulting default namespace declaration would look like this:
xmlns="http://www.foo.com/a.default.namespace"
To reset the default namespace back to its original null value, pass in an empty string to the declareDefaultNamespace
method:
$xmldoc->documentElement->declareDefaultNamespace("");
The getNamespaceDeclarationsInScope
method returns an associative array of all namespace declarations that are in scope for the calling element.
//acquire array of namespace declarations in scope
$nsMap = $xmldoc->documentElement->firstChild->getNamespaceDeclarationsInScope();

//echo to browser
print "<pre>";
print_r($nsMap);
print "</pre>";
The result is:
Array
(
    [http://www.engageinteractive.com/person/] => person
    [http://www.engageinteractive.com/book/] => book
)
The getDefaultNamespaceDeclaration
method returns a string containing the default namespace declaration in scope for the calling element.
echo $xmldoc->documentElement->childNodes[2]->firstChild->getDefaultNamespaceDeclaration();
The result is an empty string:
A common problem with namespaces occurs when an element is moved to another location in a document, or copied to another DOM document.
If the node being copied is not the document element, and the namespace declarations in scope for that element are declared higher up in the DOM tree (for example, in the document element), then the namespace declarations can be lost.
In the following XML, for instance, if the <book:book>
element were to be copied to another DOM document, the namespace declarations in the document element might not accompany the element:
<?xml version="1.0"?>
<person:individual person:gender="m"
    xmlns:person="http://www.engageinteractive.com/person/"
    xmlns:book="http://www.engageinteractive.com/book/">
  <person:name>George Henry III</person:name>
  <person:title>Duke of Fredericton</person:title>
  <person:books>
    <book:book>
      <book:title>Transcendence Through XML</book:title>
    </book:book>
  </person:books>
</person:individual>
The copyNamespaceDeclarationsLocally
method addresses this problem by forcing all namespace declarations that are in scope for the element to be explicitly duplicated on the element itself.
//get reference to book:book node
$bookNode =& $xmldoc->documentElement->childNodes[2]->firstChild;

//copy namespace declarations
$bookNode->copyNamespaceDeclarationsLocally();

//echo to browser
echo $bookNode->toNormalizedString(true);
The result is:
<book:book xmlns:person="http://www.engageinteractive.com/person/"
    xmlns:book="http://www.engageinteractive.com/book/">
  <book:title>Transcendence Through XML</book:title>
</book:book>
The createElementNS
method is used to create a namespace compliant element.
createElementNS
takes two parameters:
the namespace URI of the element
its qualified name
The following example will create the <book:title>
element:
//create namespace compliant element
$myElement =& $xmldoc->createElementNS('http://www.engageinteractive.com/book/', 'book:title');

//echo to browser
echo $myElement->toNormalizedString(true);
The result is:
<book:title />
Note that using the createElement
method will not create an element properly when namespace awareness is enabled.
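For comparison, the same two-parameter pattern (namespace URI, then qualified name) can be sketched with PHP's bundled DOM extension. This is an assumption-laden stand-in, not DOMIT! code: the optional third argument (the element's text value) belongs to PHP's DOMDocument::createElementNS and is not described for DOMIT! above.

```php
<?php
// Sketch using PHP's bundled DOM extension (an assumption; this is not DOMIT!).
$bookNs = 'http://www.engageinteractive.com/book/';

$doc = new DOMDocument();

// Namespace URI first, qualified name second; the text value is optional.
$title = $doc->createElementNS($bookNs, 'book:title', 'Transcendence Through XML');
$doc->appendChild($title);

// The new element carries its namespace, so a namespace-aware search finds it.
$found = $doc->getElementsByTagNameNS($bookNs, 'title');

echo $title->prefix, ' / ', $title->localName, ' / ', $title->namespaceURI, "\n";
echo 'matches: ', $found->length, "\n";
```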
The getElementsByTagNameNS
method is a namespace compliant version of getElementsByTagName
. It allows you to search for elements in an XML document by specifying:
the namespace URI of the element
the local name
The following example matches the <book:title>
element:
//find book:title element
$myNodeList =& $xmldoc->getElementsByTagNameNS('http://www.engageinteractive.com/book/', 'title');

//echo to browser
echo $myNodeList->toNormalizedString(true);
The result is:
<book:title>Transcendence Through XML</book:title>
The createAttributeNS
method is the namespace equivalent of createAttribute
. It enables you to create a new, namespace compliant, attribute node.
createAttributeNS
takes two parameters:
the namespace URI of the attribute
the local name
The following example creates a new attribute named "book:language", with a value of "en":
//create namespace compliant attribute
$myAttr =& $xmldoc->createAttributeNS('http://www.engageinteractive.com/book/', 'language', 'en');

//echo to browser
echo $myAttr->toNormalizedString(true);
The result is:
book:language='en'
The hasAttributeNS
and getAttributeNS
methods are namespace compliant versions of hasAttribute
and getAttribute
. Both methods take as parameters:
the namespace URI of the attribute
the local name
The following example checks if an attribute named 'gender' with a namespace URI of 'http://www.engageinteractive.com/person/' exists in the document element, and echoes the value to the browser:
//set variables for namespace URI and local name
$URI = 'http://www.engageinteractive.com/person/';
$localName = 'gender';

//determine if attribute exists
if ($xmldoc->documentElement->hasAttributeNS($URI, $localName)) {
    //echo to browser
    echo $xmldoc->documentElement->getAttributeNS($URI, $localName);
}
The result is:
m
The setAttributeNS
method is a namespace compliant version of setAttribute
. It creates a new namespace compliant attribute for the calling element, or overwrites the value of the attribute if one already exists.
setAttributeNS
takes three parameters:
the namespace URI of the attribute
the qualified name
the value of the attribute
The following example sets a new attribute on the <book:title> element.
//find book:title element
$myNodeList =& $xmldoc->getElementsByTagNameNS('http://www.engageinteractive.com/book/', 'title');

//get first match
$myElement =& $myNodeList->item(0);

//add attribute named "book:language" to the element
$myElement->setAttributeNS('http://www.engageinteractive.com/book/', 'book:language', 'en');

//echo to browser
echo $myElement->toNormalizedString(true);
The result is:
<book:title book:language="en">Transcendence Through XML</book:title>
The getAttributeNodeNS
and setAttributeNodeNS
methods are namespace compliant versions of getAttributeNode
and setAttributeNode
.
getAttributeNodeNS
takes two parameters:
the namespace URI of the attribute
the local name
setAttributeNodeNS
takes a single parameter -- a reference to the node to be added/set.
The following example echoes the value of the "gender" attribute and then changes it to "f":
//get the attribute node named "gender" in the document element
$attrNode =& $xmldoc->documentElement->getAttributeNodeNS('http://www.engageinteractive.com/person/', 'gender');

//echo value of attr node
echo "original value of gender node is: " . $attrNode->getValue();

//create a new attribute node
$myAttr =& $xmldoc->createAttributeNS('http://www.engageinteractive.com/person/', 'gender', 'f');

//overwrite existing attr with new one
$xmldoc->documentElement->setAttributeNodeNS($myAttr);

//echo to browser
echo $xmldoc->documentElement->toNormalizedString(true);
The result is:
<person:individual person:gender="f"
    xmlns:person="http://www.engageinteractive.com/person/"
    xmlns:book="http://www.engageinteractive.com/book/">
  <person:name>George Henry III</person:name>
  <person:title>Duke of Fredericton</person:title>
  <person:books>
    <book:book>
      <book:title>Transcendence Through XML</book:title>
    </book:book>
  </person:books>
</person:individual>
The removeAttributeNS
method is the namespace counterpart to removeAttribute
. It enables you to remove an attribute from an element.
removeAttributeNS
takes two parameters:
the namespace URI of the attribute
the local name
The following example removes the "gender" attribute from the document element:
//remove "gender" attribute from document element
$xmldoc->documentElement->removeAttributeNS('http://www.engageinteractive.com/person/', 'gender');

//echo to browser
echo $xmldoc->documentElement->toNormalizedString(true);
The result is:
<person:individual
    xmlns:person="http://www.engageinteractive.com/person/"
    xmlns:book="http://www.engageinteractive.com/book/">
  <person:name>George Henry III</person:name>
  <person:title>Duke of Fredericton</person:title>
  <person:books>
    <book:book>
      <book:title>Transcendence Through XML</book:title>
    </book:book>
  </person:books>
</person:individual>
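The namespace-aware attribute methods can be exercised together in one round trip. The sketch below assumes PHP's bundled DOM extension (DOMDocument), which has methods with the same names; it is not DOMIT! code, and DOMIT!'s own return values may differ.

```php
<?php
// Sketch using PHP's bundled DOM extension (an assumption; this is not DOMIT!).
// Read, overwrite, then remove the person:gender attribute.
$personNs = 'http://www.engageinteractive.com/person/';

$doc = new DOMDocument();
$doc->loadXML(
    '<person:individual xmlns:person="' . $personNs . '" person:gender="m"/>'
);
$root = $doc->documentElement;

// has/get take the namespace URI and the local name.
$before = $root->hasAttributeNS($personNs, 'gender')
    ? $root->getAttributeNS($personNs, 'gender')
    : null;

// set takes the namespace URI, the qualified name, and the value.
$root->setAttributeNS($personNs, 'person:gender', 'f');
$after = $root->getAttributeNS($personNs, 'gender');

// remove takes the namespace URI and the local name.
$root->removeAttributeNS($personNs, 'gender');
$stillThere = $root->hasAttributeNS($personNs, 'gender');

var_dump($before, $after, $stillThere);
```

Note the asymmetry the text describes: lookups and removal use the local name, while setting an attribute uses the qualified name so that the prefix is known.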
DOMIT! now has experimental XPath support.
XPath is a syntax for locating nodes in an XML tree using "path"-like expressions. A good introductory tutorial on XPath can be found at: http://www.w3schools.com/xpath/
DOMIT! implements XPath calls through the selectNodes
method. Not all of the specification is supported currently.
selectNodes
can be called from any XML document or element node. It converts an XPath expression into a node list or single node that matches the specified pattern. For example:
$nodeList =& $xmldoc->selectNodes("/book/chapter[@id='1234']");
The above example will return a node list containing all 'chapter' elements in the XML document that:
are children of a document element named 'book', and
have an 'id' attribute with a value of '1234'
If you would like a single node to be returned by selectNodes
, rather than the entire node list of matching elements, you can specify the index of the requested node by passing an integer as the second parameter of selectNodes
.
The index is 1-based. The following example will return the first node matching the XPath expression:
$nodeList =& $xmldoc->selectNodes("/book/chapter[@id='1234']", 1);
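To experiment with the same XPath expression outside DOMIT!, here is a minimal sketch that assumes PHP's bundled DOMXPath class. It is not the selectNodes method; the sample document and variable names are made up for illustration. XPath positions are themselves 1-based, which mirrors the 1-based index described above.

```php
<?php
// Sketch using PHP's bundled DOM extension (an assumption; this is not DOMIT!).
// The sample document below is hypothetical.
$doc = new DOMDocument();
$doc->loadXML(
    '<book>'
    . '<chapter id="1234"><title>First match</title></chapter>'
    . '<chapter id="5678"><title>Other</title></chapter>'
    . '<chapter id="1234"><title>Second match</title></chapter>'
    . '</book>'
);

$xpath = new DOMXPath($doc);

// All matching chapter elements.
$matches = $xpath->query("/book/chapter[@id='1234']");
$count = $matches->length;

// Only the first match: the positional predicate [1] is 1-based.
$first = $xpath->query("(/book/chapter[@id='1234'])[1]")->item(0);
$firstTitle = $first->textContent;

echo "matches: $count, first: $firstTitle\n";
```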
Some of the plans for DOMIT! include:
UTF-8 support
fuller XPath support
OneDOM: a generic wrapper for DOMIT! and the PHP DOM_XML library
DOMIT! has only been made possible through the suggestions, bug reports, and code submissions of others.
If you would like to contribute to DOMIT! or join the DOMIT! team, please email <johnkarl@nbnet.nb.ca>