Read XSLT 2.0 and XPath 2.0 Programmer's Reference, 4th Edition Online
Authors: Michael Kay
1. Use the value of the system property javax.xml.parsers.DocumentBuilderFactory if it is available. You can typically set system properties using the -D option on the Java command line, or by calling System.setProperty() from your application.
2. Look for a properties file $JAVA_HOME/lib/jaxp.properties, and within this file, for the property named javax.xml.parsers.DocumentBuilderFactory.
3. Use the services API, which is part of the JAR specification. In practice, this means that the DOM implementation used will be the first one found on the classpath.
It is likely that when you install a particular DOM implementation, it will contain a file in its .jar archive that makes that particular implementation the default, so if you don't do anything to select a specific implementation, the one chosen will depend on the order of files and directories on your class path.
As with SAX, the default parser changed from Crimson to Xerces 2 with the introduction of Java 5.
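The lookup rules above can be observed directly. The following is a minimal sketch, not a definitive recipe: the class name FactoryLookup and the method selectedFactory() are hypothetical names chosen for illustration, and the class name printed will depend on your Java installation and class path.

```java
import javax.xml.parsers.DocumentBuilderFactory;

class FactoryLookup {
    /** Returns the class name of the DocumentBuilderFactory that JAXP selects. */
    static String selectedFactory() {
        // newInstance() applies the lookup rules: system property first,
        // then jaxp.properties, then the services API.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        return factory.getClass().getName();
    }

    public static void main(String[] args) {
        // The property is null unless it was set, for example with
        //   java -Djavax.xml.parsers.DocumentBuilderFactory=<factory class> FactoryLookup.java
        System.out.println("System property: "
                + System.getProperty("javax.xml.parsers.DocumentBuilderFactory"));
        System.out.println("Selected factory: " + selectedFactory());
    }
}
```

Running this with and without the -D option is a quick way to check which implementation an application will actually pick up.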
Once you have a DocumentBuilderFactory, you can use a number of methods to configure it. Finally, you can call the newDocumentBuilder() method to return a DocumentBuilder. The methods available are:
Method | Description |
Object getAttribute(String) | Gets information about the properties of the underlying implementation |
boolean isCoalescing() | Determines whether the resulting DocumentBuilder will merge CDATA nodes into their adjacent text nodes |
boolean isExpandEntityReferences() | Determines whether the resulting DocumentBuilder will expand entity references and merge their content into the adjacent text nodes |
boolean isIgnoringComments() | Determines whether the resulting DocumentBuilder will ignore comments in the source XML |
boolean isIgnoringElementContentWhitespace() | Determines whether the resulting DocumentBuilder will ignore whitespace in element content |
boolean isNamespaceAware() | Determines whether the resulting DocumentBuilder is namespace aware |
boolean isValidating() | Determines whether the resulting DocumentBuilder will validate the XML source |
DocumentBuilder newDocumentBuilder() | Returns a new DocumentBuilder configured as specified by previous calls |
static DocumentBuilderFactory newInstance() | Returns a vendor-specific DocumentBuilderFactory selected according to the rules given above |
void setAttribute(String, Object) | Sets vendor-specific properties on the underlying implementation |
void setCoalescing(boolean) | Determines whether the resulting DocumentBuilder will merge CDATA nodes into their adjacent text nodes |
void setExpandEntityReferences(boolean) | Determines whether the resulting DocumentBuilder will expand entity references and merge their content into the adjacent text nodes |
void setIgnoringComments(boolean) | Determines whether the resulting DocumentBuilder will ignore comments in the source XML |
void setIgnoringElementContentWhitespace(boolean) | Determines whether the resulting DocumentBuilder will ignore whitespace in element content |
void setNamespaceAware(boolean) | Determines whether the resulting DocumentBuilder is namespace aware |
void setValidating(boolean) | Determines whether the resulting DocumentBuilder will validate the XML source |
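As an illustration of how these methods fit together, here is a minimal sketch that configures a factory and then asks the resulting DocumentBuilder how it was configured. The class name ConfigureFactory, the method newConfiguredBuilder(), and the particular settings chosen are assumptions made for the example, not recommendations.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

class ConfigureFactory {
    /** Builds a namespace-aware, non-validating DocumentBuilder. */
    static DocumentBuilder newConfiguredBuilder() {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);   // needed for most XSLT-related work
        factory.setValidating(false);      // no DTD validation
        factory.setIgnoringComments(true); // drop comments from the tree
        factory.setCoalescing(true);       // merge CDATA into adjacent text nodes
        try {
            // The builder reflects the settings made on the factory so far.
            return factory.newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            // Thrown if the implementation cannot honor the requested settings.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        DocumentBuilder builder = newConfiguredBuilder();
        System.out.println("namespace aware: " + builder.isNamespaceAware());
        System.out.println("validating: " + builder.isValidating());
    }
}
```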
javax.xml.parsers.DocumentBuilder
A DocumentBuilder is always obtained by calling the newDocumentBuilder() method of a DocumentBuilderFactory.
A DocumentBuilder performs the task of parsing a source XML document and returning the resulting instance of org.w3c.dom.Document, containing the root of a tree representation of the document in memory.
The source document is specified in similar ways to the input for a SAX parser. This doesn't mean that a DocumentBuilder has to use a SAX parser to do the actual parsing: some will work this way and others won't. It's defined this way to avoid unnecessary differences between the SAX and DOM approaches.
You might be aware that in the Microsoft DOM implementation, the Document class has a method load() that parses a source XML file and constructs a Document object. This is a Microsoft extension; there is no corresponding method in the W3C DOM definition. This DocumentBuilder class fills the gap.
The methods available are:
Method | Description |
boolean isNamespaceAware() | Indicates whether the parser understands XML namespaces. |
boolean isValidating() | Indicates whether the parser validates the XML source. |
Document newDocument() | Returns a new Document object with no content. The returned Document can be populated using DOM methods such as createElement(). |
Document parse(File) | Parses the XML in the supplied file, and returns the resulting Document object. |
Document parse(InputSource) | Parses the XML in the supplied SAX InputSource, and returns the resulting Document object. |
Document parse(InputStream) | Parses the XML in the supplied InputStream, and returns the resulting Document object. Note that the System ID of the source document will be unknown, so it will not be possible to resolve any relative URIs contained in the document. |
Document parse(InputStream, String) | Parses the XML in the supplied InputStream, and returns the resulting Document object. The second argument supplies the System ID of the source document, which will be used to resolve any relative URIs contained in the document. |
Document parse(String) | Parses the XML in the document identified by the supplied URI, and returns the resulting Document object. |
void setEntityResolver(EntityResolver) | Supplies a SAX EntityResolver to be used during the parsing. |
void setErrorHandler(ErrorHandler) | Supplies a SAX ErrorHandler to be used during the parsing. |
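The two ways of obtaining a Document, parsing existing XML and starting from an empty tree, can be sketched as follows. The class name ParseExample, its method names, and the sample element names are hypothetical; the XML is supplied as a string through a SAX InputSource purely to keep the example self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class ParseExample {
    private static DocumentBuilder newBuilder() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder();
    }

    /** Parses XML held in a string, using parse(InputSource). */
    static Document parseString(String xml) {
        try {
            return newBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Creates an empty Document with newDocument() and populates it by hand. */
    static Document buildFromScratch() {
        try {
            Document doc = newBuilder().newDocument();
            Element root = doc.createElement("greeting");
            root.setTextContent("hello");
            doc.appendChild(root);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document parsed = parseString("<book><title>XSLT</title></book>");
        System.out.println(parsed.getDocumentElement().getNodeName());
        System.out.println(buildFromScratch().getDocumentElement().getTextContent());
    }
}
```

Because no System ID is supplied here, relative URIs inside the parsed document could not be resolved; calling setSystemId() on the InputSource would address that.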
The JAXP Transformation API
The previous sections provided a summary of the classes and methods defined in JAXP to control XML parsing. This section covers the classes and methods used to control XSLT transformation.
These classes are designed so they could be used with transformation mechanisms other than XSLT; for example, they could in principle be used to invoke XQuery (however, a different API called XQJ is under development for XQuery, which has more in common with JDBC). But XSLT is the primary target and is the one we will concentrate on.
There is one other kind of transformation that's worth mentioning, however, and this is an identity transformation in which the result represents a copy of the source. JAXP provides explicit support for identity transformations. These are more useful than they might appear, because JAXP defines three ways of supplying the source document (SAX, DOM, or lexical XML) and three ways of capturing the result document (SAX, DOM, or lexical XML), so an identity transformation can be used to convert any of these inputs to any of the outputs. For example, it can take SAX input and produce a lexical XML file as output, or it can take DOM input and produce a stream of SAX events as output. An implementation of JAXP can also support additional kinds of Source and Result objects if it chooses. This allows the “unofficial” document models such as JDOM, DOM4J, and XOM to coexist within the JAXP framework.
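To make the conversion idea concrete, the sketch below uses an identity transformation in both directions: lexical XML to a DOM tree, and a DOM tree back to lexical XML. The class and method names (IdentityTransform, toDom, toXml) are hypothetical, and omitting the XML declaration is simply a choice made to keep the output short.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Node;

class IdentityTransform {
    /** Lexical XML in, DOM out. */
    static Node toDom(String xml) {
        try {
            // newTransformer() with no arguments returns an identity transformer.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            DOMResult result = new DOMResult();
            identity.transform(new StreamSource(new StringReader(xml)), result);
            return result.getNode();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    /** DOM in, lexical XML out. */
    static String toXml(Node node) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(node), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Node dom = toDom("<a><b>x</b></a>");
        System.out.println(toXml(dom));
    }
}
```

Swapping in a SAXSource or SAXResult gives the other combinations described above without changing the overall shape of the code.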
JAXP is also designed to control a composite transformation consisting of a sequence of transformation steps, each defined by an XSLT stylesheet in its own right. To do this, it builds on the SAX2 concept of an XMLFilter, which takes an input document as a stream of SAX events and produces its output as another stream of SAX events. Any number of such filters can be arranged end to end in a pipeline to define a composite transformation.
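A two-step pipeline of this kind might be sketched as follows. Everything specific here is an assumption made for illustration: the class name FilterPipeline, the helper sheet(), the element names in, mid, and out, and the use of XSLT 1.0 stylesheets (chosen so that the sketch runs on the processor bundled with the JDK). The final identity transformer is used only to serialize the SAX events that come out of the last filter.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLFilter;
import org.xml.sax.XMLReader;

class FilterPipeline {
    /** Hypothetical helper: a stylesheet that renames element 'from' to 'to'. */
    static Source sheet(String from, String to) {
        String xsl =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:template match='" + from + "'>"
          + "<" + to + "><xsl:value-of select='.'/></" + to + ">"
          + "</xsl:template></xsl:stylesheet>";
        return new StreamSource(new StringReader(xsl));
    }

    static String run(String xml) {
        try {
            TransformerFactory tf = TransformerFactory.newInstance();
            if (!tf.getFeature(SAXTransformerFactory.FEATURE_XMLFILTER)) {
                throw new IllegalStateException("XMLFilter not supported by this processor");
            }
            SAXTransformerFactory stf = (SAXTransformerFactory) tf;

            // The SAX parser at the head of the pipeline.
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            XMLReader reader = spf.newSAXParser().getXMLReader();

            // Each filter reads SAX events from its parent and emits new ones.
            XMLFilter first = stf.newXMLFilter(sheet("in", "mid"));
            first.setParent(reader);
            XMLFilter second = stf.newXMLFilter(sheet("mid", "out"));
            second.setParent(first);

            // Serialize the events produced by the last filter.
            Transformer serializer = stf.newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(
                new SAXSource(second, new InputSource(new StringReader(xml))),
                new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | SAXException | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("<in>hello</in>"));
    }
}
```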
As with the JAXP SAXParser and DocumentBuilder interfaces, JAXP allows the specific XSLT implementation to be selected using a TransformerFactory object. Typically, the XSLT vendors will each provide their own subclass of TransformerFactory.
For performance reasons, the API separates the process of compiling a stylesheet from the process of executing it. A stylesheet can be compiled once and executed many times against different source documents, perhaps concurrently in different threads. The compiled stylesheet, following Microsoft's MSXML nomenclature, is known as a Templates object. To keep simple things simple, however, there are also methods that combine the two processes of compilation and execution into a single call.
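The compile-once, run-many pattern can be sketched like this. The class name CompiledStylesheet, the embedded stylesheet, and the sample documents are all hypothetical. The stylesheet is deliberately written in XSLT 1.0 on the assumption that only the JDK's bundled processor is on the class path; with an XSLT 2.0 processor installed, the same JAXP calls would apply.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class CompiledStylesheet {
    // Illustrative stylesheet: outputs the number of item elements as text.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'><xsl:value-of select='count(//item)'/></xsl:template>"
      + "</xsl:stylesheet>";

    /** Compiles the stylesheet once; the Templates object can be shared. */
    static Templates compile() {
        try {
            return TransformerFactory.newInstance()
                    .newTemplates(new StreamSource(new StringReader(STYLESHEET)));
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Executes the compiled stylesheet against one source document. */
    static String run(Templates templates, String xml) {
        try {
            // A Transformer represents a single execution, so create one per run.
            Transformer transformer = templates.newTransformer();
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Templates templates = compile();
        System.out.println(run(templates, "<list><item/><item/></list>"));
        System.out.println(run(templates, "<list><item/></list>"));
    }
}
```

The single-call alternative mentioned above is TransformerFactory.newTransformer(Source), which compiles and returns a Transformer in one step but gives up the ability to reuse the compiled form.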
The classes defined in the javax.xml.transform package fall into several categories:
Category | Class or interface | Description |
Principal classes | TransformerFactory | Selects and configures a vendor's implementation |
Principal classes | Templates | Represents a compiled stylesheet in memory |
Principal classes | Transformer | Represents a single execution of a stylesheet to transform a source document into a result |
Principal classes | SAXTransformerFactory | Allows a transformation to be packaged as a SAX XMLFilter |
Principal classes | Source | Represents the input to a transformation |
Principal classes | Result | Represents the output of a transformation |
Source classes | SAXSource | Transformation input in the form of a SAX event stream |
Source classes | DOMSource | Transformation input in the form of a DOM Document |
Source classes | StreamSource | Transformation input in the form of a serial XML document |
Result classes | SAXResult | Transformation output in the form of a SAX event stream |
Result classes | DOMResult | Transformation output in the form of a DOM Document |
Result classes | StreamResult | Transformation output in the form of a serial XML document (or HTML, or a plain text file) |
Helper classes | URIResolver | User-supplied object that takes a URI contained in the stylesheet (for example, in the document() function) and fetches the relevant document as a Source object |
Helper classes | ErrorListener | User-supplied object that is notified of warnings and errors. The ErrorListener reports these conditions to the user and decides whether to continue processing. |
Helper classes | SourceLocator | Used primarily to identify where in the stylesheet an error occurred. |
Helper classes | DOMLocator | Subclass of SourceLocator, used when the source was a DOM. |
Helper classes | OutputKeys | A collection of constants defining the names of properties for serial output files. |
Error classes | TransformerConfigurationException | Generally denotes an error in the stylesheet that is detected at compile time. |
Error classes | TransformerException | A failure occurring in the course of executing a transformation. |
Error classes | TransformerFactoryConfigurationError | A failure to configure the Transformer. |
In the following sections I will describe each of these classes, in alphabetical order of the class name (ignoring the name of the package).
javax.xml.transform.dom.DOMLocator
A DOMLocator is used to identify the location of an error when the document is supplied in the form of a DOM. This object will normally be created by the processor when an error occurs, and can be accessed using the getLocator() method of the relevant Exception object. It specializes SourceLocator, providing one additional method:
Method | Description |
org.w3c.dom.Node getOriginatingNode() | Returns the node at which the error or other event is located |
javax.xml.transform.dom.DOMResult