XSLT 2.0 and XPath 2.0 Programmer's Reference, 4th Edition (788 page)

Authors: Michael Kay

1. Use the value of the system property javax.xml.parsers.DocumentBuilderFactory if it is available. You can typically set system properties using the -D option on the Java command line, or by calling System.setProperty() from your application.

2. Look for a properties file $JAVA_HOME/lib/jaxp.properties and, within this file, for the property named javax.xml.parsers.DocumentBuilderFactory.

3. Use the services API, which is part of the JAR specification. In practice, this means that the DOM implementation used will be the first one found on the classpath.

It is likely that when you install a particular DOM implementation, it will contain a file in its .jar archive that makes that particular implementation the default, so if you don't do anything to select a specific implementation, the one chosen will depend on the order of files and directories on your class path.

As with SAX, the default parser changed from Crimson to Xerces 2 with the introduction of Java 5.
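
The lookup rules above can be observed with a minimal sketch. The class name FactoryLookupDemo is a placeholder chosen for illustration, and the implementation class it reports depends entirely on your JDK and class path.

```java
import javax.xml.parsers.DocumentBuilderFactory;

class FactoryLookupDemo {

    /** Returns the name of the concrete factory class that JAXP selected. */
    static String selectedFactoryClass() {
        // newInstance() applies the lookup rules: system property first,
        // then jaxp.properties, then the services API, then a default.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        return factory.getClass().getName();
    }

    public static void main(String[] args) {
        // The same property can be supplied with -D on the command line.
        String requested =
            System.getProperty("javax.xml.parsers.DocumentBuilderFactory");
        System.out.println("Requested via system property: " + requested);
        System.out.println("Selected implementation: " + selectedFactoryClass());
    }
}
```

Running this once with and once without the -D option is a quick way to check which rule actually determined the choice in your environment.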

Once you have a DocumentBuilderFactory, you can use a number of methods to configure it. Finally, you can call the newDocumentBuilder() method to return a DocumentBuilder. The methods available are:

Method	Description
Object getAttribute(String)	Gets information about the properties of the underlying implementation
boolean isCoalescing()	Determines whether the resulting DocumentBuilder will merge CDATA nodes into their adjacent text nodes
boolean isExpandEntityReferences()	Determines whether the resulting DocumentBuilder will expand entity references and merge their content into the adjacent text nodes
boolean isIgnoringComments()	Determines whether the resulting DocumentBuilder will ignore comments in the source XML
boolean isIgnoringElementContentWhitespace()	Determines whether the resulting DocumentBuilder will ignore whitespace in element content
boolean isNamespaceAware()	Determines whether the resulting DocumentBuilder is namespace aware
boolean isValidating()	Determines whether the resulting DocumentBuilder will validate the XML source
DocumentBuilder newDocumentBuilder()	Returns a new DocumentBuilder configured as specified by previous calls
static DocumentBuilderFactory newInstance()	Returns a vendor-specific DocumentBuilderFactory selected according to the rules given above
void setAttribute(String, Object)	Sets vendor-specific properties on the underlying implementation
void setCoalescing(boolean)	Specifies whether the resulting DocumentBuilder will merge CDATA nodes into their adjacent text nodes
void setExpandEntityReferences(boolean)	Specifies whether the resulting DocumentBuilder will expand entity references and merge their content into the adjacent text nodes
void setIgnoringComments(boolean)	Specifies whether the resulting DocumentBuilder will ignore comments in the source XML
void setIgnoringElementContentWhitespace(boolean)	Specifies whether the resulting DocumentBuilder will ignore whitespace in element content
void setNamespaceAware(boolean)	Specifies whether the resulting DocumentBuilder is namespace aware
void setValidating(boolean)	Specifies whether the resulting DocumentBuilder will validate the XML source
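
As a rough illustration of how these methods fit together, the following sketch configures a factory and then asks it for a builder. The class and method names (FactoryConfigDemo, newConfiguredBuilder, hasCommentChild) are hypothetical, and the particular settings are only one plausible combination.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FactoryConfigDemo {

    /** Configures the factory first, then creates the builder. */
    static DocumentBuilder newConfiguredBuilder() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);    // report namespace information
            factory.setIgnoringComments(true);  // drop comments from the tree
            factory.setCoalescing(true);        // merge CDATA into text nodes
            return factory.newDocumentBuilder();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Reports whether the root element of the parsed XML has a comment child. */
    static boolean hasCommentChild(String xml) {
        try {
            Document doc = newConfiguredBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList children = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                if (children.item(i).getNodeType() == Node.COMMENT_NODE) {
                    return true;
                }
            }
            return false;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(hasCommentChild("<a>one<!-- note -->two</a>"));
    }
}
```

Settings made on the factory apply to builders created afterwards, which is why the configuration calls come before newDocumentBuilder().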

javax.xml.parsers.DocumentBuilder

A DocumentBuilder is always obtained by calling the newDocumentBuilder() method of a DocumentBuilderFactory.

A DocumentBuilder performs the task of parsing a source XML document and returning the resulting instance of org.w3c.dom.Document, containing the root of a tree representation of the document in memory.

The source document is specified in similar ways to the input for a SAX parser. This doesn't mean that a DocumentBuilder has to use a SAX parser to do the actual parsing: some will work this way and others won't. It's defined this way to avoid unnecessary differences between the SAX and DOM approaches.

You might be aware that in the Microsoft DOM implementation, the Document class has a method load() that parses a source XML file and constructs a Document object. This is a Microsoft extension; there is no corresponding method in the W3C DOM definition. This DocumentBuilder class fills the gap.

The methods available are:

Method	Description
boolean isNamespaceAware()	Indicates whether the parser understands XML namespaces.
boolean isValidating()	Indicates whether the parser validates the XML source.
Document newDocument()	Returns a new Document object with no content. The returned Document can be populated using DOM methods such as createElement().
Document parse(File)	Parses the XML in the supplied file, and returns the resulting Document object.
Document parse(InputSource)	Parses the XML in the supplied SAX InputSource, and returns the resulting Document object.
Document parse(InputStream)	Parses the XML in the supplied InputStream, and returns the resulting Document object. Note that the System ID of the source document will be unknown, so it will not be possible to resolve any relative URIs contained in the document.
Document parse(InputStream, String)	Parses the XML in the supplied InputStream, and returns the resulting Document object. The second argument supplies the System ID of the source document, which will be used to resolve any relative URIs contained in the document.
Document parse(String)	Parses the XML in the document identified by the supplied URI, and returns the resulting Document object.
void setEntityResolver(EntityResolver)	Supplies a SAX EntityResolver to be used during the parsing.
void setErrorHandler(ErrorHandler)	Supplies a SAX ErrorHandler to be used during the parsing.
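
A short sketch of the two main uses of a DocumentBuilder, parsing existing XML and creating an empty Document to populate by hand, might look like this. The names DocumentBuilderDemo, rootNameOf, and buildFromScratch are illustrative assumptions, and the XML is supplied as a string purely to keep the example self-contained.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class DocumentBuilderDemo {

    private static DocumentBuilder builder() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder();
    }

    /** Parses XML held in a string, using the parse(InputSource) variant. */
    static String rootNameOf(String xml) {
        try {
            Document doc = builder().parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getNodeName();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Creates an empty Document and populates it with DOM methods. */
    static Document buildFromScratch() {
        try {
            Document doc = builder().newDocument();
            Element greeting = doc.createElement("greeting");
            greeting.setTextContent("hello");
            doc.appendChild(greeting);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rootNameOf("<order id='1'/>"));
        System.out.println(buildFromScratch().getDocumentElement().getTextContent());
    }
}
```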

The JAXP Transformation API

The previous sections provided a summary of the classes and methods defined in JAXP to control XML parsing. This section covers the classes and methods used to control XSLT transformation.

These classes are designed so they could be used with transformation mechanisms other than XSLT; for example, they could in principle be used to invoke XQuery (however, a different API called XQJ is under development for XQuery, which has more in common with JDBC). But XSLT is the primary target and is the one we will concentrate on.

There is one other kind of transformation that's worth mentioning, however, and this is an identity transformation in which the result represents a copy of the source. JAXP provides explicit support for identity transformations. These are more useful than they might appear, because JAXP defines three ways of supplying the source document (SAX, DOM, or lexical XML) and three ways of capturing the result document (SAX, DOM, or lexical XML), so an identity transformation can be used to convert any of these inputs to any of the outputs. For example, it can take SAX input and produce a lexical XML file as output, or it can take DOM input and produce a stream of SAX events as output. An implementation of JAXP can also support additional kinds of Source and Result objects if it chooses. This allows the “unofficial” document models such as JDOM, DOM4J, and XOM to coexist within the JAXP framework.
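
To make the idea concrete, here is a minimal sketch of an identity transformation that takes DOM input and produces lexical XML. A Transformer created without a stylesheet performs the identity transformation; the class name IdentityDemo and the sample document are assumptions made for the example.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class IdentityDemo {

    /** Builds a tiny DOM tree to serve as the transformation input. */
    static Document sampleDocument() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            Element greeting = doc.createElement("greeting");
            greeting.setTextContent("hello");
            doc.appendChild(greeting);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Copies a DOM Document to lexical XML using an identity transformation. */
    static String serialize(Document doc) {
        try {
            // newTransformer() with no arguments yields an identity transformer.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize(sampleDocument()));
    }
}
```

Swapping DOMSource for a SAXSource or StreamSource, or StreamResult for a DOMResult or SAXResult, gives the other input and output combinations.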

JAXP is also designed to control a composite transformation consisting of a sequence of transformation steps, each defined by an XSLT stylesheet in its own right. To do this, it builds on the SAX2 concept of an XMLFilter, which takes an input document as a stream of SAX events and produces its output as another stream of SAX events. Any number of such filters can be arranged end to end in a pipeline to define a composite transformation.
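
The following sketch shows one way such a pipeline might be wired up, assuming the processor supports the XMLFilter feature. The two stylesheets are toy examples that rename an element, written as XSLT 1.0 so that they also run on the processor bundled with the JDK; FilterPipelineDemo, renameStylesheet, and runPipeline are hypothetical names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLFilter;
import org.xml.sax.XMLReader;

class FilterPipelineDemo {

    /** Toy stylesheet that renames element 'from' to 'to'. */
    private static String renameStylesheet(String from, String to) {
        return "<xsl:stylesheet version='1.0'"
             + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
             + "<xsl:template match='" + from + "'>"
             + "<" + to + "><xsl:apply-templates/></" + to + ">"
             + "</xsl:template></xsl:stylesheet>";
    }

    private static StreamSource source(String text) {
        return new StreamSource(new StringReader(text));
    }

    /** Runs the input through two stylesheets arranged end to end. */
    static String runPipeline(String xml) {
        try {
            TransformerFactory tf = TransformerFactory.newInstance();
            if (!tf.getFeature(SAXTransformerFactory.FEATURE_XMLFILTER)) {
                throw new IllegalStateException("XMLFilter is not supported");
            }
            SAXTransformerFactory stf = (SAXTransformerFactory) tf;
            XMLFilter first = stf.newXMLFilter(source(renameStylesheet("a", "b")));
            XMLFilter second = stf.newXMLFilter(source(renameStylesheet("b", "c")));

            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            XMLReader parser = spf.newSAXParser().getXMLReader();

            // Each filter reads the SAX events produced by its parent.
            first.setParent(parser);
            second.setParent(first);

            // An identity transformation serializes the final event stream.
            Transformer serializer = stf.newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(
                new SAXSource(second, new InputSource(new StringReader(xml))),
                new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runPipeline("<a>hi</a>"));
    }
}
```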

As with the JAXP SAXParser and DocumentBuilder interfaces, JAXP allows the specific XSLT implementation to be selected using a TransformerFactory object. Typically, the XSLT vendors will each provide their own subclass of TransformerFactory.

For performance reasons, the API separates the process of compiling a stylesheet from the process of executing it. A stylesheet can be compiled once and executed many times against different source documents, perhaps concurrently in different threads. The compiled stylesheet, following Microsoft's MSXML nomenclature, is known as a Templates object. To keep simple things simple, however, there are also methods that combine the two processes of compilation and execution into a single call.
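
The compile-once, run-many pattern can be sketched as follows. The stylesheet is a deliberately trivial XSLT 1.0 example, and CompiledStylesheetDemo, compile, and run are names invented for the illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class CompiledStylesheetDemo {

    // Toy stylesheet: outputs the name of the source document's root element.
    static final String XSL =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/'>"
        + "<out><xsl:value-of select='name(*)'/></out>"
        + "</xsl:template></xsl:stylesheet>";

    /** Compiles the stylesheet once; the result can be shared and reused. */
    static Templates compile() {
        try {
            return TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(XSL)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Executes the compiled stylesheet against one source document. */
    static String run(Templates templates, String xml) {
        try {
            // A Transformer represents a single execution; create one per run.
            Transformer transformer = templates.newTransformer();
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Templates templates = compile();
        System.out.println(run(templates, "<a/>"));
        System.out.println(run(templates, "<b/>"));
    }
}
```

The single-call alternative is TransformerFactory.newTransformer(Source), which compiles the stylesheet and returns a Transformer directly; it is convenient when a stylesheet is used only once.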

The classes defined in the javax.xml.transform package fall into several categories:

Category	Class or interface	Description
Principal classes	TransformerFactory	Selects and configures a vendor's implementation
Principal classes	Templates	Represents a compiled stylesheet in memory
Principal classes	Transformer	Represents a single execution of a stylesheet to transform a source document into a result
Principal classes	SAXTransformerFactory	Allows a transformation to be packaged as a SAX XMLFilter
Principal classes	Source	Represents the input to a transformation
Principal classes	Result	Represents the output of a transformation
Source classes	SAXSource	Transformation input in the form of a SAX event stream
Source classes	DOMSource	Transformation input in the form of a DOM Document
Source classes	StreamSource	Transformation input in the form of a serial XML document
Result classes	SAXResult	Transformation output in the form of a SAX event stream
Result classes	DOMResult	Transformation output in the form of a DOM Document
Result classes	StreamResult	Transformation output in the form of a serial XML document (or HTML, or a plain text file)
Helper classes	URIResolver	User-supplied object that takes a URI contained in the stylesheet (for example, in the document() function) and fetches the relevant document as a Source object
Helper classes	ErrorListener	User-supplied object that is notified of warnings and errors. The ErrorListener reports these conditions to the user and decides whether to continue processing.
Helper classes	SourceLocator	Used primarily to identify where in the stylesheet an error occurred.
Helper classes	DOMLocator	Subclass of SourceLocator, used when the source was a DOM.
Helper classes	OutputKeys	A collection of constants defining the names of properties for serial output files.
Error classes	TransformerConfigurationException	Generally denotes an error in the stylesheet that is detected at compile time.
Error classes	TransformerException	A failure occurring in the course of executing a transformation.
Error classes	TransformerFactoryConfigurationError	A failure to configure the Transformer.
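
As a hedged illustration of how the helper and error classes interact, the sketch below registers an ErrorListener on the factory and then tries to compile a stylesheet that is deliberately not well-formed. ErrorHandlingDemo and compileFails are made-up names, and exactly which listener methods a given processor calls, and how often, is implementation-dependent.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.ErrorListener;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;

class ErrorHandlingDemo {

    /** Collects whatever the processor reports instead of printing it. */
    static final class CollectingListener implements ErrorListener {
        final List<String> messages = new ArrayList<String>();

        public void warning(TransformerException e) {
            messages.add("warning: " + e.getMessage());
        }

        public void error(TransformerException e) {
            messages.add("error: " + e.getMessage());
        }

        public void fatalError(TransformerException e) {
            messages.add("fatal: " + e.getMessage());
        }
    }

    /** Returns true if compiling the stylesheet fails at compile time. */
    static boolean compileFails(String stylesheet) {
        TransformerFactory factory = TransformerFactory.newInstance();
        CollectingListener listener = new CollectingListener();
        factory.setErrorListener(listener);
        try {
            factory.newTemplates(new StreamSource(new StringReader(stylesheet)));
            return false;
        } catch (TransformerConfigurationException e) {
            // Compile-time failure; details may also be in listener.messages.
            return true;
        }
    }

    public static void main(String[] args) {
        // The xsl:template element is never closed, so this cannot compile.
        String broken = "<xsl:stylesheet version='1.0'"
            + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:template match='/'></xsl:stylesheet>";
        System.out.println("compile failed: " + compileFails(broken));
    }
}
```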

In the following sections I will describe each of these classes, in alphabetical order of the class name (ignoring the name of the package).

javax.xml.transform.dom.DOMLocator

A DOMLocator is used to identify the location of an error when the document is supplied in the form of a DOM. This object will normally be created by the processor when an error occurs, and can be accessed using the getLocator() method of the relevant Exception object. It specializes SourceLocator, providing one additional method:

Method	Description
org.w3c.dom.Node getOriginatingNode()	Returns the node at which the error or other event is located
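
A caller might inspect the locator along the following lines. This is only a sketch: LocatorDemo and describeLocation are hypothetical names, and a processor is free to supply no locator at all, in which case getLocator() returns null.

```java
import javax.xml.transform.SourceLocator;
import javax.xml.transform.TransformerException;
import javax.xml.transform.dom.DOMLocator;
import org.w3c.dom.Node;

class LocatorDemo {

    /** Turns the locator attached to an exception into a readable string. */
    static String describeLocation(TransformerException e) {
        SourceLocator locator = e.getLocator();
        if (locator == null) {
            return "location unknown";
        }
        if (locator instanceof DOMLocator) {
            // DOM input: the location is a node rather than a line number.
            Node node = ((DOMLocator) locator).getOriginatingNode();
            return "at DOM node " + (node == null ? "(unknown)" : node.getNodeName());
        }
        return "at line " + locator.getLineNumber() + " of " + locator.getSystemId();
    }

    public static void main(String[] args) {
        System.out.println(describeLocation(new TransformerException("example")));
    }
}
```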

javax.xml.transform.dom.DOMResult
