Read XSLT 2.0 and XPath 2.0 Programmer's Reference, 4th Edition Online
Authors: Michael Kay
System.Xml
System.Xml
is the XML infrastructure within Microsoft's .NET framework. It provides support for XML 1.0, XML Namespaces 1.0, XSLT 1.0, XPath 1.0, DOM level 1 and level 2, and XML Schema. The API is completely different from the older MSXML products described in the earlier part of this appendix.
Within this framework, the
.NET
namespace
System.Xml.Xsl
provides the API for XSLT processing, while
System.Xml.XPath
provides the XPath API.
XPathDocument
This class represents a document optimized for XPath and XSLT processing. An instance of this class can be created by directly loading XML from a file:
XPathDocument doc = new XPathDocument("source.xml");
Other constructors are available, allowing the document to be constructed from a
Stream
, a
TextReader
, or an
XmlReader
. Some of the constructors have a second parameter allowing you to specify whether whitespace text nodes should be stripped or preserved.
An
XPathDocument
implements the interface
IXPathNavigable
, described below.
XmlNode
This class represents a node in the .NET implementation of the DOM. Because it supports the DOM data model as well as the XPath data model, it is likely to be less efficient for XPath and XSLT processing than the
XPathDocument
object.
There are subclasses of
XmlNode
for the different node kinds; for example,
XmlElement
,
XmlAttribute
, and so on.
This class, like
XPathDocument
, implements the
IXPathNavigable
interface. This means that any software written to use the
IXPathNavigable
interface can use either an
XPathDocument
or an
XmlNode
document as the data source.
One of the subclasses of
XmlNode
is
System.Xml.XmlDataDocument
, which supports an XML view of data in a relational database. This allows the XSLT processor to run directly against relational data.
IXPathNavigable
This is a very small but very significant interface, with a single method,
CreateNavigator
. This method returns an
XPathNavigator
object that can be used to process the underlying data, treating it as an implementation of the XPath data model.
For example:
XPathDocument doc = new XPathDocument("source.xml");
XPathNavigator nav = doc.CreateNavigator();
XPathNavigator
The
XPathNavigator
object holds a current position within a tree, and provides methods to extract properties of the node at that position and to move the current position to another related node. If the data source is implemented as an
XPathDocument
, there is actually no object representing the node itself, which makes the model very efficient because small objects have a high overhead.
Because an
XPathNavigator
is capable of returning all the information in the XPath data model, it acts as an abstraction of a source document, and any object that implements the
XPathNavigator
interface can be used as a source document for an XSLT transformation.
XslTransform
The class
System.Xml.Xsl.XslTransform
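is used to perform an XSLT transformation. The basic sequence of operations is to create an XslTransform object, load the stylesheet using its Load method, and then run the transformation using its Transform method.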
There are different variants of the
Load
method that allow the stylesheet to be loaded by supplying a URL, by nominating an
XmlReader
to read the stylesheet and perform XML parsing, or by supplying an
XPathNavigator
or
IXPathNavigable
that locates the stylesheet within an existing document in memory.
If you want to go through all the stages of loading a stylesheet, you can write:
XPathDocument ss = new XPathDocument("stylesheet.xsl");
XPathNavigator nav = ss.CreateNavigator();
XslTransform trans = new XslTransform();
trans.Load(nav);
The options to supply a URL or an
XmlReader
as input to the
Load
method can be seen as shortcuts to this process.
The
Transform
method has a very large number of different overloaded variants. Essentially it takes four arguments: the source document to be transformed, the values supplied for stylesheet parameters, the destination for the result document, and an
XmlResolver
that is used to resolve URI references supplied to the
document()
function. The number of variations of this method is due to the fact that both the source and the destination can be supplied in a number of different ways, and these are supported in all combinations.
For example:
using System;
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;
// Load a stylesheet
XslTransform trans = new XslTransform();
trans.Load("stylesheet.xsl");
// Load a source document
XPathDocument source = new XPathDocument("source.xml");
// Set the current date and time as a stylesheet parameter
XsltArgumentList args = new XsltArgumentList();
DateTime now = DateTime.Now;
args.AddParam("date", "", now.ToString());
// Create an XmlTextWriter for the output
XmlTextWriter writer = new XmlTextWriter(Console.Out);
// Perform the transformation (no XmlResolver is supplied)
trans.Transform(source, args, writer, null);
writer.Close();
Summary
This appendix summarized the application programming interfaces available for using Microsoft's two XSLT product families: the MSXML product line, and the
System.Xml
framework classes for .NET. For full information about these APIs, you will need to go to Microsoft's documentation, but the summary given here has hopefully given you a good introduction.
As you've seen, Microsoft has promised an XSLT 2.0 processor but it's unlikely to be completed until 2009 at the earliest.
Appendix E
JAXP: The Java API for Transformation
JAXP is a Java API for controlling various aspects of XML processing, including parsing, validation, and XSLT transformation. This appendix concentrates on the transformation API. During its development this was known as TrAX (Transformation API for XML)—you will still see this term used occasionally.
JAXP is well supported by all the Java XML processors. The benefit of JAXP is that it allows you to write Java applications that invoke XSLT transformations without committing your application to a particular XSLT processor. At one time, there were five different Java processors you could choose from (xt, Saxon, Xalan, Oracle, or jd.xslt) but for most people now the choice has whittled down to Xalan and Saxon (though both these products have a choice of processors within the same product family). JAXP works so well that I have come across users who were running Saxon when they thought they were using Xalan, or vice versa. It's a good idea to include the following instruction in your initial template so that you avoid this mistake:
Created using
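A related check can be made from the Java side. The following is a minimal sketch, not a definitive recipe: it asks JAXP which factory class it actually loaded, and runs a tiny stylesheet that evaluates the standard XSLT function system-property('xsl:vendor'). The class name WhichXsltProcessor, the inline stylesheet, and the dummy source document are all assumptions made for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class, for illustration only.
class WhichXsltProcessor {

    // A tiny XSLT 1.0 stylesheet that writes the processor's vendor string as text.
    private static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:value-of select=\"system-property('xsl:vendor')\"/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Runs the stylesheet above and returns whatever vendor string the processor reports. */
    static String vendor() throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer =
            factory.newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        // The source document is irrelevant here; any well-formed input will do.
        transformer.transform(new StreamSource(new StringReader("<dummy/>")),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println("Factory class: "
            + TransformerFactory.newInstance().getClass().getName());
        System.out.println("xsl:vendor:    " + vendor());
    }
}
```

The value printed depends entirely on which processor is found first on your classpath, which is exactly why it is worth checking.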
The version numbers for JAXP are irritatingly out of sync with those of the JDK. The following table shows the correspondence:
JAXP version | JDK version | New Functionality |
1.2 | JDK 1.4 | XML Parsing (SAX and DOM) and Transformation |
1.3 | JDK 1.5 (Java 5) | Schema processing; XPath processing; DOM level 3 |
1.4 | JDK 1.6 (Java 6) | Pull parsing (StAX) |
Generally, a new version of JAXP has also been made available as a freestanding component for use with the earlier version of the JDK, so for example you can install JAXP 1.3 on a JDK 1.4 system. But the installation process tends to be messy and error-prone, so it's best avoided unless you have no choice.
JAXP as yet does not explicitly support XSLT 2.0 or XPath 2.0 processing; you have to make do with APIs that were designed for version 1.0 of these specifications. This means there are many aspects of XSLT processing that you cannot control directly; for example, you cannot run a transformation that starts at a named template and uses no source document, and there is no standard way to specify the new serialization options that are defined in XSLT 2.0. As Appendix F explains, Saxon has defined its own API extensions to get around these limitations.
The JAXP Parser API
JAXP 1.2 defined two sets of interfaces: interfaces for XML parsing, in package
javax.xml.parsers
, and interfaces for XML transformation (that is, TrAX) in package
javax.xml.transform
and its subsidiary packages. (JAXP 1.3 adds interfaces for schema validation, for XPath processing, and more.) Although the parser APIs could be regarded as being out of scope for this book, applications will often use both together, so I shall start by quickly reviewing the two parser APIs, covering SAX parsing and DOM parsing. The StAX specification for pull parsing included in JAXP 1.4 provides a third option, but this is currently of rather specialized interest, so I will say no more about it. You can find an overview at
http://java.sun.com/webservices/docs/1.6/tutorial/doc/SJSXP2.html
.
The JAXP interfaces do not supersede the SAX and DOM interfaces, which are described in many XML reference books. Rather, they supplement them with facilities that are (or were at one time) lacking in both SAX and DOM, namely the ability to select a SAX or DOM parser to do the processing, and to set options such as specifying whether you want a validating or non-validating parser, and whether you want namespace processing to be performed.
I'll look at the two parts of the interface, SAX and DOM, separately.
JAXP Support for SAX
JAXP 1.2 supports SAX2, but remains backward compatible with earlier versions that supported SAX1. In the interests of brevity, I will leave out the features that are relevant only to SAX1.
javax.xml.parsers.SAXParserFactory
The first thing an application must do is to obtain a
SAXParserFactory
, which it can do by calling the static method
SAXParserFactory.newInstance()
. Different vendors of SAX parsers will each implement their own subclass of
SAXParserFactory
, and this call determines which vendor's parser your application will end up using. If there are several available, the one that is used is based on the following decision process:
1.
Use the value of the system property
javax.xml.parsers.SAXParserFactory
if it is available. You can typically set system properties using the
-D
option on the Java command line, or by calling
System.setProperty()
from your application.
2.
Look for a properties file
$JAVA_HOME/lib/jaxp.properties
, and within this file, for the property named
javax.xml.parsers.SAXParserFactory
.
3.
Use the services API, which is part of the JAR specification. This effectively means that the parser that is used will be the first one to be found on your classpath.
The theory is that when you install a particular SAX parser, it will contain a file within its
.jar
archive that causes that particular parser to be the default, so if you don't do anything to select a specific parser, the one chosen will depend on the order of files and directories on your class path. In practice there can be a number of complications, which we'll discuss later when talking about the analogous
TransformerFactory
interface.
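To see the outcome of this decision process on your own system, you can simply ask the factory for its class name. The sketch below assumes nothing beyond the standard JAXP classes; the class name WhichSaxFactory is hypothetical.

```java
import javax.xml.parsers.SAXParserFactory;

// Hypothetical helper class, for illustration only.
class WhichSaxFactory {

    /** Returns the class name of the factory that the JAXP lookup procedure selected. */
    static String selectedFactoryClass() {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        return factory.getClass().getName();
    }

    public static void main(String[] args) {
        // Step 1 of the lookup: the system property, if it has been set.
        String override = System.getProperty("javax.xml.parsers.SAXParserFactory");
        System.out.println("System property: " + (override == null ? "(not set)" : override));
        // The result of the whole lookup, whichever step supplied it.
        System.out.println("Factory in use:  " + selectedFactoryClass());
    }
}
```

Running this with a -D option that sets javax.xml.parsers.SAXParserFactory to the name of a factory class on your classpath should change the second line of output accordingly.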
The default parser in Sun's JDK 1.4 was the Crimson parser, but this changed in Java 5 to Xerces 2. There was a time when Java users split their loyalties between half a dozen decent parsers, but these days Xerces 2 has cornered the market, and there is very little reason to choose anything different.
Once you have obtained a
SAXParserFactory
, you can use a number of methods to configure it. Finally, you can call the
newSAXParser()
method to return a
SAXParser
. The methods available are as follows. I haven't listed the exceptions that are thrown: you can get these from the JavaDoc.
Method | Description |
boolean getFeature(String) | Determines whether the parser factory is configured to support a particular feature. The names of features correspond to those defined in SAX2 for the XMLReader class. |
boolean isNamespaceAware() | Determines whether parsers produced using this factory will be namespace-aware. |
boolean isValidating() | Determines whether parsers produced using this factory will perform XML validation. |
static SAXParserFactory newInstance() | Produces a SAXParserFactory for a specific vendor's parser, decided according to the rules given above. |
SAXParser newSAXParser() | Returns a SAXParser that can be used to perform parsing. This is a wrapper around the SAX2 XMLReader object. |
void setFeature(String, boolean) | Sets a particular feature on or off. The names of features correspond to those defined in SAX2 for the XMLReader class. |
void setNamespaceAware(boolean) | Indicates whether parsers produced using this factory are required to be namespace aware. |
void setValidating(boolean) | Indicates whether parsers produced using this factory are required to perform XML validation. |
The
SAXParserFactory
class was introduced in JAXP 1.0 because the SAX 1 specification itself provided no means of requesting a parser. This was fixed in SAX 2 with the introduction of the
XMLReaderFactory
class. So now you have a choice of two factories. Arguably the
XMLReaderFactory
does the job better. I wouldn't put it quite as strongly as Elliotte Rusty Harold:
SAXParserFactory
[is] a hideous, evil monstrosity of a class that should be hung, shot, beheaded, drawn and quartered, burned at the stake, buried in unconsecrated ground, dug up, cremated, and the ashes tossed in the Tiber while the complete cast of Wicked sings “Ding dong, the witch is dead.”
But one thing that
SAXParserFactory
got badly wrong is that by default, the parser that is selected is not namespace-aware. Even if your particular document is not using namespaces, you should always set this property because almost all applications written to use SAX expect to receive events in the form that a namespace-aware parser supplies them.
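In code, setting the property takes one extra line. This is a minimal sketch using only the methods listed in the table above; the class name NamespaceAwareSetup is hypothetical.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;

// Hypothetical helper class, for illustration only.
class NamespaceAwareSetup {

    /** Returns a parser from a factory that has been switched to namespace-aware mode. */
    static SAXParser newNamespaceAwareParser()
            throws ParserConfigurationException, SAXException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // The default is false, so this must be set explicitly.
        factory.setNamespaceAware(true);
        return factory.newSAXParser();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Factory default: "
            + SAXParserFactory.newInstance().isNamespaceAware());
        System.out.println("After setting:   "
            + newNamespaceAwareParser().isNamespaceAware());
    }
}
```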
javax.xml.parsers.SAXParser
A
SAXParser
is obtained using the
newSAXParser()
method of a
SAXParserFactory
. A
SAXParser
is a wrapper around a SAX2
XMLReader
. You can use the
getXMLReader()
method to get the underlying
XMLReader
, but in simple cases you won't need to, since you can perform a parse and nominate a handler for all the parsing events using this class alone.
The methods relevant to SAX2 parsers are:
Method | Description |
Object getProperty(String) | Gets the named property of the underlying SAX2 XMLReader. |
XMLReader getXMLReader() | Gets the underlying SAX2 XMLReader. |
boolean isNamespaceAware() | Determines whether the underlying SAX2 XMLReader is namespace-aware. |
boolean isValidating() | Determines whether the underlying SAX2 XMLReader performs XML validation. |
void parse(File, DefaultHandler) | Parses the contents of the specified file, passing all parsing events to the specified event handler. Normally, of course, this will not actually be a SAX2 DefaultHandler, but a user-defined subclass of DefaultHandler written to process selected events. |
void parse(InputSource, DefaultHandler) | Parses the contents of the specified SAX InputSource, passing all parsing events to the specified event handler. |
void parse(InputStream, DefaultHandler) | Parses the contents of the specified InputStream, passing all parsing events to the specified event handler. Note that in this case the System ID of the input is unknown, so the parser has no way of resolving relative URIs. |
void parse(InputStream, DefaultHandler, String) | Parses the contents of the specified InputStream, passing all parsing events to the specified event handler. The third argument contains a System ID that will be used for resolving relative URIs contained in the XML source. |
void parse(String, DefaultHandler) | Parses the XML document identified by the URI in the first argument, passing all parsing events to the specified event handler. |
void setProperty(String, Object) | Sets a property of the underlying SAX2 XMLReader. |
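As an illustration of the parse() methods, here is a minimal sketch of a user-defined subclass of DefaultHandler that counts the elements in a document. The class name ElementCounter and the sample document are assumptions made for the example; the input is supplied as a string wrapped in an InputSource purely to keep the sketch self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler, for illustration only: it reacts to one event and ignores the rest.
class ElementCounter extends DefaultHandler {

    private int count = 0;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        count++; // called once for every element start tag
    }

    /** Parses the supplied XML text and returns the number of elements it contains. */
    static int countElements(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        SAXParser parser = factory.newSAXParser();
        ElementCounter handler = new ElementCounter();
        // Uses the parse(InputSource, DefaultHandler) variant from the table above.
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.count;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countElements("<a><b/><b/><c>text</c></a>")); // prints 4
    }
}
```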
JAXP Support for DOM
JAXP 1.2 was aligned with DOM level 2, while JAXP 1.3 introduced support for DOM level 3 (for once, the numbers make sense). The main innovation in DOM level 2 was support for XML namespaces; DOM level 3 added many non-core features designed primarily for the browser world, but the main features of interest for XSLT users were the introduction of data typing for nodes, and tests on node identity.
Unfortunately Sun made an exception to their usual backward-compatibility policies when DOM level 3 was introduced, allowing a new version of the interface that invalidated existing implementations. This can cause no end of trouble in upgrading, especially in the case of a product like Saxon that tries to implement DOM interfaces while running on different JDK versions simultaneously.
The DOM interface itself defines methods for constructing a tree programmatically, and methods for navigating around a tree, but it does not define any way of constructing a DOM tree by parsing a source XML document. JAXP is designed to plug this gap.
The architecture of the interface is very similar to the SAX case:
1.
First, call the static method
DocumentBuilderFactory.newInstance()
to get a
DocumentBuilderFactory
representing one particular vendor's DOM implementation.
2.
Then use the
newDocumentBuilder()
method on this
DocumentBuilderFactory
to obtain a
DocumentBuilder
.
3.
Finally, call one of the various
parse()
methods on the
DocumentBuilder
to obtain a DOM Document object.
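The three steps can be sketched as follows. This is a minimal example rather than a complete application: the class name DomParseSketch and the sample document are assumptions, and the call to setNamespaceAware() is an extra step added here on the same reasoning as in the SAX case.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, for illustration only.
class DomParseSketch {

    /** Builds a DOM Document from XML text using the three JAXP steps. */
    static Document parse(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        // Step 1: obtain a factory for whichever DOM implementation JAXP selects.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // not one of the three steps; added for this sketch
        // Step 2: obtain a DocumentBuilder from the factory.
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Step 3: parse the input to obtain a DOM Document.
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<greeting lang='en'>hello</greeting>");
        System.out.println(doc.getDocumentElement().getNodeName());
        System.out.println(doc.getDocumentElement().getAttribute("lang"));
    }
}
```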
javax.xml.parsers.DocumentBuilderFactory
The first thing an application must do is obtain a
DocumentBuilderFactory
, which it can do by calling the static method
DocumentBuilderFactory.newInstance()
. Different vendors of DOM implementations will each implement their own subclass of
DocumentBuilderFactory
, and this call determines which implementation your application will end up using. If there are several available, the one that is used is based on the following decision process: