XSLT 2.0 and XPath 2.0 Programmer's Reference, 4th Edition
Authors: Michael Kay
This application is quite unrealistic, but the same principle of keeping source documents and stylesheets in memory can often be used to achieve significant performance benefits in a servlet environment, such as the application described in Chapter 19.
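The principle of keeping compiled stylesheets in memory can be sketched with the JAXP Templates interface, which holds a compiled stylesheet that can be reused for many transformations. The class below is only a minimal illustration, not the application described above; the class name StylesheetCache, its methods, and the cache key scheme are invented for this sketch.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: compiles each stylesheet once and reuses the result.
public class StylesheetCache {
    private final TransformerFactory factory = TransformerFactory.newInstance();
    private final Map<String, Templates> cache = new HashMap<>();

    // Returns the compiled stylesheet for this key, compiling it on first use.
    // Synchronized because TransformerFactory is not guaranteed to be thread-safe.
    public synchronized Templates get(String key, String stylesheetText)
            throws TransformerException {
        Templates compiled = cache.get(key);
        if (compiled == null) {
            compiled = factory.newTemplates(
                    new StreamSource(new StringReader(stylesheetText)));
            cache.put(key, compiled);
        }
        return compiled;
    }

    // Runs one transformation; a new Transformer is created per call because
    // a Transformer, unlike a Templates object, must not be shared between threads.
    public String transform(String key, String stylesheetText, String xml)
            throws TransformerException {
        StringWriter out = new StringWriter();
        get(key, stylesheetText).newTransformer().transform(
                new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }
}
```

In a servlet, one such cache would typically live for the lifetime of the servlet, so that only the first request for a given stylesheet pays the compilation cost.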
Example 4: Using the <?xml-stylesheet?> Processing Instruction
The previous examples have all specified the source document and the stylesheet separately. However, as you saw in Chapter 3, a source XML document can identify its preferred stylesheet using an <?xml-stylesheet?> processing instruction at the start of the source XML. This example shows how to extract the relevant stylesheet using the JAXP API: specifically, the getAssociatedStylesheet() method provided by the TransformerFactory object.
The main() method of this example (Associated.java) is:
    public static void main(String[] args) throws Exception {
        StreamSource source = new StreamSource(new File(args[0]));
        TransformerFactory factory = TransformerFactory.newInstance();

        // Get the associated stylesheet for the source document
        Source style = factory.getAssociatedStylesheet(source, null, null, null);

        // Use this to do the transformation
        Transformer t = factory.newTransformer(style);
        t.transform(source, new StreamResult(System.out));
    }
Specifying null values for the media, title, and charset arguments of getAssociatedStylesheet() selects the default stylesheet for the document. If the document has multiple <?xml-stylesheet?> processing instructions, you can use these parameters to choose more selectively.
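As a rough illustration of selecting by media, the sketch below asks for the stylesheet associated with a particular medium. The document text, the href values screen.xsl and print.xsl, the base URI file:///tmp/doc.xml, and the class name SelectByMedia are all made up for the example; nothing is read from disk, because only the location of the stylesheet is examined.

```java
import java.io.StringReader;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;

public class SelectByMedia {
    // Hypothetical source document offering one stylesheet per medium
    static final String DOC =
          "<?xml-stylesheet type=\"text/xsl\" href=\"screen.xsl\" media=\"screen\"?>"
        + "<?xml-stylesheet type=\"text/xsl\" href=\"print.xsl\" media=\"print\"?>"
        + "<doc/>";

    // Returns the stylesheet whose media pseudo-attribute matches the argument.
    static Source stylesheetFor(String media) throws TransformerConfigurationException {
        // The system ID (an assumed location) is the base for resolving relative hrefs
        StreamSource source =
                new StreamSource(new StringReader(DOC), "file:///tmp/doc.xml");
        TransformerFactory factory = TransformerFactory.newInstance();
        return factory.getAssociatedStylesheet(source, media, null, null);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(stylesheetFor("print").getSystemId());
        System.out.println(stylesheetFor("screen").getSystemId());
    }
}
```

How a processor chooses when several processing instructions match the same criteria may vary between implementations, so it is safest to supply criteria that identify a single stylesheet.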
You can run this example with the command:
java Associated source.xml
This example illustrates a difference between the Saxon and Xalan implementations of JAXP. If you run this example on a source document that has no <?xml-stylesheet?> processing instruction, Saxon throws a TransformerConfigurationException, whereas Xalan returns null from the getAssociatedStylesheet() method. The JAXP specification, as often happens, is silent on the question. In production code, it would be best to check for both conditions.
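One possible way of checking for both conditions is to wrap the call so that a null result and a TransformerConfigurationException are reported in the same way. This is a sketch under that assumption: the class name SafeAssociated and its helper methods are invented here, and note that treating every TransformerConfigurationException as "no stylesheet" would also hide other failures, such as a source document that cannot be parsed.

```java
import java.io.StringReader;
import java.util.Optional;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;

public class SafeAssociated {
    // Returns the associated stylesheet, or an empty Optional if the processor
    // signals "none found" either by returning null or by throwing.
    static Optional<Source> findStylesheet(TransformerFactory factory, Source source) {
        try {
            return Optional.ofNullable(
                    factory.getAssociatedStylesheet(source, null, null, null));
        } catch (TransformerConfigurationException e) {
            // Assumption: the exception means no suitable processing instruction
            return Optional.empty();
        }
    }

    // Convenience for the demonstration: wraps XML text as a Source.
    // The system ID is an arbitrary, assumed base URI.
    static Source fromString(String xml) {
        return new StreamSource(new StringReader(xml), "file:///tmp/doc.xml");
    }

    public static void main(String[] args) {
        TransformerFactory factory = TransformerFactory.newInstance();
        Optional<Source> style = findStylesheet(factory, fromString("<doc/>"));
        System.out.println(style.isPresent()
                ? "Stylesheet: " + style.get().getSystemId()
                : "No associated stylesheet; fall back to a default");
    }
}
```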
Example 5: A SAX Pipeline
It can often be useful to place an XSLT transformation within a SAX pipeline. A pipeline consists of a series of stages, each of which implements the SAX2 interface XMLFilter. The filters are connected together so that each filter looks like a SAX2 ContentHandler (a receiver of SAX events) to the previous stage in the pipeline, and looks like an XMLReader (a supplier of SAX events) to the following stage. Some of these filters might be XSLT filters, others might be written in Java or implemented using other tools.
In our example (Pipeline.java) we will use a pipeline that contains a source (the XML parser), three filters, and a sink (a serializer). The first filter is a Java-written XMLFilter whose job is to convert all the element names in the document to lower-case, recording the original name in an attribute. The second filter is an XSLT transformation that copies some elements through unchanged and removes others, based on the value of another attribute. The final filter is another Java-written XMLFilter that restores the element names to their original form.
I've invented this example for the purpose of illustration, but there is some rationale behind it. In XML, upper-case and lower-case are distinct, so <b> and <B> are quite distinct element names. But the legacy of HTML means you may sometimes want to do a transformation in which <b> and <B> are handled in the same way. This isn't easy to achieve in XSLT (it's easier in XSLT 2.0, but still clumsy), so it makes sense to do the pre- and post-processing in Java.
Start with the two Java-written XMLFilter classes. These are written as subclasses of the SAX helper class XMLFilterImpl. You only need to implement the startElement() and endElement() methods; the other methods simply pass the events through unchanged.
The prefilter looks like this. It normalizes the name of the element to lower-case, and saves the supplied local name and QName as additional attributes.
    private class PreFilter extends XMLFilterImpl {

        public void startElement (String uri, String localName,
                                  String qName, Attributes atts)
        throws SAXException {
            // Normalize both forms of the element name to lower-case
            String newLocalName = localName.toLowerCase();
            String newQName = qName.toLowerCase();
            // Copy the existing attributes, then record the original names
            AttributesImpl newAtts =
                (atts.getLength()>0 ?
                    new AttributesImpl(atts) :
                    new AttributesImpl());
            newAtts.addAttribute("", "old-local-name",
                                 "old-local-name", "CDATA", localName);
            newAtts.addAttribute("", "old-qname",
                                 "old-qname", "CDATA", qName);
            super.startElement(uri, newLocalName, newQName, newAtts);
        }

        public void endElement (String uri, String localName,
                                String qName)
        throws SAXException {
            String newLocalName = localName.toLowerCase();
            String newQName = qName.toLowerCase();
            super.endElement(uri, newLocalName, newQName);
        }
    }
The postfilter is very similar; the only difference is that because the original element name is needed by the endElement() code as well as startElement(), the startElement() code (which gets the names from the attribute list) saves them on a stack where endElement() can pick them up later.
    private class PostFilter extends XMLFilterImpl {

        public Stack stack;

        public void startDocument() throws SAXException {
            stack = new Stack();
            super.startDocument();
        }

        public void startElement (String uri, String localName, String qName,
                                  Attributes atts)
        throws SAXException {
            String originalLocalName = localName;
            String originalQName = qName;
            AttributesImpl newAtts = new AttributesImpl();
            // Recover the saved names and copy all other attributes through
            for (int i=0; i<atts.getLength(); i++) {
                String name = atts.getQName(i);
                String val = atts.getValue(i);
                if (name.equals("old-local-name")) {
                    originalLocalName = val;
                } else if (name.equals("old-qname")) {
                    originalQName = val;
                } else {
                    newAtts.addAttribute(
                        atts.getURI(i),
                        atts.getLocalName(i),
                        name,
                        atts.getType(i),
                        val);
                }
            }
            super.startElement(uri, originalLocalName, originalQName, newAtts);
            // Save the names for the matching endElement() call
            stack.push(originalLocalName);
            stack.push(originalQName);
        }

        public void endElement (String uri, String localName, String qName)
        throws SAXException {
            // Pop in the reverse of the order used in startElement()
            String originalQName = (String)stack.pop();
            String originalLocalName = (String)stack.pop();
            super.endElement(uri, originalLocalName, originalQName);
        }
    }
Now you can build the pipeline, which actually has five components:
1. The XML parser itself, which you can get using the SAXParserFactory mechanism described at the start of this appendix.
2. The prefilter.
3. The XSLT transformation, constructed using the stylesheet held in filter.xsl.
4. The postfilter.
5. The serializer. The serializer is obtained from the TransformerFactory and is actually a TransformerHandler that performs an identity transformation with a StreamResult as its output.
As with any SAX2 pipeline, the first stage is an XMLReader, the last is a ContentHandler, and each of the intermediate stages is an XMLFilter. Each stage is linked to the previous stage using setParent(), except that the ContentHandler at the end is linked in by calling setContentHandler() on the last XMLFilter. Finally, the pipeline is activated by calling the parse() method on the last XMLFilter, which in our case is the postfilter.
Here is the code that builds the pipeline and runs a supplied source file through it:
    public void run(String input) throws Exception {
        StreamSource source = new StreamSource(new File(input));
        File style = new File("filter.xsl");
        TransformerFactory factory = TransformerFactory.newInstance();
        if (!factory.getFeature(SAXTransformerFactory.FEATURE_XMLFILTER)) {
            System.err.println("SAX Filters are not supported");
        } else {
            // Stage 1: a namespace-aware SAX2 parser from the SAXParserFactory
            // (substitute your chosen SAX2 parser here if you prefer)
            SAXParserFactory parserFactory = SAXParserFactory.newInstance();
            parserFactory.setNamespaceAware(true);
            XMLReader parser = parserFactory.newSAXParser().getXMLReader();
            SAXTransformerFactory saxFactory = (SAXTransformerFactory)factory;

            // Stage 2: the prefilter
            XMLFilter pre = new PreFilter();
            pre.setParent(parser);

            // Stage 3: the XSLT transformation
            XMLFilter filter = saxFactory.newXMLFilter(new StreamSource(style));
            filter.setParent(pre);

            // Stage 4: the postfilter
            XMLFilter post = new PostFilter();
            post.setParent(filter);

            // Stage 5: the serializer. Set the output properties before attaching
            // the result, in case the implementation reads them in setResult()
            TransformerHandler serializer = saxFactory.newTransformerHandler();
            Transformer trans = serializer.getTransformer();
            trans.setOutputProperty(OutputKeys.METHOD, "xml");
            trans.setOutputProperty(OutputKeys.INDENT, "yes");
            serializer.setResult(new StreamResult(System.out));
            post.setContentHandler(serializer);

            // Activate the pipeline from its last filter
            post.parse(source.getSystemId());
        }
    }
For the example I've given the class a trivial main program as follows:
    public static void main(String[] args) throws Exception {
        new Pipeline().run(args[0]);
    }
And you can execute it as:
java Pipeline mixed-up.xml
The results are sent to standard output.
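To experiment with the pipeline mechanics without any external files, a reduced version can be run entirely in memory. The sketch below is not the Pipeline.java example: it omits the two Java-written filters and keeps only the parser, one XSLT filter, and the serializer. The class name MiniPipeline, the sample stylesheet, and the status attribute are invented for the illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLFilter;
import org.xml.sax.XMLReader;

public class MiniPipeline {
    // Hypothetical stylesheet: copies everything except elements marked obsolete
    static final String STYLE =
          "<xsl:stylesheet version='1.0' "
        + "    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "  <xsl:template match='@*|node()'>"
        + "    <xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "  </xsl:template>"
        + "  <xsl:template match=\"*[@status='obsolete']\"/>"
        + "</xsl:stylesheet>";

    static String run(String xml) throws Exception {
        // Source of SAX events: a namespace-aware parser
        SAXParserFactory parserFactory = SAXParserFactory.newInstance();
        parserFactory.setNamespaceAware(true);
        XMLReader parser = parserFactory.newSAXParser().getXMLReader();

        // The XSLT stage, linked to the parser with setParent()
        SAXTransformerFactory saxFactory =
                (SAXTransformerFactory) TransformerFactory.newInstance();
        XMLFilter filter =
                saxFactory.newXMLFilter(new StreamSource(new StringReader(STYLE)));
        filter.setParent(parser);

        // The sink: an identity TransformerHandler writing to a string
        StringWriter out = new StringWriter();
        TransformerHandler serializer = saxFactory.newTransformerHandler();
        serializer.getTransformer()
                  .setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        serializer.setResult(new StreamResult(out));
        filter.setContentHandler(serializer);

        // Activate the pipeline from its last filter
        filter.parse(new InputSource(new StringReader(xml)));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(
            "<doc><item status='current'>A</item>"
          + "<item status='obsolete'>B</item></doc>"));
    }
}
```

Here setParent() is called before setContentHandler(), which is the same order used in the full example; some filter implementations rely on the parent already being known at that point.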
Summary
In this appendix I have given an overview of the JAXP interfaces.
I started, for the sake of completeness, with a quick tour of the JAXP facilities for controlling SAX and DOM parsers, found in package javax.xml.parsers.
I then gave detailed descriptions of the classes and methods in the package javax.xml.transform and its subsidiary packages.
Finally, I showed some simple examples of JAXP in action. Although the applications chosen were very simple, they illustrate the range of possibilities for using Java and JAXP to integrate XSLT components into a powerful application architecture.
Appendix F
Saxon
Saxon is an implementation of XSLT 2.0 produced by the author of this book, Michael Kay. Saxon also includes XQuery and XML Schema processors. The product runs on two platforms, Java and .NET, and it exists in two versions: an open source product Saxon-B, which implements the basic conformance level of the XSLT specification, and a commercial product Saxon-SA, which adds schema-aware processing. All versions can be obtained by following links from http://saxon.sf.net/.
There is also an older version of Saxon available, version 6.5, which implements XSLT 1.0. This appendix is concerned only with the XSLT 2.0 processor.
The Java version of Saxon requires JDK 1.4 or a later Java release, and there are no other dependencies. The .NET version is produced by cross-compiling the Java code into the Intermediate Language (IL) used by the .NET platform, using the IKVMC cross-compiler produced by Jeroen Frijters (http://www.ikvm.net). This runs on .NET version 1.1 or 2.0.
There are three ways of running Saxon:
Saxon doesn't come with an installer, so whether you use the Java or .NET product, it's essentially a question of unzipping the distributed files into a directory of your own choosing. For details, see the Getting Started documentation at http://www.saxonica.com/documentation.
For Java, the most likely place to slip up is in setting the classpath. This is largely because Java doesn't give you much help when you get it wrong. You can set the classpath either with the -cp option on the command line when you run Saxon, or in the CLASSPATH environment variable. To do this on Windows, go to Settings ⇒ Control Panel ⇒ System ⇒ Advanced ⇒ Environment Variables. If there is already an environment variable called CLASSPATH, click Edit to change its value, adding the new entries separated by semicolons from the existing entries. Otherwise, click New, either under User Variables if you want to change the settings only for the current user, or under System Variables if you want to change settings for all users. Enter CLASSPATH as the name of the variable, and a list of directories and/or .jar files, separated by semicolons, as the value.
For the .NET product, unless you're just playing with Saxon very casually from the command line, you should install the DLL files in the Global Assembly Cache. There's a batch script provided to do this; you only need to run it once. On Vista it's probably easier to do it using the .NET framework administration tool, which can be reached via the Control Panel.
If you're using the schema-aware product Saxon-SA, then you'll need to obtain a license key from Saxonica. It comes with instructions for how to install it.
Using Saxon from the Command Line
If you are using Saxon on a Windows platform (and even more so if you are running on a Mac), then you may not be accustomed to using the command line to run applications. You can do this from the standard MS-DOS console that comes with Windows, but I wouldn't recommend it because it's too difficult to correct your typing mistakes and to stop output from scrolling off the screen. It is far better to install a text editor that includes a Windows-friendly command line capability. If you're familiar with Unix tools, then you may want to install Cygwin (cygwin.com). I quite like the console plugin for the open-source jEdit editor (from jedit.org), mainly because it has good Unicode support, but for general editing I usually use UltraEdit (ultraedit.com), which has a basic-but-adequate capability for launching command line applications.
The command line can be considered in two halves:
command options
The command part causes the Saxon XSLT processor to be invoked, and the options are then passed to Saxon to control what the XSLT transformation actually does.
For Java, the simplest form of the command is:
java -jar saxon9.jar [options]
This works provided that java is on your PATH, and saxon9.jar is in the current directory. If either of these conditions isn't true, you may need to add a full path so that the Java VM and/or Saxon can be located.
Although the -jar form of the command is very convenient, it is also very restricted, because it does not allow code to be loaded from anywhere other than the specified JAR file. For anything more complicated (for example, a stylesheet that uses extension functions, or one that accesses DOM, JDOM, or XOM files, or one that uses schema-aware processing), you will need to use a form of command that uses the classpath. You can either set the classpath within the command line itself:
java -cp c:\saxon\saxon9.jar;c:\saxon\saxon9-dom.jar net.sf.saxon.Transform [options]
or you can set the CLASSPATH environment variable and omit the -cp option:
java net.sf.saxon.Transform [options]
Here net.sf.saxon.Transform is the entry point that tells Java to load the Saxon XSLT transformer. If you get a message indicating that this class wasn't found, it means there is something wrong with your classpath.
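When the class is reported as not found, it can help to see what the Java VM is actually using as its classpath. The small diagnostic below is not part of Saxon; the class name ClasspathCheck is invented for this sketch. It tests whether a named class can be located and prints the effective classpath.

```java
public class ClasspathCheck {
    // Returns true if the named class can be located by this class's loader.
    // The class is looked up without being initialized.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the Saxon entry point named in the text
        String name = args.length > 0 ? args[0] : "net.sf.saxon.Transform";
        System.out.println(name + (isOnClasspath(name) ? " was found" : " was NOT found"));
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
    }
}
```

Run it with the same -cp option or CLASSPATH setting that you intend to use for Saxon, so that the output reflects the configuration being diagnosed.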
For .NET the situation is somewhat simpler. Either make the directory containing the Saxon executables (including Transform.exe) the current directory, or add it to the system PATH variable. Then use the command:
Transform [options]
The options are the same whichever form of the command you use, and are described below. For a more detailed explanation, see the Saxon documentation. This table relates to Saxon 9.0 and later; earlier versions used a slightly different format.
Option | Description |
-a | Use the <?xml-stylesheet?> processing instruction in the source document to identify the stylesheet to be used. The stylesheet argument should then be omitted. |
-c:filename | Requests use of a stylesheet that has been previously compiled using the net.sf.saxon.Compile command. |
-cr:classname | Sets a user-defined resolver for collection URIs. |
-dtd:on|off | Sets DTD validation on or off. |
-expand:on|off | The value off suppresses expansion of attribute defaults defined in the DTD or schema. |
-explain[:filename] | Requests output showing how the code has been optimized. The output is sent to the console unless a filename is supplied. |
-ext:on|off | The value off prevents calls on extension functions. |
-im:modename | Starts execution in the named mode. |
-it:template | Specifies the name of the initial template. The transformation will start by evaluating this named template. |
-l:on|off | Switches line numbering on (or off) for the source document. Line numbers are accessible through the extension function saxon:line-number(), or from a trace listener. |
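-m:classname | Specifies the name of a Java class used to process the output of <xsl:message> instructions. |
-o:filename | Defines a file to contain the primary output of the transformation. Filenames for secondary output files (created using <xsl:result-document>) are interpreted relative to the location of this primary output file. |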
-or:classname | Sets a user-defined resolver for secondary output files. |
-outval:recover|fatal | Indicates whether validation failures found when validating the result document should be treated as fatal or recoverable. |
-p:on|off | The value on means that query parameters such as val=strict are recognized in URIs passed to the doc() and document() functions. |
-r:classname | Sets a JAXP URIResolver, used to resolve all URIs used in doc() and document() functions. |
-repeat:N | Runs the same transformation N times. Useful for performance measurement. |
-s:filename | Specifies the principal source document. |
-sa | Requests schema-aware processing. |
-strip:all|none|ignorable | Requests stripping of all whitespace, or none, or “ignorable” whitespace (whitespace in elements defined by the DTD or schema to have element-only content). |
-t | Displays information about the Saxon and Java versions in use, and progress messages indicating which files are being processed and how long the key stages of processing took. |
-T[:classname] | Traces execution of the stylesheet. Each instruction is traced as it is executed, identifying the instruction and the current location in the source document by line number. If a classname is supplied, trace output is sent to a user-supplied TraceListener. |
-TJ | Traces the loading of Java extension functions. This is a useful debugging aid if you are having problems in this area. |
-tree:tiny|linked | Chooses the implementation of the XDM tree model. |
-u | Indicates that the names of the source document and stylesheet are URLs rather than filenames. |
-val:strict|lax | Requests strict or lax schema-based validation of all source documents. This option is available only with Saxon-SA. |
-versmsg:on|off | Setting this to off suppresses the warning message that appears when running against a 1.0 stylesheet. |
-warnings:silent|recover|fatal | Indicates how XSLT-defined recoverable errors are to be handled. Either recover silently; or output a warning message and continue; or treat the error as fatal. |
-x:classname | Defines the XML parser to be used for the source document, and for any additional document loaded using the document() function. |
-xi | Enable XInclude processing (if the XML parser supports it). |
-xsl:filename | The stylesheet to be applied. |
-xmlversion:1.0|1.1 | Indicates whether the XML 1.0 or 1.1 rules for the syntax of names (etc) should be used. |
-y:classname | Defines the XML parser to be used for processing stylesheet documents and schema documents. |