XSLT 2.0 and XPath 2.0 Programmer's Reference, 4th Edition (795 page)

This application is quite unrealistic, but the same principle of keeping source documents and stylesheets in memory can often be used to achieve significant performance benefits in a servlet environment, such as the application described in Chapter 19.
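As a minimal sketch of the principle, assuming a hypothetical helper class named StylesheetCache (it is not part of JAXP or of the book's examples), compiled stylesheets can be kept in memory as Templates objects and reused for every request:

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical helper: compiles each stylesheet once and keeps it in memory. */
public class StylesheetCache {

    private final TransformerFactory factory = TransformerFactory.newInstance();
    private final Map<String, Templates> cache = new HashMap<String, Templates>();

    /**
     * Returns the compiled form of the stylesheet, compiling it on first use.
     * The method is synchronized because a TransformerFactory is not
     * guaranteed to be safe for concurrent use.
     */
    public synchronized Templates get(File stylesheet)
    throws TransformerConfigurationException {
        String key = stylesheet.getAbsolutePath();
        Templates compiled = cache.get(key);
        if (compiled == null) {
            compiled = factory.newTemplates(new StreamSource(stylesheet));
            cache.put(key, compiled);
        }
        return compiled;
    }
}
```

Each request would then call get(file).newTransformer() to obtain its own Transformer. This sketch never evicts entries or notices that a file has changed on disk, which a real servlet would probably need to handle.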

Example 4: Using the Processing Instruction

The previous examples have all specified the source document and the stylesheet separately. However, as you saw in Chapter 3, it is possible for a source XML document to identify its preferred stylesheet using an <?xml-stylesheet?> processing instruction at the start of the source XML. This example shows how to extract the relevant stylesheet using the JAXP API: specifically, the getAssociatedStylesheet() method provided by the TransformerFactory object.

The main() method of this example (Associated.java) is:

public static void main(String[] args) throws Exception {
    StreamSource source = new StreamSource(new File(args[0]));
    TransformerFactory factory = TransformerFactory.newInstance();

    // Get the associated stylesheet for the source document
    Source style = factory.getAssociatedStylesheet(source, null, null, null);

    // Use this to do the transformation
    Transformer t = factory.newTransformer(style);
    t.transform(source, new StreamResult(System.out));
}

Specifying null values for the media, title, and charset arguments of getAssociatedStylesheet() selects the default stylesheet for the document. If the document has multiple <?xml-stylesheet?> processing instructions, it is possible to use these parameters to choose more selectively.
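For instance, a document might carry one stylesheet reference for the screen and another for print. The following sketch shows the media argument being used to pick one of them; the class name SelectByMedia and the method stylesheetFor() are invented for illustration, and the exact matching rules may vary between JAXP implementations.

```java
import java.io.File;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical example class: selects an associated stylesheet by media. */
public class SelectByMedia {

    /**
     * Asks the factory for the stylesheet whose processing instruction
     * has the given media pseudo-attribute, e.g. "print" or "screen".
     * Title and charset are left as null, meaning "no preference".
     */
    static Source stylesheetFor(File document, String media)
    throws TransformerConfigurationException {
        TransformerFactory factory = TransformerFactory.newInstance();
        return factory.getAssociatedStylesheet(
                new StreamSource(document), media, null, null);
    }
}
```

The returned Source can be passed to newTransformer() exactly as in the main() method above.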

You can run this example with the command:

java Associated source.xml

This example illustrates a difference between the Saxon and Xalan implementations of JAXP. If you run this example on a source document that has no <?xml-stylesheet?> processing instruction, Saxon throws a TransformerConfigurationException, whereas Xalan returns null from the getAssociatedStylesheet() method. The JAXP specification, as often happens, is silent on the question. In production code, it would be best to check for both conditions.
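One way to check for both conditions is to wrap the call so that either outcome is reported as null. This is only a sketch: the class name AssociatedSafe and the method findAssociatedStylesheet() are invented here, and note that treating every TransformerConfigurationException as "no stylesheet" also hides other failures, such as an unreadable source file.

```java
import java.io.File;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical variant of Associated.java that tolerates a missing PI. */
public class AssociatedSafe {

    /** Returns the associated stylesheet, or null if none could be found. */
    static Source findAssociatedStylesheet(TransformerFactory factory, Source source) {
        try {
            // Some implementations return null here when there is no PI
            return factory.getAssociatedStylesheet(source, null, null, null);
        } catch (TransformerConfigurationException e) {
            // Others throw an exception instead
            return null;
        }
    }

    public static void main(String[] args) throws Exception {
        StreamSource source = new StreamSource(new File(args[0]));
        TransformerFactory factory = TransformerFactory.newInstance();
        Source style = findAssociatedStylesheet(factory, source);
        if (style == null) {
            System.err.println("No associated stylesheet found in " + args[0]);
            return;
        }
        Transformer t = factory.newTransformer(style);
        t.transform(source, new StreamResult(System.out));
    }
}
```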

Example 5: A SAX Pipeline

It can often be useful to place an XSLT transformation within a SAX pipeline. A pipeline consists of a series of stages, each of which implements the SAX2 interface XMLFilter. The filters are connected together so that each filter looks like a SAX2 ContentHandler (a receiver of SAX events) to the previous stage in the pipeline, and looks like an XMLReader (a supplier of SAX events) to the following stage. Some of these filters might be XSLT filters, others might be written in Java or implemented using other tools.
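To make the idea concrete before the full example, here is a minimal sketch of the smallest possible pipeline: a parser, one Java-written filter, and a serializer. The class names MiniPipeline and UpperCaseFilter are invented for this illustration, and the sketch assumes that the default TransformerFactory is a SAXTransformerFactory, which production code should test with getFeature().

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical three-stage pipeline: parser, one filter, serializer. */
public class MiniPipeline {

    /** Illustrative filter that upper-cases every element name. */
    static class UpperCaseFilter extends XMLFilterImpl {
        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts)
        throws SAXException {
            super.startElement(uri, localName.toUpperCase(),
                               qName.toUpperCase(), atts);
        }

        @Override
        public void endElement(String uri, String localName, String qName)
        throws SAXException {
            super.endElement(uri, localName.toUpperCase(), qName.toUpperCase());
        }
    }

    /** Runs the XML text through the pipeline and returns the serialized result. */
    static String run(String xml) throws Exception {
        // Stage 1: the parser, which acts as an XMLReader
        SAXParserFactory parserFactory = SAXParserFactory.newInstance();
        parserFactory.setNamespaceAware(true);
        XMLReader parser = parserFactory.newSAXParser().getXMLReader();

        // Stage 2: the filter, a ContentHandler to the parser
        // and an XMLReader to whatever follows it
        XMLFilterImpl filter = new UpperCaseFilter();
        filter.setParent(parser);

        // Stage 3: an identity TransformerHandler used as a serializer
        SAXTransformerFactory saxFactory =
            (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler serializer = saxFactory.newTransformerHandler();
        serializer.getTransformer()
                  .setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        serializer.setResult(new StreamResult(out));

        // Parsing is started from the last filter in the chain
        filter.setContentHandler(serializer);
        filter.parse(new InputSource(new StringReader(xml)));
        return out.toString();
    }
}
```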

In our example (Pipeline.java) we will use a pipeline that contains a source (the XML parser), three filters, and a sink (a serializer). The first filter will be a Java-written XMLFilter whose job is to convert all the element names in the document to lower-case, recording the original name in an attribute. The second filter is an XSLT transformation that copies some elements through unchanged and removes others, based on the value of another attribute. The final filter is another Java-written XMLFilter that restores the element names to their original form.

I've invented this example for the purpose of illustration, but there is some rationale behind it. In XML, upper-case and lower-case are distinct, so <b> and <B> are quite distinct element names. But the legacy of HTML means you may sometimes want to do a transformation in which <b> and <B> are handled in the same way. This isn't easy to achieve in XSLT (it's easier in XSLT 2.0, but still clumsy), so it makes sense to do the pre- and post-processing in Java.

Start with the two Java-written XMLFilter classes. These are written as subclasses of the SAX helper class XMLFilterImpl. You only need to implement the startElement() and endElement() methods; the other methods simply pass the events through unchanged.

The prefilter looks like this. It normalizes the name of the element to lower-case, and saves the supplied local name and QName as additional attributes.

private class PreFilter extends XMLFilterImpl {

    public void startElement (String uri, String localName,
                                   String qName, Attributes atts)
    throws SAXException {
        // Normalize both forms of the element name to lower-case
        String newLocalName = localName.toLowerCase();
        String newQName = qName.toLowerCase();
        // Copy the supplied attributes so the original list is not modified
        AttributesImpl newAtts =
            (atts.getLength()>0 ?
                new AttributesImpl(atts) :
                new AttributesImpl());
        // Record the original names so the postfilter can restore them
        newAtts.addAttribute("", "old-local-name",
                        "old-local-name", "CDATA", localName);
        newAtts.addAttribute("", "old-qname",
                        "old-qname", "CDATA", qName);
        super.startElement(uri, newLocalName, newQName, newAtts);
    }

    public void endElement (String uri, String localName,
                                 String qName)
    throws SAXException {
        String newLocalName = localName.toLowerCase();
        String newQName = qName.toLowerCase();
        super.endElement(uri, newLocalName, newQName);
    }
}

The postfilter is very similar; the only difference is that because the original element name is needed by the endElement() code as well as startElement(), the startElement() code (which gets the names from the attribute list) saves them on a stack where endElement() can pick them up later.

private class PostFilter extends XMLFilterImpl {

    public Stack stack;

    public void startDocument() throws SAXException {
        stack = new Stack();
        super.startDocument();
    }

    public void startElement (String uri, String localName, String qName,
                              Attributes atts)
    throws SAXException {
        String originalLocalName = localName;
        String originalQName = qName;
        AttributesImpl newAtts = new AttributesImpl();
        // Pick out the saved names and copy all other attributes unchanged
        for (int i=0; i<atts.getLength(); i++) {
            String name = atts.getQName(i);
            String val = atts.getValue(i);
            if (name.equals("old-local-name")) {
                originalLocalName = val;
            } else if (name.equals("old-qname")) {
                originalQName = val;
            } else {
                newAtts.addAttribute(
                                atts.getURI(i),
                                atts.getLocalName(i),
                                name,
                                atts.getType(i),
                                val);
            }
        }
        super.startElement(uri, originalLocalName, originalQName, newAtts);
        // Save the names for the matching endElement() call
        stack.push(originalLocalName);
        stack.push(originalQName);
    }

    public void endElement (String uri, String localName, String qName)
    throws SAXException {
        // Pop in the reverse order of the pushes in startElement()
        String originalQName = (String)stack.pop();
        String originalLocalName = (String)stack.pop();
        super.endElement(uri, originalLocalName, originalQName);
    }
}

Now you can build the pipeline, which actually has five components:

1. The XML parser itself, which you can get using the SAXParserFactory mechanism described at the start of this appendix.

2. The prefilter.

3. The XSLT transformation, constructed using the stylesheet held in filter.xsl.

4. The postfilter.

5. The serializer. The serializer is obtained from the TransformerFactory and is actually a TransformerHandler that performs an identity transformation with a StreamResult as its output.

As with any SAX2 pipeline, the first stage is an XMLReader, the last is a ContentHandler, and each of the intermediate stages is an XMLFilter. Each stage is linked to the previous stage using setParent(), except that the ContentHandler at the end is linked in by calling setContentHandler() on the last XMLFilter. Finally, the pipeline is activated by calling the parse() method on the last XMLFilter, which in our case is the postfilter.

Here is the code that builds the pipeline and runs a supplied source file through it:

public void run(String input) throws Exception {
    StreamSource source = new StreamSource(new File(input));
    File style = new File("filter.xsl");
    TransformerFactory factory = TransformerFactory.newInstance();
    if (!factory.getFeature(SAXTransformerFactory.FEATURE_XMLFILTER)) {
        System.err.println("SAX Filters are not supported");
    } else {
        // substitute your chosen SAX2 parser here, or use the
        // SAXParserFactory to get one
        SAXParserFactory parserFactory = SAXParserFactory.newInstance();
        parserFactory.setNamespaceAware(true);
        XMLReader parser = parserFactory.newSAXParser().getXMLReader();

        SAXTransformerFactory saxFactory = (SAXTransformerFactory)factory;

        // Link each stage to the one before it
        XMLFilter pre = new PreFilter();
        pre.setParent(parser);
        XMLFilter filter = saxFactory.newXMLFilter(new StreamSource(style));
        filter.setParent(pre);
        XMLFilter post = new PostFilter();
        post.setParent(filter);

        // The serializer is an identity transformation writing to System.out
        TransformerHandler serializer = saxFactory.newTransformerHandler();
        serializer.setResult(new StreamResult(System.out));
        Transformer trans = serializer.getTransformer();
        trans.setOutputProperty(OutputKeys.METHOD, "xml");
        trans.setOutputProperty(OutputKeys.INDENT, "yes");
        post.setContentHandler(serializer);

        // Activate the pipeline from its last filter
        post.parse(source.getSystemId());
    }
}

For the example I've given the class a trivial main program as follows:

public static void main(String[] args) throws Exception {
    new Pipeline().run(args[0]);
}

And you can execute it as:

java Pipeline mixed-up.xml

The results are sent to standard output.

Summary

In this appendix I have given an overview of the JAXP interfaces.

I started, for the sake of completeness, with a quick tour of the JAXP facilities for controlling SAX and DOM parsers, found in package javax.xml.parsers.

I then gave detailed descriptions of the classes and methods in the package javax.xml.transform and its subsidiary packages.

Finally, I showed some simple examples of JAXP in action. Although the applications chosen were very simple, they illustrate the range of possibilities for using Java and JAXP to integrate XSLT components into a powerful application architecture.

Appendix F

Saxon

Saxon is an implementation of XSLT 2.0 produced by the author of this book, Michael Kay. Saxon also includes XQuery and XML Schema processors. The product runs on two platforms, Java and .NET, and it exists in two versions: an open source product Saxon-B, which implements the basic conformance level of the XSLT specification, and a commercial product Saxon-SA, which adds schema-aware processing. All versions can be obtained by following links from http://saxon.sf.net/.

There is also an older version of Saxon available, version 6.5, which implements XSLT 1.0. This appendix is concerned only with the XSLT 2.0 processor.

The Java version of Saxon requires JDK 1.4 or a later Java release, and there are no other dependencies. The .NET version is produced by cross-compiling the Java code into the Intermediate Language (IL) used by the .NET platform, using the IKVMC cross-compiler produced by Jeroen Frijters (http://www.ikvm.net). This runs on .NET version 1.1 or 2.0.

There are three ways of running Saxon:

  • You can run it from within a product that provides a graphical user interface. Saxon doesn't come with its own GUI, but it is integrated into a number of development environments such as Stylus Studio (stylusstudio.com) and oXygen (oxygenxml.com), and it can be configured as an external processor for use within XML Spy (altova.com). If you just want to experiment with Saxon, the quickest way to get started is probably to download Kernow (kernowforsaxon.sf.net). As long as you have Java installed (J2SE 6), you don't need to install anything else: Saxon comes bundled with Kernow.
  • You can run Saxon from the command line, either on Java or .NET. This is described in the next section.
  • You can invoke Saxon from within your Java or .NET application. On Java, Saxon implements the standard JAXP interfaces described in Appendix E, though if you want to get the full benefits of the product, then you'll need to understand how Saxon extends these interfaces (JAXP in its current form was designed with XSLT 1.0 in mind). On the .NET platform, Saxon has its own API, which is outlined on page 1203.
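For the Java case, one possible way of asking JAXP for Saxon explicitly, rather than relying on whichever processor the JAXP search order finds first, is sketched below. The class name FactoryChooser is invented for this illustration; net.sf.saxon.TransformerFactoryImpl is Saxon's JAXP factory class, and the two-argument form of newInstance() assumes Java 6 or later.

```java
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.TransformerFactoryConfigurationError;

/** Hypothetical helper: prefers Saxon but falls back to the JAXP default. */
public class FactoryChooser {

    static final String SAXON_FACTORY = "net.sf.saxon.TransformerFactoryImpl";

    static TransformerFactory newFactory() {
        try {
            // Succeeds only if the Saxon JAR is on the classpath
            return TransformerFactory.newInstance(SAXON_FACTORY, null);
        } catch (TransformerFactoryConfigurationError e) {
            // Saxon not found: use whatever processor JAXP locates by default
            return TransformerFactory.newInstance();
        }
    }

    public static void main(String[] args) {
        System.out.println("Using " + newFactory().getClass().getName());
    }
}
```

Falling back silently is a design choice made for the sketch. An application whose stylesheets need XSLT 2.0 would more likely want to report the missing JAR as an error.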

Saxon doesn't come with an installer, so whether you use the Java or .NET product, it's essentially a question of unzipping the distributed files into a directory of your own choosing. For details, see the Getting Started documentation at http://www.saxonica.com/documentation.

For Java, the most likely place to slip up is in setting the classpath. This is largely because Java doesn't give you much help when you get it wrong. You can either set the classpath using the -cp option on the command line when you run Saxon, or set it in the CLASSPATH environment variable. To do this on Windows, go to Settings ⇒ Control Panel ⇒ System ⇒ Advanced ⇒ Environment Variables. If there is already an environment variable called CLASSPATH, click Edit to change its value, adding the new entries separated by semicolons from the existing entries. Otherwise, click New, either under User Variables if you want to change the settings only for the current user, or under System Variables if you want to change settings for all users. Enter CLASSPATH as the name of the variable, and a list of directories and/or .jar files, separated by semicolons, as the value.

For the .NET product, unless you're just playing with Saxon very casually from the command line, you should install the DLL files in the Global Assembly Cache. There's a batch script provided to do this; you only need to run it once. On Vista it's probably easier to do it using the .NET framework administration tool, which can be reached via the Control Panel.

If you're using the schema-aware product Saxon-SA, then you'll need to obtain a license key from Saxonica. It comes with instructions for how to install it.

Using Saxon from the Command Line

If you are using Saxon on a Windows platform (and even more so if you are running on a Mac), then you may not be accustomed to using the command line to run applications. You can do this from the standard MS-DOS console that comes with Windows, but I wouldn't recommend it because it's too difficult to correct your typing mistakes and to stop output from scrolling off the screen. It is far better to install a text editor that includes a Windows-friendly command line capability. If you're familiar with Unix tools, then you may want to install Cygwin (cygwin.com). I quite like the console plugin for the open-source jEdit editor (from jedit.org), mainly because it has good Unicode support, but for general editing I usually use UltraEdit (ultraedit.com), which has a basic-but-adequate capability for launching command line applications.

The command line can be considered in two halves:

command options

The command part causes the Saxon XSLT processor to be invoked, and the options are then passed to Saxon to control what the XSLT transformation actually does.

For Java, the simplest form of the command is:

java -jar saxon9.jar [options]

This works provided that java is on your PATH, and saxon9.jar is in the current directory. If either of these conditions isn't true, you may need to add a full path so that the Java VM and/or Saxon can be located.

Although the -jar form of the command is very convenient, it is also very restricted, because it does not allow code to be loaded from anywhere other than the specified JAR file. For anything more complicated (for example, a stylesheet that uses extension functions, or one that accesses DOM, JDOM, or XOM files, or one that uses schema-aware processing), you will need to use a form of command that uses the classpath. You can either set the classpath within the command line itself:

java -cp c:\saxon\saxon9.jar;c:\saxon\saxon9-dom.jar net.sf.saxon.Transform [options]

or you can set the CLASSPATH environment variable and omit the -cp option:

java net.sf.saxon.Transform [options]

Here net.sf.saxon.Transform is the entry point that tells Java to load the Saxon XSLT transformer. If you get a message indicating that this class wasn't found, it means there is something wrong with your classpath.
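If the cause isn't obvious, a small diagnostic program can help. The following is only a sketch, with an invented class name ClasspathCheck: it prints the classpath that the Java VM is actually using and reports whether the Saxon entry point can be found on it.

```java
/** Hypothetical diagnostic: checks whether a class is visible on the classpath. */
public class ClasspathCheck {

    /** Returns true if the named class can be located, without initializing it. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        } catch (LinkageError e) {
            // The class was found but could not be loaded
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("java.class.path = "
                + System.getProperty("java.class.path"));
        System.out.println("net.sf.saxon.Transform found: "
                + isOnClasspath("net.sf.saxon.Transform"));
    }
}
```

Run it with the same -cp option or CLASSPATH setting that you intend to use for Saxon, so that it sees the same classpath.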

For .NET the situation is somewhat simpler. Either make the directory containing the Saxon executables (including Transform.exe) the current directory, or add it to the system PATH variable. Then use the command:

Transform [options]

The options are the same whichever form of the command you use, and are described below. For a more detailed explanation, see the Saxon documentation. This table relates to Saxon 9.0 and later; earlier versions used a slightly different format.

Option    Description

-a    Use the <?xml-stylesheet?> processing instruction in the source document to identify the stylesheet to be used. The stylesheet argument should then be omitted.
-c:filename    Requests use of a stylesheet that has been previously compiled using the net.sf.saxon.Compile command.
-cr:classname    Sets a user-defined resolver for collection URIs.
-dtd:on|off    Sets DTD validation on or off.
-expand:on|off    The value off suppresses expansion of attribute defaults defined in the DTD or schema.
-explain[:filename]    Requests output showing how the code has been optimized. The output is sent to the console unless a filename is supplied.
-ext:on|off    The value off prevents calls on extension functions.
-im:modename    Starts execution in the named mode.
-it:template    Specifies the name of the initial template. The transformation will start by evaluating this named template.
-l:on|off    Switches line numbering on (or off) for the source document. Line numbers are accessible through the extension function saxon:line-number(), or from a trace listener.
-m:classname    Specifies the name of a Java class used to process the output of <xsl:message> instructions.
-o:filename    Defines a file to contain the primary output of the transformation. Filenames for secondary output files (created using <xsl:result-document>) will be interpreted relative to the location of this primary output file.
-or:classname    Sets a user-defined resolver for secondary output files.
-outval:recover|fatal    Indicates whether validation failures found when validating the result document should be treated as fatal or recoverable.
-p:on|off    The value on means that query parameters such as val=strict are recognized in URIs passed to the doc() and document() functions.
-r:classname    Sets a JAXP URIResolver, used to resolve all URIs used in <xsl:include>, <xsl:import>, or in the doc() and document() functions.
-repeat:N    Runs the same transformation N times. Useful for performance measurement.
-s:filename    Specifies the principal source document.
-sa    Requests schema-aware processing.
-strip:all|none|ignorable    Requests stripping of all whitespace, or none, or "ignorable" whitespace (whitespace in elements defined by the DTD or schema to have element-only content).
-t    Displays information about the Saxon and Java versions in use, and progress messages indicating which files are being processed and how long the key stages of processing took.
-T[:classname]    Traces execution of the stylesheet. Each instruction is traced as it is executed, identifying the instruction and the current location in the source document by line number. If a classname is supplied, trace output is sent to a user-supplied TraceListener.
-TJ    Traces the loading of Java extension functions. This is a useful debugging aid if you are having problems in this area.
-tree:tiny|linked    Chooses the implementation of the XDM tree model.
-u    Indicates that the names of the source document and stylesheet are URLs rather than filenames.
-val:strict|lax    Requests strict or lax schema-based validation of all source documents. This option is available only with Saxon-SA.
-versmsg:on|off    Setting this to off suppresses the warning message that appears when running against a 1.0 stylesheet.
-warnings:silent|recover|fatal    Indicates how XSLT-defined recoverable errors are to be handled. Either recover silently; or output a warning message and continue; or treat the error as fatal.
-x:classname    Defines the XML parser to be used for the source document, and for any additional document loaded using the document() function.
-xi    Enable XInclude processing (if the XML parser supports it).
-xsl:filename    The stylesheet to be applied.
-xmlversion:1.0|1.1    Indicates whether the XML 1.0 or 1.1 rules for the syntax of names (etc) should be used.
-y:classname    Defines the XML parser to be used for processing stylesheet documents and schema documents.
