XSLT 2.0 and XPath 2.0 Programmer's Reference, 4th Edition (6 page)

Read XSLT 2.0 and XPath 2.0 Programmer's Reference, 4th Edition Online

Authors: Michael Kay

BOOK: XSLT 2.0 and XPath 2.0 Programmer's Reference, 4th Edition

7.83Mb size Format: txt, pdf, ePub

Chapter 13

Description	Page
Using the current() Function This stylesheet shows the use of the current() function within an XPath predicate, effectively to do a join: Given a book, it selects other books in the same category.	736
Using the document() Function to Analyze a Stylesheet A stylesheet is an XML document, so it can be used as the input to another stylesheet. This makes it very easy to write little tools that manipulate stylesheets. This example shows such a tool, designed to report on the hierarchic structure of the modules that make up a stylesheet.	758
A Look-Up Table in the Stylesheet This example uses data in a look-up table to expand abbreviations of book categories. Two techniques are shown: In the first example the look-up table is held in the stylesheet, and in the second example it is held in a separate XML document.	761
Creating Multiple Output Files The main purpose of this example is to show how the element-available() function can be used to write stylesheets that are portable across different XSLT processors, even though they use facilities that are provided in different ways by different processors.	767
Testing for xx:node-set() Extensions This example shows how the function-available() function can be used to write stylesheets that are portable across different XSLT processors, even though they use facilities that are provided in different ways by different processors.	794
Testing the Availability of a Java Method This example shows the use of the use-when attribute to compile two different versions of a global variable declaration, depending on which version of the Java JDK is in use.	796
Using generate-id() to Create Links This stylesheet produces information about holiday resorts as an HTML document; the hyperlinks within the document are generated by calling the generate-id() function to produce unique anchors.	798
Using Keys as Cross-References This example uses two source files: The principal source document is a file containing a list of books, and the secondary one (accessed using the document() function) contains biographies of authors. The author name held in the first file acts as a cross-reference to the author's biography in the second file, rather like a join in SQL.	814
Using Keys for Grouping This example shows the use of the XSLT 1.0 Muenchian grouping method to create a list of cities, grouped by country.	817
Unicode Normalization This example shows the effect of normalizing the string gar ç on> first to Unicode composed normal form, then to decomposed normal form, and then back.	847
Processing a Comma-Separated-Values File This example is a stylesheet that uses the unparsed-text() function to read a comma-separated-values file, given the URL of the file as a stylesheet parameter. It outputs an XML representation of this file, placing the rows in a element and each value in a element.	907

Chapter 15

Description	Page
Using a character-map to Comment-out Elements This illustrates one of the ways of using character maps to achieve special effects during serialization of the result tree: the character map causes special characters in the tree to be replaced by strings representing the start and end of an XML comment.	943

Chapter 16

Description	Page
Using VBScript in an MSXML3 Stylesheet This example shows a stylesheet that uses a VBScript extension function to convert dimensions in inches to the equivalent in millimeters.	958
A Java Extension Function to Calculate a Square Root This example invokes a standard method in the Java class library to compute the square root of a number in the source document.	959
Calling External Functions within a Loop This stylesheet calls external Java methods to read a source file, one line at a time, by means of recursive calls.	965
A Function with Uncontrolled Side Effects This example shows what can go wrong with extension functions: Although it is apparently only a minor change to the previous example, it produces completely wrong results because it depends on the order in which instructions are executed.	967

Chapter 17

Description	Page
A “Fill-in-the-Blanks” Stylesheet This stylesheet illustrates the fill-in-the-blanks design pattern. It is a simplified stylesheet that consists essentially of the target HTML page, with instructions inserted at the points where variable data is to be fetched from the XML source document.	974
A Navigational Stylesheet This example shows the use of a navigational stylesheet to produce a very simple sales report.	977
A Rule-Based Stylesheet Rule-based stylesheets are often used to process narrative documents, where most of the processing consists in replacing XML tags by HTML tags. This example illustrates this by showing how a Shakespeare play can be rendered in HTML.	981
Aggregating a List of Numbers This is an example of the computational design pattern: The stylesheet uses a recursive template to process a whitespace-separated list of numbers. Recursive processing is needed to produce cumulative totals efficiently, since none of the standard aggregation functions supports this directly.	993
Using Interleaved Structures Lines of verse aren't neatly nested inside speeches, and speeches aren't nested inside lines of verse: The two structures are interleaved. The usual solution to this problem is to use the hierarchic XML tagging to represent one of the structures (say the speeches) and to use empty element tags to mark the boundaries in the other structure. This example shows a stylesheet that can convert from one of these representations to the other.	995

Chapter 18

Description	Page
Formatting the XML Specification This entire chapter is devoted to the study of a single “real-life” application, the family of stylesheets used to process the XML specification and other related W3 C documents.	1002

Chapter 19

Description	Page
Converting GEDCOM Files to XML This stylesheet converts genealogical data from one version of the GEDCOM standard to a subsequent version. The input is a non-XML format in which the hierarchic structure is indicated by level numbers. The example illustrates the use of new facilities in XSLT 2.0 for reading non-XML files, matching their contents against regular expressions, and then grouping adjacent lines to create an element hierarchy.	1059
Converting from GEDCOM 5.5 to 6.0 This stylesheet converts the raw XML produced by the previous example to the schema required by the new version of the GEDCOM standard. The solution is schema-aware, using the schema for GEDCOM 6.0 to validate the output as it is written.	1063
Displaying the Family Tree Data This stylesheet produces an XHTML representation of the information concerning one individual in a GEDCOM 6.0 data file. The stylesheet is schema-aware, using a schema both to validate the input document and to ensure that the output is valid XHTML.	1073
Publishing Static HTML This stylesheet incorporates the previous stylesheet, and uses it to create a set of linked XHTML pages displaying all the individuals in the input GEDCOM 6.0 data file, suitable for publishing on a static web site.	1086
Generating HTML Pages from a Servlet This stylesheet shows an alternative way of displaying the family tree data: This time, HTML pages are generated on demand, by a transformation invoked from a Java servlet.	1087
Generating HTML Pages Using ASP.NET This does the same job as the previous example, but using different technology. HTML pages are again generated on demand, this time by an XSLT transformation controlled by C# code within an ASP.NET page.	1092
Generating HTML in the Browser This stylesheet (which is restricted to use XSLT 1.0 facilities because of browser limitations) shows a third way of displaying the same data, this time by means of a transformation performed client side, and controlled by script on the HTML page.	1095

Chapter 20

Description	Page
Knight's Tour of the Chessboard This entire chapter is devoted to the study of a single program, a computational stylesheet that calculates a route for a knight on a chessboard, in which from any starting square, each square on the board is visited exactly once. The stylesheet demonstrates how even quite complex algorithms can be implemented in XSLT by use of recursive functions.	1099

Appendix F

Description	Page
Using saxon:evaluate() to Apply Business Rules Saxon is one of a number of products that offers an extension function to evaluate XPath expressions constructed on the fly or read from a source document. This example shows how this capability can be used to allow business rules to be defined in an XML configuration document separate from the stylesheet logic.	1212

Part I

Foundations

Chapter 1:
XSLT in Context

Chapter 2:
The XSLT Processing Model

Chapter 3:
Stylesheet Structure

Chapter 4:
Stylesheets and Schemas

Chapter 5:
Types

Chapter 1

XSLT in Context

This chapter is designed to put XSLT in context. It's about the purpose of XSLT and the task it was designed to perform. It's about what kind of language it is, how it came to be that way, and how it has changed in version 2.0; and it's about how XSLT fits in with all the other technologies that you are likely to use in a typical Web-based application (including, of course, XPath, which forms a vital component of XSLT). I won't be saying very much in this chapter about what an XSLT stylesheet actually looks like or how it works: that will come later, in Chapters 2 and 3.

The chapter starts by describing the task that XSLT is designed to perform—
transformation
—and why there is the need to transform XML documents. I'll then present a trivial example of a transformation in order to explain what this means in practice.

Next, I discuss the relationship of XSLT to other standards in the growing XML family, to put its function into context and explain how it complements the other standards.

I'll describe what kind of language XSLT is, and delve a little into the history of how it came to be like that. If you're impatient you may want to skip the history and get on with using the language, but sooner or later you will ask “why on earth did they design it like that?” and at that stage I hope you will go back and read about the process by which XSLT came into being.

What Is XSLT?

XSLT (Extensible Stylesheet Language: Transformations) is a language that, according to the very first sentence in the specification (found at
http://www.w3.org/TR/xslt20/
), is primarily designed for transforming one XML document into another. However, XSLT is also capable of transforming XML to HTML and many other text-based formats, so a more general definition might be as follows:

XSLT is a language for transforming the structure and content of an XML document.

Why should you want to do that? In order to answer this question properly, we first need to remind ourselves why XML has proved such a success and generated so much excitement.

XML is a simple, standard way to interchange structured textual data between computer programs. Part of its success comes because it is also readable and writable by humans, using nothing more complicated than a text editor, but this doesn't alter the fact that it is primarily intended for communication between software systems. As such, XML satisfies two compelling requirements:

Separating data from presentation:
the need to separate information (such as a weather forecast) from details of the way it is to be presented on a particular device. The early motivation for this arose from the need to deliver information not only to the traditional PC-based Web browser (which itself comes in many flavors) but also to TV sets and handheld devices, not to mention the continuing need to produce print-on-paper. Today, for many information providers an even more important driver is the opportunity to syndicate content to other organizations that can republish it with their own look-and-feel.
Transmitting data between applications:
the need to transmit information (such as orders and invoices) from one organization to another without investing in one-off software integration projects. As electronic commerce gathers pace, the amount of data exchanged between enterprises increases daily, and this need becomes ever more urgent.

Of course, these two ways of using XML are not mutually exclusive. An invoice can be presented on the screen as well as being input to a financial application package, and weather forecasts can be summarized, indexed, and aggregated by the recipient instead of being displayed directly. Another of the key benefits of XML is that it unifies the worlds of documents and data, providing a single way of representing structure regardless of whether the information is intended for human or machine consumption. The main point is that, whether the XML data is ultimately used by people or by a software application, it will very rarely be used directly in the form it arrives: it first has to be transformed into something else.

In order to communicate with a human reader, this something else might be a document that can be displayed or printed: for example, an HTML file, a PDF file, or even audible sound. Converting XML to HTML for display is probably still the most common application of XSLT, and it is the one I will use in most of the examples in this book. Once you have the data in HTML format, it can be displayed on any browser.

In order to transfer data between different applications, we need to be able to transform information from the data model used by one application to the model used by another. To load the data into an application, the required format might be a comma-separated-values file, a SQL script, an HTTP message, or a sequence of calls on a particular programming interface. Alternatively, it might be another XML file using a different vocabulary from the original. As XML-based electronic commerce becomes widespread, the role of XSLT in data conversion between applications also becomes ever more important. Just because everyone is using XML does not mean the need for data conversion will disappear.

There will always be multiple standards in use. As I write there is a fierce debate between the protagonists of two different XML representations of office documents: the ODF specification from the Open Office community, and the OOXML specification from Microsoft and its friends. However this gets resolved, the prospects of a single XML format for all word processor documents are remote, so there will always be a need to transform between multiple formats.

Even within the domain of a single standard, there is a need to extract information from one kind of document and insert it into another. For example, a PC manufacturer who devises a solution to a customer problem will need to extract data from the problem reports and insert it into the documents issued to field engineers so that they can recognize and fix the problem when other customers hit it. The field engineers, of course, are probably working for a different company, not for the original manufacturer. So, linking up enterprises to do e-commerce will increasingly become a case of defining how to extract and combine data from one set of XML documents to generate another set of XML documents, and XSLT is the ideal tool for the job.

During the course of this chapter, we will come back to specific examples of when XSLT should be used to transform XML. For now, I just wanted to establish a feel for the importance and usefulness of transforming XML. If you are already using XSLT, of course, this may be stale news. So let's take a look now at what XSLT version 2.0 brings to the party.

Why Version 2.0?

XSLT 1.0 came out in November 1999 and was highly successful. It was therefore almost inevitable that work would start on a version 2.0. As we will see later, the process of creating version 2.0 was far from smooth and took rather longer than some people hoped. However, XSLT 2.0 was finally published as a W3C Recommendation (that is, a final specification) in January 2007, and user reaction has been very favorable.

It's tempting to look at version 2.0 and see it as a collection of features bolted on to the language, patches to make up for the weaknesses of version 1.0. As with a new release of any other language or software package, most users will find some features here that they have been crying out for, and other additions that appear surplus to requirements.

But I think there is more to version 2.0 than just a bag of goodies; there are some underlying themes that have guided the design and the selection of features. I can identify four main themes:

Integration across the XML standards family:
W3C working groups do not work in isolation from each other; they spend a lot of time trying to ensure that their efforts are coordinated. A great deal of what is in XSLT 2.0 is influenced by a wider agenda of doing what is right for the whole raft of XML standards, not just for XSLT considered in isolation.
Extending the scope of applicability:
XSLT 1.0 was pretty good at rendering XML documents for display as HTML on screen, and for converting them to XSL Formatting Objects for print publishing. But there are many other transformation tasks for which it proved less suitable. Compared with report writers (even those from the 1980s, let alone modern data visualization tools) its data handling capabilities were very weak. The language was quite good at doing conversions of XML documents if the original markup was well designed, but much weaker at recognizing patterns in the text or markup that represent hidden structure. An important aim of XSLT 2.0 was to increase the range of applications that you can tackle using XSLT.
More robust software engineering:
XSLT was always designed to be used both client-side and server-side, but in many ways XSLT 1.0 optimized the language for use in the browser. However, people write large applications in XSLT, containing 100K or more lines of code, and this needs a more rigorous and robust approach to things such as error handling and type checking.
Tactical usability improvements:
Here we
are
into the realm of added goodies. The aim here is to achieve productivity benefits, making it easier to do things that are difficult or error-prone in version 1.0. These are probably the features that existing users will immediately recognize as the most beneficial, but in the long term the other themes probably have more strategic significance for the future of the language.

Before we discuss XSLT in more detail and have a first look at how it works, let's study a scenario that clearly demonstrates the variety of formats to which we can transform XML, using XSLT.

A Scenario: Transforming Music

As an indication of how far XML has now penetrated, Robin Cover's index of XML-based application standards at
http://xml.coverpages.org/xmlApplications.html
today runs to 594 entries. (The last one is entitled
Mind Reading Markup Language
, but as far as I can tell, all the other entries are serious.)

I'll follow just one of these 594 links,
XML and Music
, which takes us to
http://xml.coverpages.org/xmlMusic.html
. On this page we find a list of no less than 18 standards, proposals, or initiatives that use XML for marking up music.

This diversity is clearly unnecessary, and many of these initiatives are already dead or dying. Even the names of the standards are chaotic: there is a Music Markup Language, a MusicML, a MusicXML, and a MusiXML, all quite unrelated. There are at least three really serious contenders: the Music Encoding Initiative (MEI), the Standard Music Description Language (SMDL), and MusicXML. The MEI derives its inspiration from the Text Encoding Initiative, and has a particular focus on the needs of music scholars (for example, the ability to capture features found in different manuscripts of the same score), while SMDL is related to the HyTime hypermedia standards and takes into account requirements such as the need to synchronize music with video or with a lighting script (it has not been widely implemented, but it has its enthusiasts). MusicXML, by contrast, is primarily focused on the needs of composers and publishers of sheet music.

Given the variety of requirements, it's unlikely that the number of standards in use will reduce any further. The different notations were invented with different purposes in mind: a markup language used by a publisher for printing sheet music has different requirements from the one designed to let you listen to the music from a browser.

In the first edition of this book, back in 2001, I introduced the idea of using XSLT to transform music as a theoretical possibility, something to make my readers think about the range of possibilities open for the language. By the time I published the second edition, PhD students were showing that it could actually be done. Today, MusicXML is a standard part of over 80 software applications including industry leaders such as Sibelius and Finale, and XSLT is routinely used to manipulate the output. The MEI website publishes XSLT stylesheets for converting between MEI and MusicXML in either direction.

As it happens, MusicXML itself provides two ways of representing a score. In one the top-level subdivision in the XML hierarchy is by instrumental part or voice; in the other the top-level structure is the timeline of the music. XSLT stylesheets are provided to convert between the two formats.

Figure 1-1
shows some of the possibilities. You could use XSLT to:

Convert music from one representation to another, for example from MEI to SMDL.
Convert music from any of these representations into visual music notation, by generating the XML-based vector graphics format SVG.
Play the music on a synthesizer, by generating a MIDI (Musical Instrument Digital Interface) file.
Perform a musical transformation, such as transposing the music into a different key or extracting parts for different instruments or voices.
Extract the lyrics, into HTML or into a text-only XML document.
Capture music from non-XML formats and translate it to XML (XSLT 2.0 is especially useful here).

Figure 1-1

Other books

Mujeres estupendas by Libertad Morán

Bared to Him by Jan Springer

Once Upon a Scandal by Julie Lemense

Not Just an Orgy by Sally Painter

Go Tell the Bees That I Am Gone by Diana Gabaldon

Thirsty by M. T. Anderson

The Dog Says How by Kevin Kling

Nightmare Ink by Marcella Burnard

Reykjavik Nights by Arnaldur Indridason

Glass Towers: Surrendered by Adler, Holt, Ginger Fraser