XSLT 2.0 and XPath 2.0 Programmer's Reference, 4th Edition (722 page)

In the previous edition of this book, I presented a different way of doing the GEDCOM-to-XML conversion, which may still be of interest, so the necessary files are included in the downloads for this chapter. Instead of coding the logic in XSLT, I wrote it in Java. The Java code implements the SAX2 interface org.xml.sax.XMLReader, which makes it look just like an XML parser, enabling it to feed data into an XSLT stylesheet in exactly the same way that a real XML parser does. The SAX2 parser for GEDCOM 5.5 is supplied with the sample files for this chapter on the Wrox Web site; it is named GedcomParser, and you can use it to process the input for a transformation using the -x flag on the Saxon command line. For completeness, to allow conversions in the reverse direction, I've also provided a SAX2 ContentHandler that accepts an XML result tree in the form of a sequence of SAX events and serializes it in the GEDCOM notation. Along with these two classes are an AnselInputStreamReader and an AnselOutputStreamWriter that handle the unusual character set used by GEDCOM.
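
To make the idea of a non-XML source posing as an XML parser concrete, here is a minimal sketch. It is not the GedcomParser supplied with the book: the class name GedcomLineReader, the wrapper element GED, and the ID attribute are assumptions made for illustration, and the sketch ignores the ANSEL character set, continuation lines, and cross-reference pointers. It extends org.xml.sax.helpers.XMLFilterImpl only to inherit the routine parts of the XMLReader interface, and overrides parse() to turn lines of the form "level [@id@] TAG [value]" into SAX events.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

/**
 * Hypothetical sketch (not the book's GedcomParser): reads lines of the form
 * "level [@id@] TAG [value]" and reports them as nested SAX elements.
 * Extending XMLFilterImpl supplies the rest of the XMLReader interface.
 */
class GedcomLineReader extends XMLFilterImpl {

    @Override
    public void parse(InputSource input) throws IOException, SAXException {
        if (input.getCharacterStream() == null) {
            throw new SAXException("This sketch only reads character streams");
        }
        BufferedReader in = new BufferedReader(input.getCharacterStream());
        Deque<String> open = new ArrayDeque<>();      // tags of the elements still open
        startDocument();
        startElement("", "GED", "GED", new AttributesImpl()); // assumed wrapper element
        String line;
        while ((line = in.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) {
                continue;
            }
            String[] head = line.split(" ", 2);       // level number, then the rest
            int level = Integer.parseInt(head[0]);
            String rest = head.length > 1 ? head[1] : "";
            AttributesImpl atts = new AttributesImpl();
            if (rest.startsWith("@")) {               // optional record identifier
                String[] idAndRest = rest.split(" ", 2);
                atts.addAttribute("", "ID", "ID", "CDATA", idAndRest[0].replace("@", ""));
                rest = idAndRest.length > 1 ? idAndRest[1] : "";
            }
            String[] tagAndValue = rest.split(" ", 2);
            String tag = tagAndValue[0];
            if (tag.isEmpty()) {
                throw new SAXException("Missing tag in line: " + line);
            }
            while (open.size() > level) {             // close deeper and sibling elements
                String closed = open.pop();
                endElement("", closed, closed);
            }
            startElement("", tag, tag, atts);
            open.push(tag);
            if (tagAndValue.length > 1) {
                char[] text = tagAndValue[1].toCharArray();
                characters(text, 0, text.length);
            }
        }
        while (!open.isEmpty()) {
            String closed = open.pop();
            endElement("", closed, closed);
        }
        endElement("", "GED", "GED");
        endDocument();
    }

    /** Runs the reader through the JDK's identity transformer and returns the XML. */
    static String toXml(String gedcom) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer().transform(
                new SAXSource(new GedcomLineReader(), new InputSource(new StringReader(gedcom))),
                new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The toXml() helper hands the reader to a transformer through a SAXSource, which is the same route by which a custom XMLReader can stand in for a real parser in front of a stylesheet. Calling GedcomLineReader.toXml("0 @I1@ INDI\n1 NAME John /Smith/\n") should yield a GED element containing an INDI element with ID="I1" and a nested NAME element.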

Converting from GEDCOM 5.5 to 6.0

It's now time to look at the second stylesheet in the pipeline, which converts the raw XML obtained by mechanical conversion of the GEDCOM format into XML that conforms to the target GEDCOM 6.0 XML Schema. The stylesheet ged55-to-6.xsl doesn't handle the full job of GEDCOM conversion, but it does handle the subset that we're using in this demonstration. It starts like this:

  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  version="2.0"
>

I'm going to use a schema-aware stylesheet to tackle this conversion. I won't be using a schema for the input vocabulary (because I haven't written one), but I will be using the schema for the result document. I will also be validating the result document against this schema. The most noticeable effect of this is that mistakes in the stylesheet that cause incorrect output to be generated are reported immediately, and pinpointed to the line in the stylesheet that caused the error. As I developed this stylesheet, this happened dozens of times before I got it right, and diagnosing the errors proved far easier than using the conventional approach of generating the output, inspecting it for obvious faults, and then running it through a separate validation phase. I'll give some examples of this later on.

To run this example yourself, you will therefore need to install a schema-aware processor. At the time of writing, the two candidates are Saxon-SA and the Altova XSLT 2.0 processor. Alternatively, because the only use of schema-aware processing is to validate the output, you can edit the stylesheet to remove the <xsl:import-schema> declaration and the validation="strict" attribute on the <xsl:result-document> instruction, and it will then work with a basic XSLT 2.0 processor. However, later stylesheets in this chapter make rather deeper use of schema-aware transformation.

There is no namespace attribute on the <xsl:import-schema> declaration, because the schema has no target namespace.

Top-Level Processing

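We can now get on with the top-level processing logic:

[Template rule listing: the element markup was lost in extraction; the only surviving fragment is the attribute Date="{f:today()}".]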

This template rule establishes the outline of the result tree. The outermost element contains: a header record, which we generate here and now; then a set of family records, a set of individual records, and a set of events, which must appear in that order; and finally a contact record to indicate the originator of the data set, which must be present because the mandatory <Submitter> element in the header refers to it. The name of the submitter is defined by a stylesheet parameter, so you can set a different value if you use this stylesheet on your own data files. (The reason this field is called Submitter is historic: the original purpose of GEDCOM was to allow members of the LDS church to submit details of their ancestors to the church authorities.)

The instruction <xsl:result-document validation="strict"> causes the result tree to be validated. The system will do this by looking in the imported schemas for an element declaration of the outermost element in the result tree and then ensuring that the rest of the result tree conforms to this element declaration. In the case of Saxon, this validation is done on the fly: each element is validated as soon as it is written to the result tree, which means that any validation errors can be reported in relation to the stylesheet instruction that wrote the incorrect data.
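
For readers who want to see what strict validation checks, here is a small Java sketch using the standard javax.xml.validation API. It is an illustration only: the schema and the element names Tree, Header, and Contact are invented, and the class StrictValidationSketch is hypothetical. Unlike the on-the-fly validation described above, it validates a finished document in a separate phase, so it can say that the output is wrong but not which stylesheet instruction produced it.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

/** Toy illustration of strict validation; the schema and element names are invented. */
class StrictValidationSketch {

    // A schema with no target namespace, requiring Header then Contact inside Tree.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='Tree'><xs:complexType><xs:sequence>"
      + "<xs:element name='Header' type='xs:string'/>"
      + "<xs:element name='Contact' type='xs:string'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    /** Returns null if the document is valid, otherwise the first validation message. */
    static String firstError(String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
            // The validator looks up a declaration for the outermost element,
            // then checks the rest of the document against that declaration.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this toy schema, a Tree element containing both children passes, while a document that omits Contact, or whose outermost element has no declaration in the schema, is rejected.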

In the header I have generated only those fields that are mandatory. These include the file creation date, which must be in the format DD MMM YYYY. I generated this using the user-defined function:
