XSLT 2.0 and XPath 2.0 Programmer's Reference, 4th Edition
Authors: Michael Kay
In the previous edition of this book, I presented a different way of doing the GEDCOM-to-XML conversion, which may still be of interest, so the necessary files are included in the downloads for this chapter. Instead of coding the logic in XSLT, I wrote it in Java. The Java code implements the SAX2 interface org.xml.sax.XMLReader, which makes it look just like an XML parser, enabling it to feed data into an XSLT stylesheet in exactly the same way that a real XML parser does. The SAX2 parser for GEDCOM 5.5 is supplied with the sample files for this chapter on the Wrox Web site; it is named GedcomParser, and you can use it to process the input for a transformation using the -x flag on the Saxon command line. For completeness, to allow conversions in the reverse direction, I've also provided a SAX2 ContentHandler that accepts an XML result tree in the form of a sequence of SAX events and serializes it in the GEDCOM notation. Along with these two classes are an AnselInputStreamReader and an AnselOutputStreamWriter that handle the unusual character set used by GEDCOM.
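To make the "looks just like an XML parser" idea concrete, here is a minimal sketch of a class that reads a line-oriented text format and reports it as SAX events. It is not the GedcomParser supplied with the book: the class name LineFormatReader, the GED wrapper element, the ID attribute, and the helper toXml are assumptions made for illustration, and the sketch ignores ANSEL decoding, continuation lines, and error recovery. It extends org.xml.sax.helpers.XMLFilterImpl only to inherit default implementations of the XMLReader methods.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical sketch: an XMLReader for "level [@id@] TAG [value]" lines.
class LineFormatReader extends XMLFilterImpl {

    @Override
    public void parse(InputSource input) throws IOException, SAXException {
        Reader chars = input.getCharacterStream();
        ContentHandler out = getContentHandler();
        if (chars == null || out == null) {
            throw new SAXException("sketch needs a character stream and a ContentHandler");
        }
        BufferedReader lines = new BufferedReader(chars);
        Deque<String> open = new ArrayDeque<>();   // names of elements not yet closed

        out.startDocument();
        out.startElement("", "GED", "GED", new AttributesImpl()); // assumed wrapper element
        String line;
        while ((line = lines.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) continue;

            // The first token is the nesting level.
            int sp = line.indexOf(' ');
            int level = Integer.parseInt(sp < 0 ? line : line.substring(0, sp));
            String rest = sp < 0 ? "" : line.substring(sp + 1);

            // An optional cross-reference such as @I1@ becomes an ID attribute.
            AttributesImpl atts = new AttributesImpl();
            if (rest.startsWith("@")) {
                int end = rest.indexOf(' ');
                String id = (end < 0 ? rest : rest.substring(0, end)).replace("@", "");
                atts.addAttribute("", "ID", "ID", "CDATA", id);
                rest = end < 0 ? "" : rest.substring(end + 1);
            }

            // What remains is the tag, optionally followed by a value.
            sp = rest.indexOf(' ');
            String tag = sp < 0 ? rest : rest.substring(0, sp);
            String value = sp < 0 ? "" : rest.substring(sp + 1);
            if (tag.isEmpty()) continue;

            // Close every element at the same or a deeper level before opening this one.
            while (open.size() > level) {
                String name = open.pop();
                out.endElement("", name, name);
            }
            out.startElement("", tag, tag, atts);
            open.push(tag);
            if (!value.isEmpty()) {
                out.characters(value.toCharArray(), 0, value.length());
            }
        }
        while (!open.isEmpty()) {
            String name = open.pop();
            out.endElement("", name, name);
        }
        out.endElement("", "GED", "GED");
        out.endDocument();
    }

    // Runs the reader through an identity transformation and returns the serialized XML.
    static String toXml(String text) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter result = new StringWriter();
        identity.transform(
            new SAXSource(new LineFormatReader(), new InputSource(new StringReader(text))),
            new StreamResult(result));
        return result.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toXml("0 @I1@ INDI\n1 NAME John /Smith/\n1 BIRT\n2 DATE 1 JAN 1900\n"));
    }
}
```

Because the transformer sees only the XMLReader interface, the same SAXSource could be handed to a real stylesheet in place of the identity transformation used here.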
Converting from GEDCOM 5.5 to 6.0
It's now time to look at the second stylesheet in the pipeline, which converts the raw XML obtained by mechanical conversion of the GEDCOM format into XML that conforms to the target GEDCOM 6.0 XML Schema. The stylesheet ged55-to-6.xsl doesn't handle the full job of GEDCOM conversion, but it does handle the subset that we're using in this demonstration. It starts like this:
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
version="2.0"
>
I'm going to use a schema-aware stylesheet to tackle this conversion. I won't be using a schema for the input vocabulary (because I haven't written one), but I will be using the schema for the result document. I will also be validating the result document against this schema. The most noticeable effect of this is that mistakes in the stylesheet that cause incorrect output to be generated are reported immediately, and pinpointed to the line in the stylesheet that caused the error. As I developed this stylesheet, this happened dozens of times before I got it right, and diagnosing the errors proved far easier than using the conventional approach of generating the output, inspecting it for obvious faults, and then running it through a separate validation phase. I'll give some examples of this later on.
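For comparison, the conventional "separate validation phase" mentioned above might look something like the following sketch, which uses the standard javax.xml.validation API. The schema here is a toy invented for illustration, not the GEDCOM 6.0 schema, and the names SeparateValidationSketch, TOY_SCHEMA, and validate are likewise assumptions. Note that any error reported this way refers to a position in the result document, not to the stylesheet instruction that produced the faulty output.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical sketch: validate a finished result document as a separate step.
class SeparateValidationSketch {

    // Toy schema (an assumption): a Header element with a required Date attribute.
    static final String TOY_SCHEMA =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "  <xs:element name='Header'>"
      + "    <xs:complexType>"
      + "      <xs:attribute name='Date' type='xs:string' use='required'/>"
      + "    </xs:complexType>"
      + "  </xs:element>"
      + "</xs:schema>";

    // Returns null if the document is valid, otherwise the validator's message.
    static String validate(String xml) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(TOY_SCHEMA)));
        Validator validator = schema.newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return null;
        } catch (SAXException e) {
            // With no ErrorHandler set, validity errors are thrown as exceptions.
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(validate("<Header Date='01 JAN 2008'/>"));
        System.out.println(validate("<Header/>"));
    }
}
```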
To run this example yourself, you will therefore need to install a schema-aware processor. At the time of writing, the two candidates are Saxon-SA and the Altova XSLT 2.0 processor. Alternatively, because the only use of schema-aware processing is to validate the output, you can edit the stylesheet to remove the validation="strict" attribute on the
There is no namespace attribute on the
Top-Level Processing
We can now get on with the top-level processing logic:
Date="{f:today()}"/>
This template rule establishes the outline of the result tree. The containing
Submitter
is historic: the original purpose of GEDCOM was to allow members of the LDS church to submit details of their ancestors to the church authorities.)
The instruction
In the header I have generated only those fields that are mandatory. These include the file creation date, which must be in the format DD MMM YYYY. I generated this using the user-defined function: