Read XSLT 2.0 and XPath 2.0 Programmer's Reference, 4th Edition Online
Authors: Michael Kay
In the original GEDCOM file a NOTE can contain multiple lines, which are arranged like this:
1 NOTE Educated at Harvard University. Elected Congressman in 1945
2 CONT aged 29; served three terms in the House of Representatives.
2 CONT Elected Senator in 1952. Elected President in 1960, the
2 CONT youngest ever President of the United States.
In the direct conversion to XML, the note appears like this (except that there is no newline before the first
The GEDCOM 6.0 specification allows only plain text in a
ged55-to-6 conversion stylesheet preserves the line endings by inserting a newline character wherever a
aged 29; served three terms in the House of Representatives.
Elected Senator in 1952. Elected President in 1960, the
youngest ever President of the United States.
The result isn't always satisfactory, because different genealogy packages that produce GEDCOM 5.5 vary widely in how they handle newlines and whitespace, but it works in this case.
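The newline-preserving approach described above can be sketched as follows. This is a minimal illustration, not the stylesheet's actual code: it assumes the input has the shape `<NOTE>first line<CONT>next line</CONT>...</NOTE>`, and the output element name `Note` is likewise an assumption.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumed input: <NOTE>first line<CONT>next line</CONT>...</NOTE> -->
  <xsl:template match="NOTE">
    <!-- "Note" is an assumed name for the output element -->
    <Note>
      <!-- Take the first line of the note, then each CONT line,
           and join them with a newline character.
           normalize-space() also collapses internal runs of whitespace. -->
      <xsl:value-of
          select="string-join((normalize-space(text()[1]),
                               CONT/normalize-space(.)), '&#xa;')"/>
    </Note>
  </xsl:template>

</xsl:stylesheet>
```

Using `normalize-space()` is a design choice that trims stray whitespace around each line. A package that relies on leading spaces in continuation lines would need a different treatment.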
A typical individual record after conversion looks like this:
GEDCOM 6.0 allows all the parts of an individual's name to be tagged indicating the type of the name, but it doesn't require it, and in our source data, there isn't enough information to achieve this. The
Type attribute distinguishes them.
Creating Event Records
The event records in the result tree correspond to events associated with individuals and families in the source data. As we've seen, the 6.0 data model treats events as first-class objects, which are linked to the individuals who participated in the event.
Our sample data set only includes a few different kinds of event: birth, marriage, divorce, death, and burial, and in the stylesheet we'll confine ourselves to handling these five, plus the other common event of baptism. We also handle the general EVEN tag, which is used in GEDCOM 5.5 for miscellaneous events. It should be obvious how the code can be extended to handle other events.
select="/GED/INDI/(BIRT|BAPM|DEAT|BURI) | /GED/FAM/(MARR|DIV)" />
This code identifies all the subelements of
Type attribute, whose values are completely open-ended. The optional VitalType attribute allows each event to be associated with one of the four key events of birth, death, marriage, and divorce. This means, for example, that the date of publication of an obituary can be used as an approximation for the date of death if no more accurate date is available, and that the announcement of banns can similarly be used to estimate the date of marriage.
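To make the relationship between the GEDCOM 5.5 tags and the two attributes concrete, here is a minimal sketch. It reuses the select expression shown earlier; the wrapper element `Events`, the element name `EventRec`, the mode name `event`, and the particular attribute values are all assumptions made for illustration. In particular, mapping baptism to the vital event of birth and burial to death is a guess by analogy with the obituary and banns examples.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <!-- "Events" is a hypothetical wrapper so the output is well-formed -->
    <Events>
      <xsl:apply-templates
          select="/GED/INDI/(BIRT|BAPM|DEAT|BURI) | /GED/FAM/(MARR|DIV)"
          mode="event"/>
    </Events>
  </xsl:template>

  <!-- One output record per selected event element -->
  <xsl:template match="*" mode="event">
    <xsl:variable name="tag" select="local-name()"/>
    <!-- Open-ended description of the event (assumed values) -->
    <xsl:variable name="type" select="
        if ($tag = 'BIRT') then 'birth'
        else if ($tag = 'BAPM') then 'baptism'
        else if ($tag = 'DEAT') then 'death'
        else if ($tag = 'BURI') then 'burial'
        else if ($tag = 'MARR') then 'marriage'
        else 'divorce'"/>
    <!-- Which of the four key events this one can stand in for (assumed) -->
    <xsl:variable name="vital" select="
        if ($tag = ('BIRT', 'BAPM')) then 'birth'
        else if ($tag = ('DEAT', 'BURI')) then 'death'
        else $type"/>
    <EventRec Type="{$type}" VitalType="{$vital}"/>
  </xsl:template>

</xsl:stylesheet>
```

Extending the sketch to another event kind would mean adding its tag to the select expression and to both conditional expressions.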
The next two templates are used to generate the participants in an event. The first handles events associated with an individual, the second events associated with a couple (which come from the FAM record):
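As a minimal sketch of how two such templates might be shaped (not the stylesheet's actual code), the following assumes that each INDI element carries an `ID` attribute and that the HUSB and WIFE children of a FAM element carry a `REF` attribute pointing to an individual. The mode name `participants`, the output element `Participant`, and its `Ref` and `Role` attributes are hypothetical names chosen for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Event belonging to an individual: the parent INDI is the
       only participant (assumes INDI has an ID attribute) -->
  <xsl:template match="INDI/*" mode="participants">
    <Participant Ref="{../@ID}" Role="principal"/>
  </xsl:template>

  <!-- Event belonging to a couple: emit one participant for each of
       the HUSB and WIFE children of the parent FAM record
       (assumes each has a REF attribute) -->
  <xsl:template match="FAM/*" mode="participants">
    <xsl:for-each select="../(HUSB | WIFE)">
      <Participant Ref="{@REF}"
                   Role="{if (self::HUSB) then 'husband' else 'wife'}"/>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```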
This leaves the handling of the date and place of the event. Both are potentially very complex information items. Dates, however, have changed little between GEDCOM 5.5 and 6.0, so they can be carried over unchanged.
For place names, we can try to be a bit more clever. Many of the events in our data set occurred in the United States, and have a PLAC record of the form somewhere, XX, USA, where XX is a two-letter code identifying a state. This format is predictable because The Master Genealogist captures place names in a structured way and generates this comma-separated format on output. We can recognize place names that follow this pattern, and use the regular-expression-handling capability of XSLT 2.0 to generate a more structured
reverse()
and
tokenize()
—using individual