XSLT 2.0 and XPath 2.0 Programmer's Reference, 4th Edition (724 page)

In the original GEDCOM file a NOTE can contain multiple lines, which are arranged like this:

1 NOTE Educated at Harvard University. Elected Congressman in 1945
2 CONT aged 29; served three terms in the House of Representatives.
2 CONT Elected Senator in 1952. Elected President in 1960, the
2 CONT youngest ever President of the United States.

In the direct conversion to XML, the note appears like this (except that there is no newline before the first CONT start tag):

Educated at Harvard University. Elected Congressman in 1945
aged 29; served three terms in the House of Representatives.
Elected Senator in 1952. Elected President in 1960, the
youngest ever President of the United States.
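The direct conversion relies on GEDCOM's level numbers: a line at level n+1 belongs to the nearest preceding line at level n. As a rough sketch of that nesting rule (written in Python rather than XSLT, and not the converter used in this chapter), the function below builds an element tree from GEDCOM lines. The GED root name follows the paths used later in the chapter; storing an `@xref@` identifier in an `ID` attribute is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET


def gedcom_to_xml(lines):
    """Nest GEDCOM "level TAG value" lines into an XML tree by level number.

    Sketch only: the GED root name follows the paths used in this chapter,
    but putting an @xref@ identifier into an ID attribute is an assumption.
    """
    root = ET.Element("GED")
    stack = [root]  # stack[n] is the parent of any line at level n
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        parts = line.split(" ", 2)
        level = int(parts[0])
        xref = None
        if len(parts) == 3 and parts[1].startswith("@"):
            # "0 @I1@ INDI": a record identifier sits before the tag
            xref = parts[1].strip("@")
            parts = [parts[0]] + parts[2].split(" ", 1)
        tag = parts[1]
        value = parts[2] if len(parts) > 2 else ""
        del stack[level + 1:]  # drop the previous sibling and anything deeper
        elem = ET.SubElement(stack[level], tag)
        if xref:
            elem.set("ID", xref)
        elem.text = value
        stack.append(elem)  # becomes the parent for level + 1
    return root


lines = [
    "0 @I1@ INDI",
    "1 NOTE Educated at Harvard University. Elected Congressman in 1945",
    "2 CONT aged 29; served three terms in the House of Representatives.",
]
print(ET.tostring(gedcom_to_xml(lines), encoding="unicode"))
```

Each CONT line ends up as a child element of the NOTE, after the NOTE's own text, which is the shape shown above.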


The GEDCOM 6.0 specification allows only plain text in a note (it provides other elements for more complex information, such as a transcript of a will). So the ged55-to-6 conversion stylesheet preserves the line endings by inserting a newline character wherever a CONT element appeared. The final result is:

Educated at Harvard University. Elected Congressman in 1945
 aged 29; served three terms in the House of Representatives.
 Elected Senator in 1952. Elected President in 1960, the
 youngest ever President of the United States.
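The newline-insertion step can be modelled in a few lines. The Python sketch below illustrates the idea and is not the ged55-to-6 stylesheet itself: assuming the note is held as an ElementTree element whose CONT children carry the continuation text, it returns the note's text with a newline in place of each CONT element.

```python
import xml.etree.ElementTree as ET


def flatten_note(note):
    """Return a note's plain text, with a newline wherever a CONT element appeared."""
    parts = [note.text or ""]
    for child in note:
        if child.tag == "CONT":
            # each continuation line starts on a new line
            parts.append("\n" + (child.text or ""))
        # other child elements contribute nothing in this sketch, but any
        # text following a child's end tag (its tail) is kept
        parts.append(child.tail or "")
    return "".join(parts)


note = ET.fromstring(
    "<NOTE>Educated at Harvard University. Elected Congressman in 1945"
    "<CONT>aged 29; served three terms in the House of Representatives.</CONT>"
    "</NOTE>"
)
print(flatten_note(note))
```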


The result isn't always satisfactory, because different genealogy packages that produce GEDCOM 5.5 vary widely in how they handle newlines and whitespace, but it works in this case.

A typical individual record after conversion looks like this:


   Jaqueline Lee
      Bouvier
      Kennedy
      Onassis
   F


GEDCOM 6.0 allows all the parts of an individual's name to be tagged with their type, but it doesn't require this, and our source data doesn't contain enough information to do it. An individual record can also hold external reference numbers; for example, one might be a stable reference number used to identify this record in a particular database. As with names, there's no limit on how many reference numbers can be stored: the idea is that the Type attribute distinguishes them.

Creating Event Records

The event records in the result tree correspond to events associated with individuals and families in the source data. As we've seen, the 6.0 data model treats events as first-class objects, which are linked to the individuals who participated in the event.

Our sample data set includes only a few kinds of event: birth, marriage, divorce, death, and burial. In the stylesheet we'll confine ourselves to handling these five, plus one other common event, baptism. We also handle the general EVEN tag, which GEDCOM 5.5 uses for miscellaneous events. It should be obvious how the code can be extended to handle other events.

     select="/GED/INDI/(BIRT|BAPM|DEAT|BURI) | /GED/FAM/(MARR|DIV)" />
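The select expression above is the union of two path expressions, and an XPath union returns its nodes in document order. A rough Python equivalent, assuming the directly converted file has been parsed with ElementTree, walks the records in order and picks out the same event tags. The `EVENT_TAGS` table and the function name are illustrative.

```python
import xml.etree.ElementTree as ET

# Event tags wanted under each kind of record, mirroring
# /GED/INDI/(BIRT|BAPM|DEAT|BURI) | /GED/FAM/(MARR|DIV)
EVENT_TAGS = {
    "INDI": {"BIRT", "BAPM", "DEAT", "BURI"},
    "FAM": {"MARR", "DIV"},
}


def select_events(ged):
    """Yield (record, event) pairs in document order, as the XPath union would."""
    for record in ged:  # the INDI, FAM, ... children of the GED root
        wanted = EVENT_TAGS.get(record.tag, set())
        for child in record:
            if child.tag in wanted:
                yield record, child


ged = ET.fromstring(
    '<GED><INDI ID="I1"><NAME/><BIRT/><DEAT/></INDI>'
    '<FAM ID="F1"><MARR/></FAM></GED>'
)
for record, event in select_events(ged):
    print(record.get("ID"), event.tag)
```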


This code identifies all the subelements of INDI and FAM that refer to events, and then processes these, creating one event record in the output for each. The identifier for the event is computed from the identifier of the containing INDI or FAM element plus a sequence number, and the attributes of the event are obtained from a look-up table based on the original element name. In the 6.0 model, the type of event (for example, death or burial) is indicated by the Type attribute, whose values are completely open-ended. The optional VitalType attribute allows each event to be associated with one of the four key events of birth, death, marriage, and divorce. This means, for example, that the date of publication of an obituary can be used as an approximation for the date of death if no more accurate date is available, and that the announcement of banns can similarly be used to estimate the date of marriage.
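To make the identifier and look-up steps concrete, here is a Python sketch. The table values, the `Id` key, and the hyphen between the record identifier and the sequence number are illustrative assumptions, not the stylesheet's actual choices. Mapping baptism to a birth VitalType and burial to a death VitalType follows the approximation idea described above, but is likewise assumed.

```python
# Illustrative look-up table: these values are assumptions, not the stylesheet's.
# Type is open-ended; VitalType ties an event to birth, death, marriage, or divorce.
EVENT_ATTRS = {
    "BIRT": {"Type": "birth", "VitalType": "birth"},
    "BAPM": {"Type": "baptism", "VitalType": "birth"},
    "DEAT": {"Type": "death", "VitalType": "death"},
    "BURI": {"Type": "burial", "VitalType": "death"},
    "MARR": {"Type": "marriage", "VitalType": "marriage"},
    "DIV": {"Type": "divorce", "VitalType": "divorce"},
}


def number_events(events):
    """Turn (record_id, tag) pairs, in document order, into event attributes.

    The identifier is the containing record's ID plus a per-record sequence
    number; the "Id" key and the "-" separator are hypothetical.
    """
    counters = {}
    result = []
    for record_id, tag in events:
        counters[record_id] = counters.get(record_id, 0) + 1
        attrs = {"Id": f"{record_id}-{counters[record_id]}"}
        attrs.update(EVENT_ATTRS[tag])
        result.append(attrs)
    return result


print(number_events([("I1", "BIRT"), ("I1", "DEAT"), ("F1", "MARR")]))
```

Counting per record keeps identifiers stable when events are added to other records.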

The next two templates generate the participants in an event. The first handles events associated with an individual; the second handles events associated with a couple (which come from the FAM record):

   principal

   husband
   wife
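The two participant rules can be sketched in Python: an individual's event gets that person as principal, while a family event gets the husband and wife named by the FAM record's HUSB and WIFE pointers. Keeping record identifiers in an `ID` attribute and pointers as element text such as `@I1@` in the converted XML are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET


def participants(record):
    """Return (individual_id, role) pairs for an event belonging to `record`.

    Assumes the converted XML keeps record identifiers in an ID attribute and
    GEDCOM pointers as element text such as <HUSB>@I1@</HUSB>.
    """
    if record.tag == "INDI":
        # an individual's own event: that person is the principal
        return [(record.get("ID"), "principal")]
    if record.tag == "FAM":
        # a couple's event: one participant each for HUSB and WIFE, if present
        roles = []
        for tag, role in (("HUSB", "husband"), ("WIFE", "wife")):
            pointer = record.find(tag)
            if pointer is not None and pointer.text:
                roles.append((pointer.text.strip("@"), role))
        return roles
    return []


fam = ET.fromstring('<FAM ID="F1"><HUSB>@I1@</HUSB><WIFE>@I2@</WIFE><MARR/></FAM>')
print(participants(fam))
```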


This leaves the handling of the date and place of the event. Both are potentially very complex information items. Dates, however, have changed little between GEDCOM 5.5 and 6.0, so they can be carried over unchanged.

For place names, we can try to be a bit more clever. Many of the events in our data set occurred in the United States and have a PLAC record of the form "somewhere, XX, USA", where XX is a two-letter code identifying a state. This format is predictable because The Master Genealogist captures place names in a structured way and generates this comma-separated format on output. We can recognize place names that follow this pattern and use the regular-expression-handling capability of XSLT 2.0 to generate a more structured place description. This records the country as USA, and the state as the two-letter code preceding the country name; anything before the state abbreviation is tokenized using commas as the delimiter, and the sequence of tokens is output in reverse order (note the calls on reverse() and tokenize()), with each token becoming a separate element in the output.
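The place-name rule can be sketched with Python's re module. The exact pattern, and the choice to return the parts as a list ordered from the country down to the most local name, are assumptions for illustration. The stylesheet itself uses XSLT 2.0 regular expressions together with tokenize() and reverse(), and emits elements rather than a list.

```python
import re

# "somewhere, XX, USA": free text, a two-letter state code, then the country.
US_PLACE = re.compile(r"^(.*),\s*([A-Z]{2}),\s*USA$")


def structure_place(plac):
    """Split a US-style PLAC value into parts, country first.

    Returns None when the value doesn't fit the pattern, so the caller can
    keep it as unstructured text.
    """
    match = US_PLACE.match(plac.strip())
    if match is None:
        return None
    local, state = match.groups()
    # like tokenize(local, ',') followed by reverse(): outermost place first
    tokens = [t.strip() for t in local.split(",") if t.strip()]
    return ["USA", state] + tokens[::-1]


print(structure_place("Brookline, Norfolk, MA, USA"))
```

Because the pattern is anchored at both ends, a place such as "Paris, France" simply fails to match and is left as it was.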
