Read XSLT 2.0 and XPath 2.0 Programmer's Reference, 4th Edition Online
Authors: Michael Kay
Most of these fields are optional and repeatable. Something I haven't captured in this schema is that the GEDCOM spec also says the structure is extensible; arbitrary namespaced elements may be inserted at any point in the structure. This is typically used to contain information specific to a particular product vendor, so that GEDCOM can be used to exchange data between users of that product with no loss of information. This can be handled in XML Schema by using wildcards, but only if they appear after other elements (this restriction disappears in XML Schema 1.1).
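As a minimal sketch of the wildcard approach in XML Schema 1.0, the fragment below declares a record whose content model ends with an xs:any wildcard. ExtensibleRec and Note are hypothetical names chosen for illustration and are not taken from the GEDCOM schema; the wildcard is placed last in the sequence, in line with the restriction mentioned above.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- ExtensibleRec and Note are hypothetical names used only for illustration -->
  <xs:element name="ExtensibleRec">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Note" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
        <!-- Vendor extensions: with no target namespace declared, ##other
             admits any namespace-qualified element -->
        <xs:any namespace="##other" processContents="lax"
                minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Using processContents="lax" means vendor elements are validated only if a declaration for them happens to be available, which suits data whose schema the receiving product may not have.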
The Stylus Studio converter makes IndividualRec and all other elements into top-level element declarations in the schema. This isn't needed for validation, since in a GEDCOM file the IndividualRec will always be a child of the
.
Having made IndividualRec a top-level element declaration, there seems to be nothing that would be gained by naming its complex type as a top-level type definition. In general, the only types that are worth naming as top-level types are those that are used in more than one place, or at least look likely to be used in more than one place.
For the child elements of IndividualRec, the converter chose to use a global element declaration referring to a local (anonymous) type. There's nothing absolute about this; one could equally use a local element with a global type. As far as validation is concerned, you could also use a local element with an anonymous type, but this is not a good idea if you want to reference the schema from a stylesheet. When it comes to writing an XSLT stylesheet, it's important that where a data element such as Date appears in several places, it should either use a global element declaration or a global type definition, so that you can reference one or the other when you declare variables and parameters, and when you write match patterns.
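To illustrate why a global declaration matters, here is a hedged sketch of a schema-aware XSLT 2.0 stylesheet that refers to the global element declaration for Date in both a match pattern and a parameter type. The schema file name gedcom.xsd and the template name show-date are assumptions made for the example, and a schema-aware processor is required.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- gedcom.xsd is an assumed file name for the no-namespace schema -->
  <xsl:import-schema schema-location="gedcom.xsd"/>

  <!-- Match pattern that relies on the global element declaration for Date -->
  <xsl:template match="schema-element(Date)">
    <xsl:value-of select="."/>
  </xsl:template>

  <!-- show-date is a hypothetical template; its parameter type
       refers to the same global element declaration -->
  <xsl:template name="show-date">
    <xsl:param name="date" as="schema-element(Date)" required="yes"/>
    <xsl:value-of select="$date"/>
  </xsl:template>

</xsl:stylesheet>
```

Had Date been declared only as a local element with an anonymous type, neither schema-element(Date) nor a named type test would be available to the stylesheet.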
There are no substitution groups in this model. They aren't needed, because the model has chosen to use generic elements like
Events
An event record has this structure:
The Religion element, of course, has a special place because so many of the events affecting our forebears were recorded by the religious authorities.
Families
The third object type we will look at is the family. Here is the definition:
Again, many of the fields are common with the other two object types. The elements HusbFath, WifeMoth, and Child play a crucial role in linking the data, so we'd better open them up:
A
identifies the individual concerned. The
The
Now let's look quickly at the three most common (and difficult) datatypes used for properties of these objects: dates, places, and personal names.
Dates
As we've seen, GEDCOM allows any character string to be used as a date. However, much of the presentation of data depends on analyzing dates wherever possible. How is this dilemma resolved?
The Date element referenced from the Event record has a complex type, defined like this:
That is to say, it is a complex type with simple content: the content is a GeneralDate, and the optional attribute indicates which calendar is used. The GeneralDate can be any character string, but certain formats such as DD MMM YYYY are recommended.
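A complex type with simple content of this kind might be declared roughly as follows. This is a sketch rather than the schema's actual text: the attribute name Calendar is an assumption, and GeneralDate is modelled here as an unconstrained string because its definition is not shown.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- GeneralDate is modelled as any string; its real definition is not shown in the text -->
  <xs:simpleType name="GeneralDate">
    <xs:restriction base="xs:string"/>
  </xs:simpleType>

  <xs:element name="Date">
    <xs:complexType>
      <xs:simpleContent>
        <!-- The element content is a GeneralDate -->
        <xs:extension base="GeneralDate">
          <!-- Calendar is an assumed attribute name -->
          <xs:attribute name="Calendar" type="xs:string" use="optional"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
</xs:schema>
```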
As far as validation is concerned, there isn't much point in defining a schema type for the pattern DD MMM YYYY. However, it turns out that it can be useful to define this type even if it isn't used for validation. We can define the GEDCOM date format as a union type like this:
<xs:pattern value="[0-9]?[0-9]\s(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)\s[0-9]{4}"/>
This type is meaningless from the point of view of validation: all strings will be considered valid. But the effect is that a date that conforms to the DD MMM YYYY pattern will be labeled as a StandardDate, while one that doesn't will be labeled only as an xs:string. This will prove useful when we write our stylesheets, because it becomes very easy to separate standard dates from nonstandard dates when we want to perform operations like date formatting and sorting. In fact, I could have usefully split dates into three categories: simple exact dates like 4 MAR 1920; inexact dates that conform to the GEDCOM syntax, such as BEF JAN 1866 (meaning some time before January 1866); and arbitrary character strings whose interpretation is left purely to the reader.
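The two-way labeling described above could be expressed along the following lines. This is a sketch under stated assumptions: StandardDate and the pattern come from the text, while DateValue is a name invented here for the union type.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Exact dates in DD MMM YYYY form, for example 4 MAR 1920 -->
  <xs:simpleType name="StandardDate">
    <xs:restriction base="xs:string">
      <xs:pattern value="[0-9]?[0-9]\s(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)\s[0-9]{4}"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- DateValue is an assumed name. Member types are tried in order,
       so xs:string applies only when StandardDate does not match. -->
  <xs:simpleType name="DateValue">
    <xs:union memberTypes="StandardDate xs:string"/>
  </xs:simpleType>
</xs:schema>
```

Because xs:string accepts every value, the union never rejects anything; what it contributes is the type label. A schema-aware stylesheet could then, for instance, test whether the typed value of a Date element is an instance of StandardDate before attempting to format or sort it.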