XSLT 2.0 and XPath 2.0 Programmer's Reference, 4th Edition (110 page)

xs:ID
The value of an ID can be any valid NCName, but it is constrained to be unique among all the ID values in a document.
xs:IDREF
The value of an IDREF can be any valid NCName, but it is constrained to be the same as some ID value somewhere in the same document.
xs:ENTITY
The value of an ENTITY can be any valid NCName, but it is constrained to be the same as the name of an unparsed entity defined in the DTD.
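These constraints can only be checked by looking at a whole document, not at a single value. The Python sketch below is a minimal illustration of such a document-level check. It assumes, purely for illustration, that attributes named id hold xs:ID values and attributes named ref hold xs:IDREF values. In practice a DTD or schema declares which attributes have these types, and ElementTree does not read such declarations. The function check_ids is hypothetical, and it does not check NCName syntax or ENTITY values.

```python
import xml.etree.ElementTree as ET

def check_ids(xml_text):
    """Return a list of ID/IDREF constraint violations in a document.

    Assumption for illustration only: attributes named "id" are treated as
    xs:ID and attributes named "ref" as xs:IDREF. A real validator would
    take this information from a DTD or schema.
    """
    root = ET.fromstring(xml_text)
    errors = []
    seen_ids = set()
    # First pass: every ID value must be unique within the document.
    for elem in root.iter():
        value = elem.get("id")
        if value is not None:
            if value in seen_ids:
                errors.append(f"duplicate ID: {value}")
            seen_ids.add(value)
    # Second pass: every IDREF must match some ID in the same document.
    # This needs the complete set of IDs, so it cannot be done value by value.
    for elem in root.iter():
        ref = elem.get("ref")
        if ref is not None and ref not in seen_ids:
            errors.append(f"dangling IDREF: {ref}")
    return errors

doc = """<book>
  <chapter id="c1"/>
  <chapter id="c2"/>
  <chapter id="c1"/>
  <xref ref="c2"/>
  <xref ref="c9"/>
</book>"""
print(check_ids(doc))  # ['duplicate ID: c1', 'dangling IDREF: c9']
```

The second pass depends on the result of the first. That is exactly why neither check can be applied to an isolated atomic value.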

XPath 2.0 doesn't handle any of these types specially; it just treats them as strings. If you try to cast a value to one of these types, the processor first applies the whitespace rules for that type, and then checks that the result conforms to the lexical rules for the type. (This means, for example, that calling xs:token($s) has pretty well the same effect as calling normalize-space($s); the only difference is that in the first case you end up with a value labeled as an xs:token, and in the second case it is labeled xs:string.)
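The two-step casting process can be modeled in Python. This is a sketch of the idea, not an XPath implementation. The names cast_to_token and cast_to_ncname are hypothetical, and the NCName pattern is a simplified, ASCII-only approximation of the real production, which also admits many non-ASCII letters.

```python
import re

def collapse_ws(s: str) -> str:
    """XML Schema whiteSpace="collapse": map tab/LF/CR to space,
    merge runs of spaces, and trim leading and trailing spaces."""
    s = re.sub(r"[\t\n\r]", " ", s)
    return re.sub(r" +", " ", s).strip(" ")

# Simplified ASCII-only approximation of the NCName production (assumption).
_NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")

def cast_to_token(s: str) -> str:
    # xs:token: whitespace is collapsed; any resulting string is valid.
    return collapse_ws(s)

def cast_to_ncname(s: str) -> str:
    # Step 1: apply the type's whitespace rule (collapse).
    value = collapse_ws(s)
    # Step 2: check the result against the type's lexical rules.
    if not _NCNAME.fullmatch(value):
        raise ValueError(f"invalid NCName: {value!r}")
    return value

print(cast_to_token("  hello \n  world  "))   # hello world
print(cast_to_ncname("\t chapter-1 \n"))      # chapter-1
```

The cast to xs:token never fails, which is why it behaves like normalize-space(). A cast to a type such as xs:NCName can fail at the second step, for example on "two words".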

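Confusingly, the normalize-space() function (which is carried forward from XPath 1.0 and is described in Chapter 13 of this book) collapses whitespace, while the xs:normalizedString type in XML Schema replaces whitespace. Replacing turns each tab, newline, and carriage return into a space; collapsing does the same, then also merges runs of spaces into a single space and removes leading and trailing spaces.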
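The difference between the two whitespace rules is easy to see in a small Python model. The helper names replace_ws and collapse_ws are hypothetical. They follow the replace and collapse rules of XML Schema, which treat only space, tab, line feed, and carriage return as whitespace.

```python
import re

def replace_ws(s: str) -> str:
    """whiteSpace="replace" (used by xs:normalizedString):
    each tab, line feed, and carriage return becomes a space."""
    return re.sub(r"[\t\n\r]", " ", s)

def collapse_ws(s: str) -> str:
    """whiteSpace="collapse" (the effect of normalize-space()):
    replace, then merge runs of spaces and trim both ends."""
    return re.sub(r" +", " ", replace_ws(s)).strip(" ")

text = "\n  two\t\twords  "
print(repr(replace_ws(text)))   # '   two  words  '
print(repr(collapse_ws(text)))  # 'two words'
```

Replacing preserves the length of the string, while collapsing can shorten it.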

The special validation rules for xs:ID, xs:IDREF, and xs:ENTITY are not invoked when you create atomic values of these types (only the NCName syntax is checked), as they only make sense in the context of validating an entire document.

This concludes our tour of the built-in atomic types defined in XML Schema. Before finishing, we need to look at the special type xs:untypedAtomic, and at the three list types xs:NMTOKENS, xs:IDREFS, and xs:ENTITIES.

Untyped Atomic Values

It might seem perverse to have a type called xs:untypedAtomic, but that's the way it is. This isn't a type defined by XML Schema; it is a type used to label data that hasn't been validated against an XML Schema.

XML is a technology whose unique strength is its ability to handle everything from completely unstructured data, through semi-structured data, to data that has a completely rigid and formal structure. XPath needs to work with XML documents that fit anywhere in this spectrum. Indeed, it's not unusual to find documents where one part is rigidly structured and another is completely free-form.

One way of handling this would be to say that everything that isn't known to have a specific type is simply labeled as a string. But to enable more accurate type checking of expressions and queries, the language designers wanted to be more precise than this: to distinguish data that's known to be a string because it has been validated against a schema from data that's handled as a string because we don't know any better.

The value space of xs:untypedAtomic is the same as that of xs:string; in other words, any sequence of Unicode characters permitted in XML can be held as an xs:untypedAtomic value. So in terms of the values they can represent, there's no difference between xs:untypedAtomic and xs:string. The difference is in how the values can be used.

xs:untypedAtomic is a chameleon type: it takes its behavior from the context in which it is used. If you use it where a number is expected, it behaves like a number; if you use it where a date is expected, it behaves like a date, and so on. This can cause errors, of course: if the string held in an xs:untypedAtomic value isn't a valid date, then using it as a date will fail.
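The chameleon behavior can be pictured with a small Python class. UntypedAtomic here is a hypothetical model, not a real library type. Python's float() and date.fromisoformat() only approximate the lexical rules that XPath applies when casting to xs:double and xs:date.

```python
from datetime import date

class UntypedAtomic:
    """Hypothetical model of xs:untypedAtomic: text whose type is
    decided by the context in which it is used."""

    def __init__(self, text: str):
        self.text = text

    def as_number(self) -> float:
        # Numeric context: XPath 2.0 casts untyped values to xs:double.
        # float() only approximates the xs:double lexical rules.
        return float(self.text)

    def as_date(self) -> date:
        # Date context: the text must parse as a date, or ValueError is raised.
        return date.fromisoformat(self.text.strip())

    def __add__(self, other):
        # Arithmetic makes the value behave like a number.
        return self.as_number() + other

    __radd__ = __add__

print(UntypedAtomic("42") + 2)                 # 44.0
print(UntypedAtomic("2008-05-31").as_date())   # 2008-05-31

try:
    UntypedAtomic("next Tuesday").as_date()
except ValueError as err:
    print("cast to date failed:", err)
```

The same object can serve as a number in one expression and a date in another. The failure only surfaces when the text doesn't fit the type that the context demands.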

In XPath 1.0, all data extracted from a source document was untyped in this sense. In some ways this makes life easy for the programmer: it means that you can do things like @value + 2 without worrying about whether @value is a number or a string. But occasionally, this freedom can lead to confusion. For example, in XPath 1.0, boolean(@value) tests whether the value attribute exists; boolean(string(@value)) tests whether it exists and is not an empty string, while boolean(number(@value)) tests whether it exists and has a numeric value that is not zero. To make these kinds of distinctions, you need to understand the differences between types.
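The three XPath 1.0 tests can be emulated in Python to make the differences concrete. The function xpath1_tests, and the dictionary it uses as a stand-in for an element's attributes, are hypothetical. The number conversion follows the XPath 1.0 rule that any string which isn't a plain decimal number converts to NaN. A plain decimal number here means optional whitespace, an optional minus sign, and digits with an optional decimal point, with no exponent.

```python
import math
import re

# Lexical form accepted by XPath 1.0 number(): optional whitespace,
# optional minus sign, digits with an optional decimal point, whitespace.
_XPATH1_NUMBER = re.compile(r"[ \t\r\n]*-?([0-9]+(\.[0-9]*)?|\.[0-9]+)[ \t\r\n]*")

def xpath1_tests(attributes: dict, name: str = "value"):
    """Return (boolean(@name), boolean(string(@name)), boolean(number(@name))).

    `attributes` is a hypothetical stand-in for an element's attributes:
    a missing key means the attribute does not exist.
    """
    node_set = [attributes[name]] if name in attributes else []
    # boolean(node-set): true if the node-set is non-empty.
    exists = bool(node_set)
    # string(node-set): string value of the first node, "" if empty.
    text = node_set[0] if node_set else ""
    # boolean(string): true if the string has non-zero length.
    non_empty = len(text) > 0
    # number(string): NaN unless the string matches the XPath 1.0 syntax.
    num = float(text) if _XPATH1_NUMBER.fullmatch(text) else math.nan
    # boolean(number): false for zero (positive or negative) and for NaN.
    non_zero = not (num == 0 or math.isnan(num))
    return exists, non_empty, non_zero

for attrs in ({}, {"value": ""}, {"value": "0"}, {"value": "abc"}, {"value": "3.5"}):
    print(attrs, xpath1_tests(attrs))
```

For an absent attribute all three tests are false. For value="" only the first is true. For value="0" or value="abc" the first two are true, and only value="3.5" passes all three.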
