XSLT 2.0 and XPath 2.0 Programmer's Reference, 4th Edition
Authors: Michael Kay
With XSLT 2.0, you can override the default rules for establishing the base URI of a node by using the xml:base attribute of an element. This attribute is defined in a W3C Recommendation called XML Base (http://www.w3.org/TR/xmlbase/); it is intended to fulfill the same function as the <base> element in HTML. If an element has an xml:base attribute, the value of the attribute must be a URI, and this URI defines the base URI for the element itself and for all descendants of the element node, unless overridden by another xml:base attribute.
The URI specified in xml:base may itself be a relative URI, in which case it is resolved relative to the base URI of the parent of the element containing the xml:base attribute (that is, the URI that would have been the base URI of the element if it hadn't had an xml:base attribute).
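As an illustration of these rules, consider the following source document. It is only a sketch: the element names, the file names, and the URI from which the document is assumed to have been loaded are all invented for the example.

```xml
<!-- Hypothetical document, assumed to have been loaded from
     http://example.com/docs/catalog.xml -->
<catalog>
  <!-- No xml:base in scope here, so the base URI of <catalog> is
       http://example.com/docs/catalog.xml -->
  <section xml:base="http://example.org/archive/">
    <!-- Absolute xml:base: the base URI of <section> and of this
         <item> is http://example.org/archive/ -->
    <item href="a.xml"/>
    <subsection xml:base="2008/">
      <!-- Relative xml:base, resolved against the base URI of the
           parent <section>: the base URI of <subsection> and of this
           <item> is http://example.org/archive/2008/ -->
      <item href="b.xml"/>
    </subsection>
  </section>
</catalog>
```

With the first item element as the context node, document(@href) would therefore look for http://example.org/archive/a.xml; with the second, it would look for http://example.org/archive/2008/b.xml.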
With XSLT 2.0, it is also possible that the node used to establish the base URI for the document() function will be a node in a temporary tree created as the value of a variable. Normally, the base URI for such a node will be taken from the stylesheet (specifically, from the element in the stylesheet that constructs the tree), but the temporary tree may also contain an xml:base attribute, which defines the base URI in the same way as for a source document.
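The following sketch shows a temporary tree that carries its own xml:base attribute. The variable name, the links and link elements, and the example.com URI are all assumptions made for the illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- A temporary tree held in a variable. The xml:base attribute on
       the hypothetical <links> element sets the base URI for that
       element and for its descendants. -->
  <xsl:variable name="links">
    <links xml:base="http://example.com/data/">
      <link href="prices.xml"/>
    </links>
  </xsl:variable>

  <xsl:template match="/">
    <!-- The href attribute node supplies the base URI, so the
         processor is asked for http://example.com/data/prices.xml -->
    <xsl:copy-of select="document($links/links/link/@href)"/>
  </xsl:template>

</xsl:stylesheet>
```

Without the xml:base attribute, the same relative reference would be resolved against a base URI taken from the stylesheet instead.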
If several calls on the document() function use the same URI (after expansion of a relative URI into an absolute URI), then the same document node is returned each time. You can tell that it's the same node because the is operator returns true. The expression
document('a.xml') is document('a.xml')
will always be true. If you use a different URI in two calls, then you may or may not get the same document node back:
document('a.xml') is document('A.XML')
might be either true or false.
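A minimal stylesheet that makes both comparisons might look like this. The file name a.xml and the result elements same and maybe are invented for the example, and the sketch assumes that both URIs can actually be retrieved.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <result>
      <!-- Identical URIs after resolution: the same document node is
           returned by both calls, so this comparison is true -->
      <same>
        <xsl:value-of select="document('a.xml') is document('a.xml')"/>
      </same>
      <!-- Different URI strings: whether they yield the same node
           depends on the processor and on how the URIs are
           dereferenced -->
      <maybe>
        <xsl:value-of select="document('a.xml') is document('A.XML')"/>
      </maybe>
    </result>
  </xsl:template>

</xsl:stylesheet>
```

Because the second comparison is not guaranteed either way, it is safest not to write code that depends on its result.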
A fragment identifier identifies a part of a resource: for example, in the URL http://www.wrox.com/booklist#april2008, the fragment identifier is april2008. In principle, a fragment identifier allows the URI to reference a node or set of nodes other than the root node of the target document; for example, the fragment identifier could be an XPointer expression containing a complex expression to select nodes within the target document. In practice though, this is all implementation-defined. The interpretation of a fragment identifier depends on the media type (often called MIME type) of the returned document. Implementations are not required to support any particular media types (which means they are not required to support fragment identifiers at all). Many products support a simple fragment identifier consisting of a name that must be the value of an ID attribute in the target document, and support for XPointer fragment identifiers is likely to become increasingly common now that a usable XPointer specification has finally been ratified.
Parsing the Document
Once the URI has been resolved against a base URI, the next steps are to fetch the XML document found at that URI, and then to parse it into a tree representation. The specification says very little about these processes, which allows the implementation considerable freedom to configure what kinds of URLs are acceptable, and how the parsing is done. It is not even required that the resource starts life as XML: an implementation could quite legitimately return a document node that represents an HTML document, or the results of a database query. If the URL does refer to an XML file, there are still variations allowed in how it is parsed; for example, whether DTD or schema validation takes place, and whether XInclude references are expanded. A vendor might provide additional options such as the ability to strip comments, processing instructions, and unreferenced namespaces. You need to check the documentation for your product to see how such factors can be controlled.
The specification does say that whitespace-only text nodes are stripped following the same rules as for the source document, based on the xsl:strip-space and xsl:preserve-space declarations in the stylesheet.
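Whitespace stripping in XSLT is controlled by xsl:strip-space and xsl:preserve-space declarations, and these apply to documents loaded with document() as well as to the principal source document. The sketch below assumes a hypothetical file poems.xml in which a verse element holds text whose whitespace matters; both names are invented for the example.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Strip whitespace-only text nodes from all elements... -->
  <xsl:strip-space elements="*"/>
  <!-- ...except those that are children of the hypothetical <verse>
       element -->
  <xsl:preserve-space elements="verse"/>

  <xsl:template match="/">
    <!-- Counts the whitespace-only text nodes that survive in the
         document returned by document() -->
    <remaining>
      <xsl:value-of
          select="count(document('poems.xml')//text()[not(normalize-space())])"/>
    </remaining>
  </xsl:template>

</xsl:stylesheet>
```

Any whitespace-only text nodes that remain should be ones protected by the xsl:preserve-space declaration or by an xml:space="preserve" attribute in the loaded document itself.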
URIs Held in Nodes
For a simple case such as document(@href), the result is a single node, namely the root node of the document referenced by the href attribute of the context node.
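A template rule using this form of call might be sketched as follows. The chapter-ref element and the file name ch01.xml are hypothetical.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- chapter-ref is a hypothetical element, for example
       <chapter-ref href="ch01.xml"/> -->
  <xsl:template match="chapter-ref">
    <!-- A relative href is resolved against the base URI of the href
         attribute node, which is that of the chapter-ref element -->
    <xsl:copy-of select="document(@href)/*"/>
  </xsl:template>

</xsl:stylesheet>
```

Here the outermost element of each referenced document is copied into the result at the point where the reference occurs.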
More generally, the argument may be a sequence of nodes, each of which contains a sequence of URIs. The result is then the sequence obtained by processing each of these in turn. For example, document(//@href) returns the sequence of documents located by dereferencing the URIs in all the href attributes in the original context document. The result is returned in document order of the returned nodes (a somewhat academic concept since they will usually be different documents). The result is not necessarily in the order of the href attributes, and duplicates will be eliminated.
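One way to observe the elimination of duplicates is to compare the number of href attributes with the number of documents returned. The following is a sketch only; the summary and doc result elements are invented for the example.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <!-- One document node per distinct resource, in document order -->
    <xsl:variable name="docs" select="document(//@href)"/>
    <!-- documents can be smaller than references when several href
         attributes identify the same resource -->
    <summary references="{count(//@href)}" documents="{count($docs)}">
      <xsl:for-each select="$docs">
        <doc uri="{document-uri(.)}"/>
      </xsl:for-each>
    </summary>
  </xsl:template>

</xsl:stylesheet>
```

If the order of the href attributes matters, it is safer to iterate over the attributes themselves and call document() on each one in turn, rather than relying on the order of the combined result.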