XSLT 2.0 and XPath 2.0 Programmer's Reference, 4th Edition (546 page)

See Also

base-uri() on page 719

collection() on page 726

document-uri() on page 764

resolve-uri() on page 867

document() in the next entry

document

This function is available in XSLT only.

The document() function finds an external XML document by resolving a URI reference, parses the XML into a tree structure, and returns its root node. It may also be used to find a set of external documents, and it may be used to find a node other than the root by using a fragment identifier in the URI.

For example, the expression document('data.xml') looks for the file data.xml in the same directory as the stylesheet, parses it, and returns the root node of the resulting tree.
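As a minimal sketch of this call in context, the stylesheet below loads data.xml into a global variable and reports on what it found. The file name comes from the example above; the output element names (report, outermost-element, element-count) are invented for illustration, and the sketch assumes that data.xml exists alongside the stylesheet and is well-formed.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: data.xml sits in the same directory as this stylesheet -->
  <xsl:variable name="data" select="document('data.xml')"/>

  <xsl:template match="/">
    <report>
      <!-- $data is the node at the top of the tree, so $data/* is
           the outermost element of data.xml -->
      <outermost-element>
        <xsl:value-of select="name($data/*)"/>
      </outermost-element>
      <!-- Count every element in the loaded document -->
      <element-count>
        <xsl:value-of select="count($data//*)"/>
      </element-count>
    </report>
  </xsl:template>

</xsl:stylesheet>
```

The stylesheet can be run against any source document, because the template reads only from $data and ignores the principal input.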

Changes in 2.0

This function is retained largely for backward compatibility with XSLT 1.0; a simplified version is now available in XPath 2.0 as the doc() function described on page 750.

The specification of the function has been generalized to allow the first argument to be an arbitrary sequence of URIs, and it has also become less prescriptive, to allow greater freedom to configure the way in which the URI is interpreted and the way in which the retrieved documents are parsed.

Signature

Argument | Type | Meaning
href | item()* | A sequence, which may contain values of type xs:string or xs:anyURI, or nodes containing such values. These URIs are used to locate the documents to be loaded.
base (optional) | node() | If the argument is present, it must be a node. The base URI of this node is used for resolving any relative URIs found in the first argument.
Result | node()* | A sequence of nodes, in document order. In the common case where a single URI is specified, and this URI contains no fragment identifier, the result will normally be a single document node.

Effect

In brief, the document() function locates an XML document, using a URI. The resulting XML document is parsed and a tree is constructed. On completion, the result of the document() function is the document node of the new document.

If a sequence of URIs is provided, rather than a single URI, then the result is a sequence of document nodes. If a URI contains a fragment identifier, then the result may be an element node rather than a document node. The details are described in the following sections.
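The multi-document case might look like the following sketch. The file names chapter1.xml and chapter2.xml and the output elements are hypothetical, and both files are assumed to sit next to the stylesheet. A fragment identifier is deliberately not shown, because its interpretation depends on the media type of the resource and on the processor.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: both files exist in the stylesheet's directory.
       The inner parentheses build a sequence of two strings, which is
       passed as the single href argument. -->
  <xsl:variable name="chapters"
      select="document(('chapter1.xml', 'chapter2.xml'))"/>

  <xsl:template match="/">
    <loaded>
      <!-- One document node per distinct URI in the sequence -->
      <xsl:for-each select="$chapters">
        <doc uri="{document-uri(.)}" outermost="{name(*)}"/>
      </xsl:for-each>
    </loaded>
  </xsl:template>

</xsl:stylesheet>
```

Because the result is delivered in document order, and the relative order of nodes from different documents is up to the implementation, the doc elements will not necessarily appear in the order in which the URIs were listed.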

I will describe the effect of the function by considering the different ways of determining a base URI to use for resolving relative URIs. However, first a word about URIs and URLs, which are terms I use rather freely throughout this section.

Resolving the URI

The XSLT specification always uses the term URI: Uniform Resource Identifier. The concept of a URI is a generalization of the URLs (Uniform Resource Locators) that are widely used on the Web today and displayed on every cornflakes packet. The URI extends the URL mechanism, which is based on the established Domain Name System (with its hierarchic names such as www.ibm.com and www.cam.ac.uk), to allow other global naming and numbering schemes, including established ones such as ISBN book numbers and international telephone numbers. While URIs are a nice idea, the only ones that really enable you to retrieve resources on the Web are the familiar URLs. This is why the terms URI and URL seem to be used rather interchangeably in this section and indeed throughout the book. If you read carefully, though, you'll see that I've tried to use both terms correctly.

The way URIs are used to locate XML documents, and the way these XML documents are parsed to create a tree representation, are not defined in detail. In fact, the XSLT document() function defines this process in terms of the XPath doc() function, and the XPath doc() function essentially says that it's a piece of magic performed by the context of the XPath expression, not by the XPath processor itself. This reflects the reality that when you are using an application programming interface (API) such as the Java JAXP interface or the System.Xml.Xsl namespace in Microsoft's .NET, you can supply your own code that maps URIs to document nodes in any way you like. (The relevant class is called URIResolver in JAXP, XmlResolver in .NET.) This might not even involve any parsing of a real XML file; for example, the URIResolver might actually retrieve data from a relational database, and return an XML document that encapsulates the results of the query.

There's an expectation, though, that most XSLT processors (unless running in some kind of secure environment) will allow you to specify a URL (typically one that starts http: or file:) that can be dereferenced in the usual way to locate a source XML document, which is then parsed. The details of how it is parsed, for example whether schema or Document Type Definition (DTD) validation is attempted and whether XInclude processing is performed, are likely to depend on configuration settings (perhaps options on the command line, or properties set via the processor's API). The language specification leaves this open-ended.

A URI used as input to the document() function should generally identify an XML document. If the URI is invalid, or if it doesn't identify any resource, or if that resource is not an XML document, the specification leaves it up to the implementation to decide what to do: it can either report the error, or ignore that particular URI. Implementations may go beyond this; for example, if the URI identifies an HTML document, they may attempt to convert the HTML to XML. This is all outside the scope of the W3C specifications.
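Because the outcome for a bad URI is left to the implementation, a stylesheet that needs predictable behavior can test the URI before loading it. The sketch below uses the standard XPath 2.0 doc-available() function for this. The file name optional.xml and the output elements are hypothetical, and the sketch assumes that the processor retrieves documents for document() in the same way as it does for doc(), which is how the specification defines the function.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <result>
      <xsl:choose>
        <!-- doc-available() returns true only if the URI can be
             retrieved and parsed as XML. A relative URI given as a
             string is resolved against the stylesheet's base URI. -->
        <xsl:when test="doc-available('optional.xml')">
          <xsl:copy-of select="document('optional.xml')/*"/>
        </xsl:when>
        <xsl:otherwise>
          <!-- Hypothetical placeholder for the fallback case -->
          <missing href="optional.xml"/>
        </xsl:otherwise>
      </xsl:choose>
    </result>
  </xsl:template>

</xsl:stylesheet>
```

This keeps the choice between "report" and "ignore" in the stylesheet's hands rather than relying on how a particular processor treats an unavailable document.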

A URI can be relative rather than absolute. A typical example of a relative URI is data.xml. Such a URI is resolved (converted to an absolute, globally unique URI) by interpreting it as relative to some base URI. By default, a relative URI that appears in the text of an XML document is interpreted relative to the URI of the document (or more precisely, the XML entity) that contains it, which in the case of the document() function is usually either the source document or the stylesheet. So if the relative URI data.xml appears in the source document, the system will try to find the file in the same directory as the source document, while if it appears in the stylesheet, the system will look in the directory containing the stylesheet. The base URI of a node in an XML document can be changed using the xml:base attribute, and this will be taken into account. In addition, the document() function provides a second argument so that the base URI can be specified explicitly, if required.
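To illustrate the second argument, here is a sketch that loads the same relative URI twice, against two different base URIs. The file name lookup.xml and the wrapper elements are hypothetical; the sketch assumes that a lookup.xml exists both in the stylesheet's directory and in the source document's directory, and that the source document has a known base URI.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <lookups>
      <!-- One argument, given as a string in the stylesheet:
           resolved relative to the stylesheet -->
      <relative-to-stylesheet>
        <xsl:copy-of select="document('lookup.xml')/*"/>
      </relative-to-stylesheet>

      <!-- Two arguments: "/" is the document node of the source
           document, so its base URI is used instead -->
      <relative-to-source>
        <xsl:copy-of select="document('lookup.xml', /)/*"/>
      </relative-to-source>
    </lookups>
  </xsl:template>

</xsl:stylesheet>
```

Any node can serve as the second argument; only its base URI matters, so a node that falls within the scope of an xml:base attribute would pick up the base URI set by that attribute.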

The actual rule is that the href argument may be a sequence of nodes or atomic values. In the case of a node in this sequence, the node may contain a URI (or indeed, a sequence of URIs), and if such a URI is relative then it is expanded against the base URI of the node from which it came. In the case of an atomic value in the sequence, this must be an xs:string or xs:anyURI value, and it is expanded using the base URI of the stylesheet.
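The difference between the two cases can be made visible without loading anything, by computing the absolute URIs with resolve-uri(). In the sketch below, the source document is assumed to contain chapter elements with href attributes, for example <chapter href="ch1.xml"/>; these names and the output elements are hypothetical. The sketch also assumes that the processor knows the base URI of both the source document and the stylesheet.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <resolution>
      <xsl:for-each select="//chapter[@href]">
        <chapter>
          <!-- The URI that document(@href) would use: the attribute
               is a node, so its own base URI applies (the source
               document's URI, or whatever xml:base has set) -->
          <as-node>
            <xsl:value-of select="resolve-uri(@href, base-uri(@href))"/>
          </as-node>
          <!-- The URI that document(string(@href)) would use: the
               value is now an atomic xs:string, so the base URI of
               the stylesheet applies -->
          <as-string>
            <xsl:value-of select="resolve-uri(@href, static-base-uri())"/>
          </as-string>
        </chapter>
      </xsl:for-each>
    </resolution>
  </xsl:template>

</xsl:stylesheet>
```

If the source document and the stylesheet live in different directories, the two values differ for every relative href, which is a common source of "file not found" surprises when an expression is changed from document(@href) to document(string(@href)).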

The expansion of relative URIs exploits the fact that in the XPath data model, described on page 45 in Chapter 2, every node has a base URI. (Don't confuse this with the namespace URI, which is quite unrelated.) By default, the base URI of a node in the source document or the stylesheet will be the URI of the XML document or entity from which the node was constructed. In some cases, for example when the input comes from a Document Object Model (DOM) document or from a relational database, it may be difficult for the processor to determine the base URI (the concept does not exist in the DOM standard). What happens in this situation is implementation-defined. Microsoft, whose MSXML3 processor is built around its DOM implementation, has extended its DOM so it retains knowledge of the URI from which the document was loaded.
