XSLT 2.0 and XPath 2.0 Programmer's Reference, 4th Edition
Authors: Michael Kay
Expression | Possible Result |
distinct-values((1, 2, 3, 3.5, 2.0, 1.0)) | 3.5, 2.0, 1, 3 |
distinct-values(("A", "B", "C", "a", "b", "c")) | "B", "c", "a", "A", "C", "b" |
distinct-values((xs:time("12:20:02Z"), xs:time("13:20:02+01:00"))) | xs:time("13:20:02+01:00") |
distinct-values((1, "a", current-date())) | "a", 1, 2008-05-08Z |
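Because the order of the result is implementation-dependent, only the number of distinct values is predictable. The following XQuery 1.0 sketch counts the results of the four calls above. It assumes that the default collation is the Unicode codepoint collation, which is the usual default unless the host language or query prolog changes it:

```xquery
(: 1 and 1.0, and 2 and 2.0, compare equal as numbers.
   Upper-case and lower-case letters are distinct under the codepoint collation.
   The two times denote the same instant once their timezones are applied.
   Values of incomparable types (integer, string, date) are treated as distinct. :)
(
  count(distinct-values((1, 2, 3, 3.5, 2.0, 1.0))),
  count(distinct-values(("A", "B", "C", "a", "b", "c"))),
  count(distinct-values((xs:time("12:20:02Z"), xs:time("13:20:02+01:00")))),
  count(distinct-values((1, "a", current-date())))
)
(: returns 4, 6, 1, 3 :)
```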
Usage
The distinct-values() function provides the only direct way of eliminating duplicate values in XPath 2.0 and in XQuery 1.0. In XSLT 2.0, however, richer functionality is available in the form of the <xsl:for-each-group> instruction.
If you apply the function to a sequence of nodes, the result will be the distinct values present in those nodes, not the nodes themselves. To process the nodes, you will have to find the nodes having each value. The typical logic is the following, which returns a sequence of integers representing the number of employees in each department:
for $x in distinct-values(//employee/@dept)
return count(//employee[@dept = $x])
In practice the processing of the result will probably be done in XSLT, XQuery, or some other host language, because it will usually involve generating nodes in the output, which XPath cannot do on its own.
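As a sketch of that host-language step, here is an XQuery 1.0 query that builds one element per department. The inline employee data and the dept element it constructs are hypothetical stand-ins for a real input document and output format:

```xquery
(: Hypothetical sample data; in practice this would be //employee in the input. :)
let $employees := (
  <employee name="Ann" dept="sales"/>,
  <employee name="Bob" dept="dev"/>,
  <employee name="Cho" dept="sales"/>
)
(: One dept element per distinct department value, carrying its head count. :)
for $d in distinct-values($employees/@dept)
return <dept name="{$d}" count="{count($employees[@dept = $d])}"/>
```

Under these assumptions the query returns two elements, one with name="sales" and count="2" and one with name="dev" and count="1". Their relative order is not defined, because distinct-values() does not define the order of its result.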
Having found the distinct values that appear in a sequence, it is possible to determine the positions of each of these values using the index-of() function. For example, if you are using XQuery, then you can sort the distinct values in order of their first appearance in the sequence by writing:
(: XQUERY 1.0 EXAMPLE :)
for $d in distinct-values($sequence)
order by index-of($sequence, $d)[1]
return $d
Alternatively, you could sort them in ascending order of their frequency of occurrence (least frequent first) by writing:
(: XQUERY 1.0 EXAMPLE :)
for $d in distinct-values($sequence)
order by count(index-of($sequence, $d))
return $d
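To see the two orderings side by side, here is a self-contained XQuery 1.0 version of both queries, run against a small hypothetical sequence. The second query adds descending so that the most frequent value comes first:

```xquery
(: Hypothetical sample: "b" occurs once, "a" twice, "c" three times. :)
let $sequence := ("b", "a", "c", "a", "c", "c")
return (
  (: index-of(...)[1] is the position of the first occurrence. :)
  <first-appearance>{
    for $d in distinct-values($sequence)
    order by index-of($sequence, $d)[1]
    return $d
  }</first-appearance>,
  (: count(index-of(...)) is the number of occurrences. :)
  <by-frequency>{
    for $d in distinct-values($sequence)
    order by count(index-of($sequence, $d)) descending
    return $d
  }</by-frequency>
)
```

The result is <first-appearance>b a c</first-appearance> followed by <by-frequency>c a b</by-frequency>. The atomic values in each element's content are joined with single spaces.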
XPath 2.0 has no sorting capability, so this operation can only be done in the host language. In XSLT, it is usually more convenient to use the <xsl:for-each-group> instruction.
See Also
index-of() on page 807
doc, doc-available
The doc() function retrieves an external XML document by means of a URI, and returns the document node at the root of the tree representation of that XML document. Its companion function doc-available() determines whether an equivalent call on the doc() function would succeed in locating a document.
Changes in 2.0
These functions are new in XPath 2.0. The doc() function is a simplified version of the document() function that was provided in XSLT 1.0 and which remains available in XSLT 2.0 (see page 754). When combined with functions such as resolve-uri() and base-uri(), the doc() function provides most of the capability of the XSLT 2.0 document() function, but with a much simpler interface.
Signatures
The doc() function
Argument | Type | Meaning |
uri | xs:string? | The URI of the document to be loaded |
Result | document-node()? | The document node of the document identified by this URI |
The doc-available() function
Argument | Type | Meaning |
uri | xs:string? | The URI of the document to be loaded |
Result | xs:boolean | True if a call on the doc() function with the same argument would succeed; false if it would fail |
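The ? on each argument type means that the empty sequence is accepted rather than rejected. A quick XQuery 1.0 check of that behavior, which needs no external documents:

```xquery
(
  empty(doc(())),     (: doc(()) returns the empty sequence, so this is true :)
  doc-available(())   (: no document node would be returned, so this is false :)
)
```

The expected result is true, false.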
Effect
The doc() function gives XPath a window on the outside world, by allowing it to retrieve documents identified by a URI. Potentially this makes any XML document anywhere on the Web available for processing.
However, because the doc() function is an interface between the XPath processor and the world outside, many aspects of its behavior depend on the implementation, or on the way that the implementation is configured. XPath 2.0 is expected to be used in a great variety of environments (for example, some XPath processors might only work with XML documents that have been preloaded into a purpose-designed database) and the spec therefore gives a great deal of freedom to implementors. In fact, the formal specification of this function simply says that the evaluation context for processing an XPath expression provides a mapping of URIs to document nodes; if you specify a URI for which a mapping exists, then you get back the corresponding document node, and if you specify a URI for which no mapping exists, you get an error.
The term mapping here is deliberately abstract: it allows the implementation to do anything it likes to get from the URI you specify to the tree that comes back. Many implementations let users control the process, either by implementing user hooks such as the URIResolver in Java's JAXP interface or the XmlResolver in .NET, or by setting options in configuration files or command-line parameters.
Before the URI is used, it is first resolved into an absolute URI. You can resolve the URI yourself using the resolve-uri() function, in which case you have a free choice of the base URI to use, but if you pass a relative URI to the doc() function, then it will always be resolved against the base URI from the static context of the XPath expression. In XSLT 2.0 this generally means the URI of the containing stylesheet module; in XQuery it means the base URI given in the query prolog. If the relative URI was read from a source document, then it should normally be resolved against the base URI of the node from which it was read, but this is left to the application to do.
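One way for the application to do this is to pass the node's own base URI to resolve-uri(). In this minimal XQuery 1.0 sketch the link element, its href value, and the example.com base URI are all hypothetical, and the query returns the resolved URI instead of calling doc() so that it runs without network access:

```xquery
(: Hypothetical link element; xml:base gives it an explicit base URI. :)
let $link := <link xml:base="http://example.com/data/" href="countries.xml"/>
(: Resolve @href against the base URI of the node it was read from,
   not against the static base URI of the query. :)
let $uri := resolve-uri($link/@href, base-uri($link))
(: A real query would now call doc($uri); here we just return the URI. :)
return $uri
```

The query should return http://example.com/data/countries.xml. Passing that value to doc() would load the document relative to the link that referred to it, whatever the location of the query itself.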
One rule that the implementation must enforce is that if you call doc() twice with the same absolute URI, you get the same document node back each time. In XSLT, this rule applies for the duration of a transformation, not just for a single XPath expression evaluation.
What is likely to happen in a typical implementation is this:
Many processors are likely to allow users to control aspects of this process, including:
If a schema is used to validate the document, then it must be compatible with any schema that was used when compiling the XPath expression. Here again, the detailed rules have been left to the implementation. The processor may require that the input document is validated against a schema that was known at compile time, or it may allow validation using a different schema, provided that the tree that comes back contains enough information to allow the type definitions to be located at runtime. The processor is supposed to ensure that there is no version incompatibility between the compile time and runtime schemas, but it wouldn't be surprising to come across a processor that simply passes this responsibility back to the user.
The doc-available() function works exactly the same way as the doc() function, except that instead of returning a document node when a document can be loaded and throwing an error when it can't, doc-available() returns true in the first case and false in the second. In the absence of any try/catch capability in XSLT 2.0 or XPath 2.0, this allows you to test for errors before they occur, so that processing can continue when the required document does not exist or has invalid content.
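For example, the test can be used to choose among several possible locations for a document. In this XQuery 1.0 sketch the two candidate file names are hypothetical. The query returns the name of the document element of the first copy that can be loaded, or a message if none can:

```xquery
(: Hypothetical candidate locations, in order of preference. :)
let $candidates := ("countries-local.xml", "countries.xml")
(: Keep only the URIs for which doc() would succeed; the others raise no error here. :)
let $available := $candidates[doc-available(.)]
return
  if (exists($available))
  then doc($available[1])/*/name()  (: name of the first loadable copy's document element :)
  else "no copy of the look-up table could be loaded"
```

Filtering with doc-available() before calling doc() means that a missing or malformed file simply drops out of the candidate list instead of terminating the query.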
Usage and Examples
There are three main ways an XPath expression can access nodes in input documents.
Which of these three approaches is used is a matter of application convenience, and may be influenced by the facilities available in the host language or the processor API for configuring the behavior of the different options.
The following example shows an expression that uses a look-up table in an external document. The look-up table might have the form shown below and be held in a document called countries.xml:
…
A query that uses this table to display the number of employees located in each country might look like this:
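string-join(
  (: the country elements are children of the document element :)
  for $c in doc("countries.xml")/*/country
  return concat($c/@name, ": ",
                count(//employee[location/country = $c/@code])),
  codepoints-to-string(10))  (: newline separator :)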
This will return a string of the form:
Andorra: 0
United Arab Emirates: 12
Afghanistan: 1
Antigua and Barbuda: 25
…
If you want to process a document if and only if it actually exists, you can use logic of the form if (doc-available($uri)) then doc($uri) else (), where $uri holds the URI of the document.
You should be aware of a few points: