XSLT 2.0 and XPath 2.0 Programmer's Reference, 4th Edition (621 page)


See Also

insert-before() on page 810

remove() on page 861

Filter Expressions on page 637 in Chapter 10

substring

The substring() function returns part of a string value, determined by character positions within the string. Character positions are counted from one.

For example, the expression substring('Goldfarb', 5, 3) returns the string "far".

Changes in 2.0

None.

Signature

Argument | Type | Meaning
input | xs:string? | The containing string. If an empty sequence is supplied, the result is a zero-length string.
start | xs:double | The position in the containing string of the first character to be included in the result string.
length (optional) | xs:double | The number of characters to be included in the result string. If the argument is omitted, characters are taken from the start position up to the end of the containing string.
Result | xs:string | The required substring of the containing string.

Effect

Informally, the function returns a string consisting of the characters in the input string starting at position start; if a length is given, the returned string contains this many characters; otherwise, it contains all characters up to the end of the input string.

Characters within a string are numbered 1, 2, 3, …, n. This will be familiar to Visual Basic programmers but not to those accustomed to C or Java, where numbering starts at zero.

Characters are counted as instances of the XML Char production. This means that a Unicode surrogate pair (a pair of 16-bit values used to represent a Unicode character in the range #x10000 to #x10FFFF) is treated as a single character.
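As a small illustration of counting by character rather than by 16-bit unit, the following sketch builds a three-character string whose middle character lies outside the Basic Multilingual Plane. The codepoint 119070 (U+1D11E, the musical G clef) is chosen purely as an example; any character above #xFFFF should behave the same way.

```xpath
(: "a", U+1D11E, "b": three characters, although UTF-16 needs four 16-bit units :)
string-length(codepoints-to-string((97, 119070, 98))),

(: position 2 selects the whole supplementary character, never half a surrogate pair :)
string-to-codepoints(substring(codepoints-to-string((97, 119070, 98)), 2, 1))
```

A conforming processor should return the sequence 3, 119070.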

Combining and non-spacing characters are counted individually, unless the implementation has normalized them into a single combined character. The implementation is allowed to turn strings into Unicode normalized form, but is not required to do so. In normalized form NFC, accents and diacritics will typically be merged with the letter that they modify into a single character.
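The difference can be seen in a short sketch, assuming the processor has not already normalized the string on its own. The sample string (the letter e, U+0301 COMBINING ACUTE ACCENT, and the letter t) is an illustrative choice, and normalize-unicode() is called explicitly to force form NFC.

```xpath
(: "e" followed by U+0301 COMBINING ACUTE ACCENT, then "t" :)
for $s in codepoints-to-string((101, 769, 116))
return (
  string-length($s),                             (: 3: the accent counts as a character :)
  string-to-codepoints(substring($s, 1, 1)),     (: 101: the bare letter e, accent lost :)
  string-length(normalize-unicode($s, "NFC")),   (: 2: e and the accent merge into U+00E9 :)
  string-to-codepoints(
    substring(normalize-unicode($s, "NFC"), 1, 1))   (: 233: the precomposed e-acute :)
)
```

When accented text may arrive in decomposed form, normalizing before calling substring() is one way to avoid cutting a letter off from its accent.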

It is possible to define this function in terms of the subsequence() function. With two arguments, the function has the same result as:

codepoints-to-string(
       subsequence(string-to-codepoints($input), $start))

With three arguments, the definition becomes:

codepoints-to-string(
       subsequence(string-to-codepoints($input), $start, $length))
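One way to convince yourself of the equivalence is to compare the two forms over a handful of sample values. The following is only a spot check with arbitrarily chosen inputs, not a proof; it exercises the three-argument form, and the two-argument check is analogous.

```xpath
(: every item in the result should be true if the equivalence holds for these samples :)
for $input  in ("abcde", ""),
    $start  in (2, 0, 10, 2.5),
    $length in (2, -1, 20)
return
  substring($input, $start, $length)
    eq codepoints-to-string(
         subsequence(string-to-codepoints($input), $start, $length))
```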

These rules cover conditions such as the start or length being negative, NaN, fractional, or infinite. The comparisons and arithmetic are done using IEEE 754 arithmetic, which has some interesting consequences if values such as infinity and NaN, or indeed any non-integer values, are used. The rules for IEEE 754 arithmetic are summarized in Chapter 2.

The equivalence tells us that if the start argument is less than one, the result always starts at the first character of the supplied string, while if it is greater than the length of the string, the result will always be an empty string. Note that when start is less than one and a length is given, the positions before the first character still count toward the length, so substring("abcde", 0, 3) returns "ab" rather than "abc". If the length argument is less than zero, it is treated as zero, and again an empty string is returned. If the length argument is greater than the number of available characters, and the start position is within the string, then characters will be returned up to the end of the containing string.
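The edge cases are easiest to follow with concrete values. The expressions below are modeled on the examples in the W3C Functions and Operators specification; the result in each comment follows from selecting the positions p for which round($start) <= p < round($start) + round($length).

```xpath
substring("12345", 1.5, 2.6),               (: "234": start rounds to 2, length to 3 :)
substring("12345", 0, 3),                   (: "12": position 0 uses up one unit of length :)
substring("12345", 5, -3),                  (: "": negative length selects nothing :)
substring("12345", -3, 5),                  (: "1": of positions -3 to 1, only 1 exists :)
substring("12345", 0 div 0E0, 3),           (: "": a NaN start never compares true :)
substring("12345", 1, 0 div 0E0),           (: "": a NaN length never compares true :)
substring("12345", -42, 1 div 0E0),         (: "12345": infinite length :)
substring("12345", -1 div 0E0, 1 div 0E0)   (: "": -INF plus INF is NaN :)
```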

Examples

Expression | Result
substring("abcde", 2) | "bcde"
substring("abcde", 2, 2) | "bc"
substring("abcde", 10, 2) | ""
substring("abcde", 1, 20) | "abcde"

Usage

The substring() function is useful when processing a string character-by-character. One common usage is to determine the first character of a string:

substring($filename, 1, 1)

Or when manipulating personal names in the conventional American format of first name, middle initial, last name:

string-join((first-name, substring(middle-name, 1, 1), last-name), " ")

The following example extracts the last four characters in a string:

substring($s, string-length($s) - 3)
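Because a start position below one is harmless, the last-four-characters idiom also works on strings shorter than four characters. A brief sketch with two literal strings (both invented for illustration):

```xpath
(: 12 characters, so start is 9 and characters 9 to 12 are returned :)
substring("invoice-2041", string-length("invoice-2041") - 3),   (: "2041" :)

(: 2 characters, so start is -1 and the whole string is returned :)
substring("ab", string-length("ab") - 3)                        (: "ab" :)
```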

Using substring() as a Conditional Expression

The technique outlined in this section is thankfully obsolete, now that XPath 2.0 offers if expressions, as described in Chapter 7. But you may encounter it in XSLT 1.0 stylesheets, and you may still have to use it if you write code that has to run under both XPath 1.0 and XPath 2.0, so it's worth a mention here.

Suppose that $b is an xs:boolean value, and consider the following expression:

substring("xyz", 1, $b * string-length("xyz"))

Under XPath 1.0 rules, the xs:boolean $b when used in an arithmetic expression is converted to a number: 0 for false, 1 for true. So the value of the third argument is 0 if $b is false, 3 if $b is true. The final result of the substring() function is therefore a zero-length string if $b is false, or the string "xyz" if $b is true. The expression is equivalent to if ($b) then "xyz" else "" in XPath 2.0.
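Outside XPath 1.0 compatibility mode, an XPath 2.0 processor is expected to reject arithmetic on an xs:boolean with a type error, so code intended to run under both versions can make the conversion explicit with number(). The sketch below is one way to try it under XPath 2.0; the for expression is only scaffolding to exercise both values of $b, and the inner substring() call is the portable part.

```xpath
(: number(true()) is 1 and number(false()) is 0 in both XPath 1.0 and 2.0 :)
for $b in (true(), false())
return substring("xyz", 1, number($b) * string-length("xyz"))
```

The result should be the sequence "xyz", "": the full string when $b is true and a zero-length string when it is false.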

In fact, the third argument doesn't need to be exactly equal to the string length for this to work; it can be any value greater than the string length. So you could equally well write:
