Read XSLT 2.0 and XPath 2.0 Programmer's Reference, 4th Edition Online
Authors: Michael Kay
See Also
insert-before()
remove()
Filter Expressions in Chapter 10
substring
The substring() function returns part of a string value, determined by character positions within the string. Character positions are counted from one.
For example, the expression substring('Goldfarb', 5, 3) returns the string far.
Changes in 2.0
None.
Signature
Argument | Type | Meaning |
input | xs:string? | The containing string. If an empty sequence is supplied, the result is a zero-length string. |
start | xs:double | The position in the containing string of the first character to be included in the result string. |
length (optional) | xs:double | The number of characters to be included in the result string. If the argument is omitted, characters are taken from the start position up to the end of the containing string. |
Result | xs:string | The required substring of the containing string. |
Effect
Informally, the function returns a string consisting of the characters in the input string starting at position start; if a length is given, the returned string contains this many characters; otherwise, it contains all characters up to the end of the input string.
Characters within a string are numbered 1, 2, 3, …, n. This will be familiar to Visual Basic programmers but not to those accustomed to C or Java, where numbering starts at zero.
Characters are counted as instances of the XML Char production. This means that a Unicode surrogate pair (a pair of 16-bit values used to represent a Unicode character in the range #x10000 to #x10FFFF) is treated as a single character.
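The difference between counting characters and counting 16-bit values can be seen in a short Python sketch. Python is used here only as a convenient illustration, because its strings are also sequences of Unicode code points; the variable names are hypothetical.

```python
# U+1D11E MUSICAL SYMBOL G CLEF lies outside the Basic Multilingual Plane,
# so UTF-16 represents it as a surrogate pair.
clef = "\U0001D11E"
text = "a" + clef + "b"

# Counting code points, which is how substring() counts characters.
code_points = len(text)

# Counting 16-bit values, which is how a UTF-16 string would be indexed.
utf16_units = len(text.encode("utf-16-le")) // 2

print(code_points, utf16_units)  # prints: 3 4
```

Counted as characters, the clef is at position 2 and the letter b at position 3, even though the UTF-16 encoding occupies four 16-bit values.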
Combining and non-spacing characters are counted individually, unless the implementation has normalized them into a single combined character. The implementation is allowed to turn strings into Unicode normalized form, but is not required to do so. In normalized form NFC, accents and diacritics will typically be merged with the letter that they modify into a single character.
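As an illustration of how normalization changes the character count, the following Python sketch uses the standard unicodedata module to compare a letter followed by a combining accent with its NFC form. It shows the general Unicode behavior only; whether a particular XPath processor normalizes its input is a separate question.

```python
import unicodedata

# 'e' followed by U+0301 COMBINING ACUTE ACCENT: two characters.
decomposed = "e\u0301"

# NFC merges the pair into the single character U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))  # prints: 2 1
```

A call such as substring($s, 1, 1) would therefore return the bare letter e for the first form and the accented letter for the second.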
It is possible to define this function in terms of the subsequence() function. With two arguments, the function has the same result as:
codepoints-to-string(
  subsequence(string-to-codepoints($input), $start))
With three arguments, the definition becomes:
codepoints-to-string(
  subsequence(string-to-codepoints($input), $start, $length))
These rules cover conditions such as the start or length being negative, NaN, fractional, or infinite. The comparisons and arithmetic are done using IEEE 754 arithmetic, which has some interesting consequences if values such as infinity and NaN, or indeed any non-integer values, are used. The rules for IEEE 754 arithmetic are summarized in Chapter 2.
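The effect of fractional, NaN, and infinite arguments can be explored with a small model. The following Python sketch is not taken from any XPath processor. It assumes the rule given in the XPath functions specification, under which the character at position p is selected when round(start) <= p < round(start) + round(length), with halves rounded toward positive infinity. The names xpath_round and xpath_substring are hypothetical.

```python
import math


def xpath_round(x: float) -> float:
    """Round to the nearest integer, with ties going toward positive infinity."""
    if math.isnan(x) or math.isinf(x):
        return x  # NaN and the infinities are returned unchanged
    lower = math.floor(x)
    return float(lower + 1) if x - lower >= 0.5 else float(lower)


def xpath_substring(value, start, length=None):
    """Model of substring(); None stands in for the empty sequence."""
    chars = list(value or "")
    first = xpath_round(start)
    if length is None:
        return "".join(c for p, c in enumerate(chars, 1) if p >= first)
    limit = first + xpath_round(length)  # may be NaN or infinite
    return "".join(c for p, c in enumerate(chars, 1) if first <= p < limit)


nan, inf = float("nan"), float("inf")
print(xpath_substring("12345", 1.5, 2.6))   # prints: 234
print(xpath_substring("12345", -42, inf))   # prints: 12345
print(repr(xpath_substring("12345", nan, 3)))     # prints: ''
print(repr(xpath_substring("12345", -inf, inf)))  # prints: ''
```

The last case is the surprising one: negative infinity plus positive infinity is NaN, and every comparison with NaN is false, so no character is selected.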
The equivalence tells us that if the start argument is less than one, the result always starts at the first character of the supplied string, while if it is greater than the length of the string, the result will always be an empty string. If the length argument is less than zero, it is treated as zero, and again an empty string is returned. If the length argument is greater than the number of available characters, and the start position is within the string, then characters will be returned up to the end of the containing string.
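For integer arguments, these boundary rules amount to clamping the requested range of positions to the range that actually exists in the string. The Python sketch below illustrates this under that integer-only assumption; the function name substring_int is hypothetical. Note that positions below one still count against the length, so a start of 0 with a length of 3 selects only the first two characters.

```python
from typing import Optional


def substring_int(value: str, start: int, length: Optional[int] = None) -> str:
    """Integer-only model: clamp the 1-based range to the string, then slice."""
    # Exclusive end position of the requested range (1-based).
    end = len(value) + 1 if length is None else start + length
    lo = max(start, 1)               # positions below one do not exist
    hi = min(end, len(value) + 1)    # nor do positions beyond the end
    return value[lo - 1 : hi - 1] if lo < hi else ""


print(substring_int("abcde", 0, 3))         # prints: ab
print(repr(substring_int("abcde", 10, 2)))  # prints: ''
print(repr(substring_int("abcde", 3, -1)))  # prints: ''
```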
Examples
Expression | Result |
substring("abcde", 2) | "bcde" |
substring("abcde", 2, 2) | "bc" |
substring("abcde", 10, 2) | "" |
substring("abcde", 1, 20) | "abcde" |
Usage
The substring() function is useful when processing a string character-by-character. One common usage is to determine the first character of a string:
substring($filename, 1, 1)
Or when manipulating personal names in the conventional American format of first name, middle initial, last name:
string-join((first-name, substring(middle-name, 1, 1), last-name), " ")
The following example extracts the last four characters in a string:
substring($s, string-length($s)-3)
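The offset is 3 rather than 4 because positions n-3 through n cover four characters. The following Python sketch mirrors the arithmetic for illustration only (the name last_four is hypothetical) and shows that a string shorter than four characters is returned whole, because a start position below one is treated as the start of the string.

```python
def last_four(s: str) -> str:
    """Mirror substring($s, string-length($s) - 3) using 1-based positions."""
    start = len(s) - 3         # 1-based position of the fourth character from the end
    first = max(start, 1)      # a start below one begins at the first character
    return s[first - 1 :]      # convert to Python's zero-based slicing


print(last_four("abcdefg"))  # prints: defg
print(last_four("ab"))       # prints: ab
```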
Using substring() as a Conditional Expression
The technique outlined in this section is thankfully obsolete, now that XPath 2.0 offers if expressions, as described in Chapter 7. But you may encounter it in XSLT 1.0 stylesheets, and you may still have to use it if you write code that has to run under both XPath 1.0 and XPath 2.0, so it's worth a mention here.
Suppose that $b is an xs:boolean value, and consider the following expression:
substring("xyz", 1, $b * string-length("xyz"))
Under XPath 1.0 rules, the xs:boolean $b when used in an arithmetic expression is converted to a number: 0 for false, 1 for true. So the value of the third argument is 0 if $b is false, 3 if $b is true. The final result of the substring() function is therefore a zero-length string if $b is false, or the string "xyz" if $b is true. The expression is equivalent to if ($b) then "xyz" else "" in XPath 2.0.
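The arithmetic behind the trick can be traced in a few lines of Python, used here purely as an illustration of the XPath 1.0 conversion rule; the function name substring_if is hypothetical.

```python
def substring_if(flag: bool, text: str) -> str:
    """Return text when flag is true, otherwise a zero-length string."""
    # The boolean becomes 0 or 1, so the requested length is 0 or len(text).
    length = int(flag) * len(text)
    # Taking that many characters from the start selects all or nothing.
    return text[:length]


print(substring_if(True, "xyz"))          # prints: xyz
print(repr(substring_if(False, "xyz")))   # prints: ''
```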
In fact, the third argument doesn't need to be exactly equal to the string length for this to work; it can be any value greater than the string length. So you could equally well write: