XSLT 2.0 and XPath 2.0 Programmer's Reference, 4th Edition
Authors: Michael Kay
The first stages in whitespace handling are the job of the XML parser and are done long before the XSLT processor gets to see the data. Remember that these apply both to source documents and to stylesheets:
This attribute normalization can be significant when the attribute in question is an XPath expression in the stylesheet. For example, suppose you want to test whether a string value contains a newline character. You can write this as follows:
It's important to use a character reference for the newline (such as &#xA;) here, rather than a real newline, because a newline character would be converted to a space by the XML parser, and the expression would then actually test whether the supplied string contains a space.
What this means in practice is that if you want to be specific about whitespace characters, write them as character references; if you just want to use them as separators and padding, use the whitespace characters directly.
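The effect of attribute normalization can be observed with any conforming XML parser, not just the one built into an XSLT processor. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the variable name $s and the use of an xsl:if element are illustrative assumptions, not taken from the original example.

```python
import xml.etree.ElementTree as ET

XSL_NS = 'xmlns:xsl="http://www.w3.org/1999/XSL/Transform"'

# The XPath expression is written with a real newline between the quotes.
# The XML parser normalizes that newline to a space before anyone sees it.
literal = ET.fromstring(
    '<xsl:if ' + XSL_NS + ' test="contains($s, \'\n\')"/>'
)

# The same expression written with the character reference &#xA;.
# The reference is expanded to a newline, which is not normalized away.
reference = ET.fromstring(
    '<xsl:if ' + XSL_NS + ' test="contains($s, \'&#xA;\')"/>'
)

literal_test = literal.get("test")
reference_test = reference.get("test")

print(repr(literal_test))    # the expression now tests for a space
print(repr(reference_test))  # the expression still tests for a newline
```

Only the second form hands the XSLT processor an expression that actually contains a newline character.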
The XSLT specification assumes that the XML parser will hand over all whitespace text nodes to the XSLT processor. However, the input to the XSLT processor is technically a tree, and the XSLT specification claims no control over how this tree is built. If you use Microsoft's MSXML, or Altova's XSLT processor, then the default action of the parser while building the tree is to remove whitespace text nodes. If you want the parser to behave the way that the XSLT specification expects, you must set configuration options to make this happen; see the vendors' documentation for details.
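To see what is at stake, it helps to look at a tree both with and without its whitespace text nodes. The sketch below uses Python's xml.dom.minidom, which keeps whitespace text nodes by default; the helper strip_whitespace_text_nodes is a hypothetical function written for this illustration to imitate a tree builder that discards them, and is not the API of any of the products mentioned above.

```python
from xml.dom import Node, minidom

SOURCE = "<doc>\n  <item>a</item>\n  <item>b</item>\n</doc>"

XML_WHITESPACE = " \t\r\n"


def is_whitespace_text(node):
    """True for a text node made up entirely of XML whitespace characters."""
    return node.nodeType == Node.TEXT_NODE and node.data.strip(XML_WHITESPACE) == ""


def strip_whitespace_text_nodes(element):
    """Hypothetical helper: remove whitespace-only text nodes, recursively."""
    for child in list(element.childNodes):
        if is_whitespace_text(child):
            element.removeChild(child)
        elif child.nodeType == Node.ELEMENT_NODE:
            strip_whitespace_text_nodes(child)


root = minidom.parseString(SOURCE).documentElement

# The tree as the XSLT specification expects it: text, item, text, item, text.
children_before = len(root.childNodes)

strip_whitespace_text_nodes(root)

# The tree as a whitespace-removing tree builder would deliver it: item, item.
children_after = len(root.childNodes)

print(children_before, children_after)
```

A stylesheet that counts child nodes, or that relies on position() over node(), can give different answers for these two trees even though the source document is identical.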
Once the XML parser has done its work, further manipulation of whitespace may be done by the schema processor. This is more likely to affect source documents than stylesheets, because there is little point in putting a stylesheet through a schema processor. For each simple data type, XML Schema defines whitespace handling (the so-called whitespace facet) as one of three options:
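In XML Schema the three values of the whitespace facet are preserve, replace, and collapse. The following Python sketch shows one way to model their effect on a lexical value; the function name apply_whitespace_facet is an assumption made for this illustration, and a real schema processor does this work internally during validation.

```python
import re


def apply_whitespace_facet(value: str, facet: str) -> str:
    """Model the XML Schema whitespace facet on a lexical value (a sketch)."""
    if facet == "preserve":
        # Every whitespace character is significant; nothing changes.
        return value
    if facet == "replace":
        # Each tab, newline, and carriage return becomes a single space.
        return re.sub(r"[\t\n\r]", " ", value)
    if facet == "collapse":
        # As for replace, then runs of spaces shrink to one space and
        # leading and trailing spaces are removed.
        replaced = re.sub(r"[\t\n\r]", " ", value)
        return re.sub(r" +", " ", replaced).strip(" ")
    raise ValueError("unknown whitespace facet: " + facet)


raw = "  a\tb \n c  "
preserved = apply_whitespace_facet(raw, "preserve")
replaced = apply_whitespace_facet(raw, "replace")
collapsed = apply_whitespace_facet(raw, "collapse")

print(repr(preserved))
print(repr(replaced))
print(repr(collapsed))
```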
When source documents are processed using a schema, the XDM rules say that for attributes, and for elements with simple content (that is, elements that can't have child elements), the typed value of the element or attribute is the value after whitespace normalization has been done according to the XML Schema rules for the particular data type. The string value of an element or attribute may either be the value as originally written or the value obtained by converting the typed value back to a string; implementations are allowed to choose either approach. In the latter case, insignificant leading and trailing whitespace may be lost. However, the string() function itself is almost the only thing that depends on the string value of a node; most expressions use the typed value.
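The difference between the two permitted string values is easiest to see with a numeric type. The sketch below assumes an attribute declared as xs:integer and written with padding around the digits; the function typed_value_as_integer is a hypothetical stand-in for what a schema-aware processor does, not a real API.

```python
XML_WHITESPACE = " \t\r\n"


def typed_value_as_integer(lexical: str) -> int:
    """Hypothetical stand-in: trim surrounding whitespace, then convert."""
    return int(lexical.strip(XML_WHITESPACE))


written = "  42 "  # the attribute value as it appears in the source

typed = typed_value_as_integer(written)

# An implementation may expose either of these as the string value.
string_value_as_written = written
string_value_from_typed = str(typed)

print(typed)
print(repr(string_value_as_written))
print(repr(string_value_from_typed))
```

An expression that compares the typed value sees the same integer either way; only code that inspects the string value directly can tell which choice the implementation made.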
Finally, the XSLT processor applies some processing of its own. By this time entity and character references have been expanded, so there is no difference between a space written as a space and one written as a character reference such as &#x20;: