XSLT 2.0 and XPath 2.0 Programmer's Reference, 4th Edition
Authors: Michael Kay
Choosing Characters to Map
Applications for character maps probably fall into two categories: those where you want to choose a nonstandard string representation of a character that occurs naturally in the data, and those where you want to choose some otherwise unused character to trigger some special effect in the output.
An example in the first category would be the example shown earlier:
This forces the nonbreaking space character to be output as an entity reference. If the document is to be edited, many people will find the entity reference easier to manipulate because it shows up as a visible character, whereas the nonbreaking space character itself appears on the screen just like an ordinary space.
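A minimal sketch of a character map of this kind is given below. The map name nbsp-map is hypothetical, and the identity template is included only so that the stylesheet is complete enough to run.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Ask the serializer to apply the map named nbsp-map (hypothetical name) -->
  <xsl:output method="xml" use-character-maps="nbsp-map"/>

  <xsl:character-map name="nbsp-map">
    <!-- Each U+00A0 in a text or attribute node is written as the six characters &nbsp; -->
    <xsl:output-character character="&#xA0;" string="&amp;nbsp;"/>
  </xsl:character-map>

  <!-- Identity template: copy the input unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Because nbsp is not one of the entities predefined by XML, output produced this way is only well-formed for a recipient whose DTD declares that entity, as the XHTML DTDs do.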
An example in the second category is choosing two characters to represent the start and end of a comment. Suppose that the requirement is to transform an input document by "commenting out" any element that has the attribute delete="yes". By commenting out, I mean outputting something like:
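As a rough illustration, with a hypothetical element name and content, the serialized result for one such element might look like this:

```xml
<!--<item delete="yes">Text that should no longer be visible</item>-->
```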
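This is tricky, because the result cannot be modeled naturally as a result tree: comment nodes cannot have element nodes as children. So we'll choose instead to output the comment delimiters as two otherwise unused characters, which a character map then converts to the strings that start and end a comment.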
The best characters to choose for such purposes are the characters in the Unicode Private Use Area, for example the characters from xE000 to xF8FF. These characters have no defined meaning in Unicode, and are intended to be used for communications where there is a private agreement between the sender and the recipient as to what they mean. In this case, the sender is the stylesheet and the recipient is the serializer.
If you assign private use characters in information that is passed between applications, especially applications owned by different organizations, you should make sure that your use of the characters is well documented.
Here is a stylesheet that performs the required transformation:
Example: Using a Character Map to Comment-Out Elements
This example copies the input unchanged to the output, except that any element in the input that has the attribute delete="yes" is output within a comment.
Stylesheet
The stylesheet is comment-out.xsl:
]>
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
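A sketch of a stylesheet with the behavior described might look like the following. The entity names, the map name comment-delimiters, and the two private-use code points xE501 and xE502 are assumptions made for illustration, not necessarily the ones used in comment-out.xsl.

```xml
<?xml version="1.0"?>
<!DOCTYPE xsl:stylesheet [
  <!-- Two private-use characters, chosen arbitrarily for this sketch -->
  <!ENTITY start-comment "&#xE501;">
  <!ENTITY end-comment "&#xE502;">
]>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" use-character-maps="comment-delimiters"/>

  <!-- The serializer replaces each marker character with a comment delimiter -->
  <xsl:character-map name="comment-delimiters">
    <xsl:output-character character="&start-comment;" string="&lt;!--"/>
    <xsl:output-character character="&end-comment;" string="--&gt;"/>
  </xsl:character-map>

  <!-- Default rule: copy every node unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Elements flagged delete="yes": copy them, wrapped in the two markers -->
  <xsl:template match="*[@delete='yes']">
    <xsl:text>&start-comment;</xsl:text>
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
    <xsl:text>&end-comment;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

The second template rule wins for flagged elements because a pattern with a predicate has a higher default priority than the general rule. Note that this sketch does nothing to guard against a flagged element that itself contains a comment, a double hyphen, or another flagged element; in any of those cases the serialized output would not be well-formed.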
Source
One of the paragraphs in the source file resume.xml is:
Aidan is also in demand as a consort singer,
performing with groups including the Oxford Camerata and the Sarum
Consort, with whom he has made several acclaimed recordings on the
ASV label of motets by Bach and Peter Philips sung by solo voices.
Output
When the stylesheet is applied to the source file resume.xml, the above paragraph appears as:
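Assuming the paragraph is held in an element named para that carries delete="yes" and has no child elements (both are guesses, since the markup of the source listing is not shown), the serialized result would take roughly this shape:

```xml
<!--<para delete="yes">Aidan is also in demand as a consort singer,
performing with groups including the Oxford Camerata and the Sarum
Consort, with whom he has made several acclaimed recordings on the
ASV label of motets by Bach and Peter Philips sung by solo voices.</para>-->
```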
Limitations of Character Maps
A character map applies to a whole result document; you cannot switch character mapping on and off at will.
The character map must be fixed at compile time. You cannot compute the output string at runtime, and there is no way the process can be parameterized. (You can, however, substitute a different character map by having different definitions of the same character map in different stylesheet modules, and deciding which one to import using
Character mapping may impose a performance penalty, especially if a large number of characters are mapped.
Character mapping has no effect unless the result of the transformation is actually serialized. If the result tree is passed straight to another application that doesn't understand the special characters, it is unlikely to have the desired effect.
Character mapping only affects the content of text and attribute nodes. It doesn't affect characters in element and attribute names, or markup characters such as the quotes around an attribute value.
The character to be mapped, and all the characters in the replacement string, must be valid XML characters. This is because there is no way of representing invalid characters in the
Disable Output Escaping
XSLT 1.0 provided an alternative way of getting fine-grained control over the serializer, namely the disable-output-escaping attribute of the
Reasons to Disable Output Escaping
Normally, when you try to output a special character such as < or & in a text node, the special character will be escaped in the output file using the normal XML escaping mechanisms. The escaping is done by the serializer: the text node written in the result tree contains a < or & character, and the serializer translates this into &lt; or &amp;. The serializer is free to represent the special characters any way it wants; for example, it can write < as &lt;