XSLT 2.0 and XPath 2.0 Programmer's Reference, 4th Edition (172 page)

You could actually have used a boolean grouping key, group-adjacent="boolean(self::SPEAKER)", but that would be a little obscure for my taste.
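
For what it's worth, here is a sketch of how the boolean key might be used. It is not taken from the download files: it assumes a <SPEECH> element whose children are <SPEAKER> elements followed by <LINE> elements, and the class attribute values are hypothetical, added only to show the key being tested.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumed input: a SPEECH with SPEAKER children followed by LINE children. -->
  <xsl:template match="SPEECH">
    <tr>
      <!-- The key is true() for a run of SPEAKER elements
           and false() for a run of anything else. -->
      <xsl:for-each-group select="*" group-adjacent="boolean(self::SPEAKER)">
        <!-- Hypothetical class names, chosen for illustration only. -->
        <td class="{if (current-grouping-key()) then 'speakers' else 'lines'}">
          <xsl:for-each select="current-group()">
            <xsl:value-of select="."/>
            <xsl:if test="position() != last()"><br/></xsl:if>
          </xsl:for-each>
        </td>
      </xsl:for-each-group>
    </tr>
  </xsl:template>

</xsl:stylesheet>
```

Anyone reading this has to work out that a true key means "speakers" and a false key means "lines", which is the obscurity I was complaining about.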

All these examples so far would work equally well using group-by rather than group-adjacent, because there are no nonadjacent items that would have been put in the same group if you had used group-by. But it's still worth using group-adjacent, if only because it's likely to be more efficient: the system knows that it doesn't need to do any sorting or hashing; it just has to compare adjacent items.
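
To see where the two attributes part company, consider a made-up input in which the same key value turns up again after an interruption. The sketch below is an illustration only: the <demo> wrapper and the <result>, <adjacent>, <by>, and <run> output elements are hypothetical names, not part of the Shakespeare examples.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical input:
       <demo>
         <SPEAKER>TIMON</SPEAKER>
         <SPEAKER>TIMON</SPEAKER>
         <SPEAKER>ALCIBIADES</SPEAKER>
         <SPEAKER>TIMON</SPEAKER>
       </demo>
  -->
  <xsl:template match="demo">
    <result>
      <!-- group-adjacent starts a new group whenever the key changes,
           so the final TIMON forms a group of its own (three groups). -->
      <adjacent>
        <xsl:for-each-group select="SPEAKER" group-adjacent="string(.)">
          <run key="{current-grouping-key()}"
               size="{count(current-group())}"/>
        </xsl:for-each-group>
      </adjacent>
      <!-- group-by collects every item with the same key, wherever it
           appears, so all three TIMON elements share a group (two groups). -->
      <by>
        <xsl:for-each-group select="SPEAKER" group-by="string(.)">
          <run key="{current-grouping-key()}"
               size="{count(current-group())}"/>
        </xsl:for-each-group>
      </by>
    </result>
  </xsl:template>

</xsl:stylesheet>
```

With input like this the choice between the two attributes changes the result, so it is no longer just a question of efficiency.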

Example: Handling Repeating Groups of Adjacent Elements

This example is a slightly more difficult variant of the previous example, in which the <SPEECH> elements have been omitted from the input markup.

Source

If the Shakespeare markup had been done by someone less capable than Jon Bosak, the <SPEECH> elements might have been left out. You would then see a structure like this:

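<SPEAKER>PHRYNIA</SPEAKER>
<SPEAKER>TIMANDRA</SPEAKER>
<LINE>More counsel with more money, bounteous Timon.</LINE>
<SPEAKER>TIMON</SPEAKER>
<LINE>More whore, more mischief first; I have given you earnest.</LINE>
<SPEAKER>ALCIBIADES</SPEAKER>
<LINE>Strike up the drum towards Athens! Farewell, Timon:</LINE>
<LINE>If I thrive well, I'll visit thee again.</LINE>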

I have modified the markup of this (very long) scene from Timon of Athens and included it as timon-scene.xml.

Output

The required output is the same as in the previous example: that is, a table, in which each row represents one speech, with the names of the speakers in one column and the lines spoken in the other.

Stylesheet

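There are various ways of handling such a structure, none of them particularly easy. One approach is to do the grouping bottom-up: First, you put a group of consecutive speakers in a <SPEAKERS> element and a group of consecutive lines and stage directions in a <LINES> element, then you process the sequence of alternating <SPEAKERS> and <LINES> elements. Here's the logic, which is expanded into a full stylesheet in the download file alternate-groups.xsl:

<xsl:template match="SCENE">
  <!-- Phase 1: wrap each run of adjacent elements in SPEAKERS or LINES -->
  <xsl:variable name="sequence" as="element()*">
    <xsl:for-each-group select="*"
                        group-adjacent="if (self::SPEAKER)
                                        then 'SPEAKERS' else 'LINES'">
      <xsl:element name="{current-grouping-key()}">
        <xsl:copy-of select="current-group()"/>
      </xsl:element>
    </xsl:for-each-group>
  </xsl:variable>
  <!-- Phase 2: each SPEAKERS element starts a new table row -->
  <xsl:for-each-group select="$sequence"
                      group-starting-with="SPEAKERS">
    <tr>
      <xsl:for-each select="current-group()">
        <td valign="top">
          <xsl:for-each select="*">
            <xsl:apply-templates select="."/>
             <xsl:if test="position() != last()"><br/></xsl:if>
          </xsl:for-each>
        </td>
      </xsl:for-each>
    </tr>
  </xsl:for-each-group>
</xsl:template>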

This does the grouping in two phases. The first phase creates a sequence of alternating elements named <SPEAKERS> and <LINES>, which you constructed by choosing these as your grouping keys. This sequence is held in a variable. The second phase uses group-starting-with to recognize a group consisting of a <SPEAKERS> element followed by a <LINES> element. All that remains is to process each group, which of course consists of a <SPEAKERS> element holding one or more <SPEAKER> elements, followed by a <LINES> element holding one or more <LINE> and <STAGEDIR> elements.

If I had presented an example query “find all the speeches in Shakespeare involving two or more speakers and containing two or more lines,” and had presented the solution as collection('shakes.xml')//SPEECH[SPEAKER[2] and LINE[2]], you would probably have found the example rather implausible. But if you want to know how I found the Timon of Athens quote, you have your answer.

This stylesheet produced incorrect output when we tried it with AltovaXML 2008. It works correctly with Saxon and Gestalt.

Using group-starting-with

Like group-adjacent, the group-starting-with option selects groups of items that are adjacent in the population, and it therefore tends to be used with document-oriented XML. The difference is that with this option, there doesn't have to be any value that the adjacent nodes have in common: All that you need is a pattern that matches the first node in each group.

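I could have used this technique for the previous Shakespeare example. In fact, given a scene consisting of alternating sequences of <SPEAKER> elements and <LINE> elements, with no <SPEECH> elements to mark the boundaries, I could have reconstructed the <SPEECH> elements by writing:

<xsl:template match="SCENE">
  <xsl:for-each-group select="*" group-starting-with=
       "SPEAKER[not(preceding-sibling::*[1][self::SPEAKER])]">
    <SPEECH>
       <xsl:copy-of select="current-group()"/>
    </SPEECH>
  </xsl:for-each-group>
</xsl:template>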

Here the pattern that marks out the first element in a new group is that it is a <SPEAKER> element whose immediately preceding sibling element (if it has one) is not another <SPEAKER> element.

A common use for group-starting-with is the implicit hierarchies one sees in XHTML. We will explore this in the next example.

Example: Handling Flat XHTML Documents

This example shows how to create a hierarchy to represent the underlying structure of an XHTML document in which headings and paragraphs are all represented as sibling elements.

Source

A typical XHTML document looks like this (flat.xml):

<html>
 <body>
  <h1>Title</h1>
  <p>We need to understand how hierarchies can be flat.</p>
  <h2>Subtitle</h2>
  <p>Let's get to the point.</p>
  <p>The second paragraph in a section often says very little.</p>
  <p>But the third gets to the heart of the matter.</p>
  <h2>Subtitle</h2>
  <p>To conclude, we are dealing with a flat hierarchy.</p>
 </body>
</html>
This fragment consists of a <body> element with eight child elements, all at the same level of the tree. Very often, if you want to process this text, you will need to understand the hierarchic structure even though it is not explicit in the markup. For example, you may want to number the last paragraph as 1.2.1.
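
To make the numbering point concrete: once the sections are properly nested (which is what the rest of this example sets out to achieve), <xsl:number> can derive a number such as 1.2.1 directly. What follows is a minimal sketch, not part of the download files. It assumes the nested form uses <div> elements for sections and <p> elements for paragraphs, and the n attribute is a hypothetical name chosen just to carry the number.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumed input (element names are an assumption):
       <body>
         <div><head>Title</head>
           <p>...</p>
           <div><head>Subtitle</head> <p>...</p> <p>...</p> <p>...</p> </div>
           <div><head>Subtitle</head> <p>...</p> </div>
         </div>
       </body>
  -->

  <!-- Copy everything unchanged except the p elements. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="p">
    <p>
      <xsl:attribute name="n">
        <!-- Section path: the position of each enclosing div
             among its sibling div elements, e.g. 1.2 -->
        <xsl:number level="multiple" count="div"/>
        <xsl:text>.</xsl:text>
        <!-- Position of this paragraph among the p children of its section. -->
        <xsl:number level="single" count="p"/>
      </xsl:attribute>
      <xsl:apply-templates/>
    </p>
  </xsl:template>

</xsl:stylesheet>
```

On the assumed input, the only paragraph in the second subsection is the first <p> inside the second <div> inside the first <div>, so it gets n="1.2.1". Doing the same thing on the flat markup would mean counting backward through preceding siblings, which is considerably more awkward.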

Output

To manipulate this data, you need to transform it into a structure like the one below that reflects the true hierarchy:

<body>
  <div><head>Title</head>
    <p>We need to understand how hierarchies can be flat.</p>
    <div><head>Subtitle</head>
      <p>Let's get to the point.</p>
      <p>The second paragraph in a section often says very little.</p>
      <p>But the third gets to the heart of the matter.</p>
    </div>
    <div><head>Subtitle</head>
      <p>To conclude, we are dealing with a flat hierarchy.</p>
    </div>
  </div>
</body>

Stylesheet

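The group-starting-with option is ideal for this purpose, because the <h1> and <h2> elements are easy to match. Here is the code (unflatten.xsl):

<xsl:template match="body">
  <xsl:copy>
    <xsl:for-each-group select="*" group-starting-with="h1">
      <xsl:apply-templates select="." mode="group"/>
    </xsl:for-each-group>
  </xsl:copy>
</xsl:template>

<!-- Each section becomes a div whose head holds the heading text -->
<xsl:template match="h1" mode="group">
  <div><head><xsl:value-of select="."/></head>
    <xsl:for-each-group select="current-group() except ."
                        group-starting-with="h2">
      <xsl:apply-templates select="." mode="group"/>
    </xsl:for-each-group>
  </div>
</xsl:template>

<xsl:template match="h2" mode="group">
  <div><head><xsl:value-of select="."/></head>
    <xsl:for-each-group select="current-group() except ."
                        group-starting-with="h3">
      <xsl:apply-templates select="." mode="group"/>
    </xsl:for-each-group>
  </div>
</xsl:template>

<xsl:template match="h3" mode="group">
  <div><head><xsl:value-of select="."/></head>
    <xsl:copy-of select="current-group() except ."/>
  </div>
</xsl:template>

<!-- A group that starts with a paragraph is copied as it stands -->
<xsl:template match="p" mode="group">
  <xsl:copy-of select="current-group()"/>
</xsl:template>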

I've shown this down to three levels; it should be obvious how it can be extended.
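
If writing one template per heading level seems tedious, the same idea can be expressed recursively. The following is only a sketch of one possible generalization, not the code from the download file: the named template nest and its items and level parameters are hypothetical, and the <div> and <head> output elements are assumed names for a section and its heading.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:template match="body">
    <body>
      <xsl:call-template name="nest">
        <xsl:with-param name="items" select="*"/>
        <xsl:with-param name="level" select="1"/>
      </xsl:call-template>
    </body>
  </xsl:template>

  <!-- Hypothetical helper: splits $items at headings of the given level
       (h1, h2, ...) and recurses one level deeper inside each group. -->
  <xsl:template name="nest">
    <xsl:param name="items" as="element()*"/>
    <xsl:param name="level" as="xs:integer"/>
    <xsl:variable name="heading" select="concat('h', $level)"/>
    <xsl:choose>
      <!-- Past h6 there is nothing left to split on: copy what remains. -->
      <xsl:when test="$level gt 6">
        <xsl:copy-of select="$items"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:for-each-group select="$items"
            group-starting-with="*[local-name() = $heading]">
          <xsl:choose>
            <!-- The group starts with a heading at this level: open a section. -->
            <xsl:when test="local-name() = $heading">
              <div>
                <head><xsl:value-of select="."/></head>
                <xsl:call-template name="nest">
                  <xsl:with-param name="items"
                                  select="current-group() except ."/>
                  <xsl:with-param name="level" select="$level + 1"/>
                </xsl:call-template>
              </div>
            </xsl:when>
            <!-- A leading group with no heading at this level:
                 try the next level down without opening a section. -->
            <xsl:otherwise>
              <xsl:call-template name="nest">
                <xsl:with-param name="items" select="current-group()"/>
                <xsl:with-param name="level" select="$level + 1"/>
              </xsl:call-template>
            </xsl:otherwise>
          </xsl:choose>
        </xsl:for-each-group>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

The sketch relies on the fact that a group-starting-with pattern may refer to a local variable in its predicate. A group consisting only of paragraphs simply falls through the remaining levels and is copied once the level passes 6.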

When an <h1> element is matched, it is processed as part of a group that starts with an <h1> element and then contains a number of <p> and <h2> elements interleaved. The template rule first outputs the contents of the <h1> element as a heading and then splits the contents of this group (excluding the first <h1> element, which is of no further interest) into subgroups. The first subgroup will typically start with an ordinary <p> element, and all subsequent subgroups will start with an <h2> element. The template rule then calls <xsl:apply-templates> to process the first element in each subgroup, and this fires off either the match="p" template (for the first group) or the match="h2" template (for the others). The match="p" template simply copies the group of <p> elements to the result tree, while the match="h2" template starts yet another level of grouping based on the <h3> elements, and so on.
