Watson Explorer XSL Tips and Tricks

Using the Chico Application to Test XSLT

Here’s an example use of Chico! I was testing something to do with evaluating XSLT and wanted to run it quickly.
To get to Chico, go to your Velocity installation’s ‘velocity’ script and add v.app=chico to the URL. Enter AXL in the box on the left, then submit it to see the processed results on the right.

Match non-breaking spaces within strings

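Use `&#160;` to match non-breaking spaces within strings in XSL. The named entity `&nbsp;` is not predefined in XML, so use the numeric character reference instead: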
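A minimal sketch, assuming hypothetical `<item>` elements whose text may contain non-breaking spaces. It tests for `&#160;` with contains() and swaps it for an ordinary space with translate(). This step matters because normalize-space() only treats space, tab, carriage return and line feed as whitespace, so it leaves non-breaking spaces untouched.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Hypothetical input: every <item> element in the document. -->
    <xsl:for-each select="//item">
      <!-- True if the text contains a non-breaking space (U+00A0). -->
      <xsl:if test="contains(., '&#160;')">
        <!-- Replace each non-breaking space with an ordinary space. -->
        <xsl:value-of select="translate(., '&#160;', ' ')"/>
        <xsl:text>&#10;</xsl:text>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```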


For loop with XSL

You can emulate a for loop in XSL by creating a string of some length and tokenizing it:

If you intend to use this “for loop” in a converter and you are taking data from a web page, you must do two things to make it work:

  • Save the entire page in an XSL variable outside the loop. Once inside the loop, the context node belongs to the tokenized string rather than the page, so you lose access to the web page.
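  • If you need to iterate in your XPaths, save position() in a variable first. If you call position() directly inside an XPath predicate, it will not work correctly, because there it returns the position within the node list the predicate is filtering rather than the loop index.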
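A minimal sketch of the tokenizing loop, assuming your XSL processor supports the EXSLT str:tokenize() function (check your Velocity version). With an empty delimiter string, str:tokenize() splits the string into single characters, so a 5-character string gives 5 iterations; taking substring() of a longer string is one way to get a variable count. The $page variable and the $i index follow the two rules above, and the table XPath is hypothetical.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:str="http://exslt.org/strings"
    exclude-result-prefixes="str">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Save the whole input page now; inside the loop, "/" refers to the
         tree that holds the tokens, not to the web page. -->
    <xsl:variable name="page" select="/"/>

    <!-- Empty delimiter: one token per character, so 5 iterations. -->
    <xsl:for-each select="str:tokenize('xxxxx', '')">
      <!-- Capture the loop index before using it in a predicate. -->
      <xsl:variable name="i" select="position()"/>
      <xsl:text>Row </xsl:text>
      <xsl:value-of select="$i"/>
      <xsl:text>: </xsl:text>
      <!-- Hypothetical XPath: first cell of the i-th table row on the page. -->
      <xsl:value-of select="($page//tr)[$i]/td[1]"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

The parentheses in `($page//tr)[$i]` matter: without them, `[$i]` would pick the i-th tr child of each parent rather than the i-th row in the whole page.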


Get a random number within a specific range
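A minimal sketch, assuming your XSL processor supports the EXSLT math:random() function, which returns a number between 0 and 1. Scaling by the size of the range, taking floor() and adding the lower bound gives an integer from $min to $max inclusive; the clamp guards against implementations that can return exactly 1. The $min and $max values are placeholders.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:math="http://exslt.org/math"
    exclude-result-prefixes="math">
  <xsl:output method="text"/>

  <!-- Inclusive bounds of the desired range (placeholder values). -->
  <xsl:param name="min" select="10"/>
  <xsl:param name="max" select="20"/>

  <xsl:template match="/">
    <!-- Scale to the number of possible values, drop the fraction,
         then shift by the lower bound. -->
    <xsl:variable name="n"
        select="floor(math:random() * ($max - $min + 1)) + $min"/>
    <xsl:choose>
      <!-- If random() returned exactly 1, $n is one past $max; clamp it. -->
      <xsl:when test="$n &gt; $max">
        <xsl:value-of select="$max"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$n"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```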


Copy a nodeset with special processing
  • Use the following XSL to copy a nodeset verbatim. Enter any special processing templates between the comments as indicated.
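A sketch of the standard XSLT identity transform, which is one common way to do this (the original stylesheet isn't shown here). The first template copies every attribute and node unchanged; templates placed between the BEGIN and END comments override it for the nodes they match, because a specific pattern such as match="old" has a higher default priority than node(). The rename of `<old>` to `<new>` is a hypothetical example.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copies every attribute and node verbatim. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- BEGIN special processing -->
  <!-- Hypothetical example: rename <old> to <new>, keeping its content. -->
  <xsl:template match="old">
    <new>
      <xsl:apply-templates select="@*|node()"/>
    </new>
  </xsl:template>
  <!-- END special processing -->

</xsl:stylesheet>
```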


Boost parser

  • Based on the above, this is a generic recipe for a parser you can add to a source where you want to boost its results. You’ll probably want to edit the values in uppercase. Note that we’re throwing away binning information and boost-onlying the results.
PROTIP: If you are adding this boost parser to a source that accesses a Velocity Search Engine collection, you **must** set the parser type to xsl and **not** html-xsl, or you may spend hours debugging your parser. You’ll know you made this mistake when the content nodes are empty.


Parse XML that uses an XML namespace

Define a new prefix with an xmlns:prefix attribute in each xsl:template, then prepend your new prefix to each element name in your XPaths (in this case, I’ve bound the namespace to the ‘a’ prefix).
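A minimal sketch, assuming an Atom feed (namespace http://www.w3.org/2005/Atom) as the input; substitute the namespace URI your XML actually declares. The prefix is declared on xsl:template as described, and every element name in the XPaths carries it. Without the prefix, //entry would match nothing, because an unprefixed name in XPath 1.0 always means “no namespace”.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Bind the prefix "a" to the input's namespace URI. The URI must match
       the input exactly; the prefix itself can be any name you like. -->
  <xsl:template match="/" xmlns:a="http://www.w3.org/2005/Atom">
    <!-- Every element name in the path needs the prefix. -->
    <xsl:for-each select="//a:entry">
      <xsl:value-of select="a:title"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```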


Parse an XML file from the command line
  • Use transform (from your checkout/vivisimo/util directory):

You can set up an alias in your .bash_profile if you don’t want to type the whole path every time.

Alternatively, cd to your installation directory and run it from there, so the software can find your vivisimo.conf.
  • Stub XSL for a starting point:
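A minimal starting stylesheet, offered as a sketch rather than the original stub. It declares XSLT 1.0, indents its output and copies the input's document element into a placeholder `<result>` wrapper, so you can confirm the transform runs before adding real templates. The `<result>` element name is hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Entry point: matches the root of the input document. -->
  <xsl:template match="/">
    <result>
      <!-- Replace with your own processing; this copies the input as-is. -->
      <xsl:copy-of select="*"/>
    </result>
  </xsl:template>

</xsl:stylesheet>
```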

Using xsl:key

For nodesets that are frequently accessed, creating a hash lookup with xsl:key can significantly improve performance (in my experience, accessing a value through key() takes about 0.07ms, or about as long as accessing a variable value). Here’s a simple example of creating a key and using it on the /*/settings node in the display:
For a given input XML of:

You can create a key that compares against an attribute of each setting node:


Then you can access the value of that setting node with a call to key(). The first parameter is the name of the key and the second parameter is the value to compare against that attribute.
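A minimal sketch, assuming each setting element carries a name attribute; the attribute the original example keys on isn't recoverable, so name, setting-key and page-size are all hypothetical. The key indexes every /*/settings/setting by that attribute, and key() then returns the matching node without scanning the tree.

```xml
<!-- Hypothetical input:
     <display><settings><setting name="page-size">10</setting></settings></display> -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Index each setting under /*/settings by its (assumed) name attribute. -->
  <xsl:key name="setting-key" match="/*/settings/setting" use="@name"/>

  <xsl:template match="/">
    <!-- Hash lookup instead of scanning //setting[@name = 'page-size']. -->
    <xsl:value-of select="key('setting-key', 'page-size')"/>
  </xsl:template>
</xsl:stylesheet>
```

With the input in the comment, this prints 10.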


Using a nodeset/result-tree as the scope and context

Normally in XSL when you execute a command, the global context for xpaths is the incoming XML and the scope is the current node.
The easiest way to change the scope is to use <xsl:apply-templates/>; the new scope is whatever node the template matches. However, the context is still the incoming XML.
To use a completely different context (and naturally the scope), you can use xsl:for-each with a constructed nodeset.
Example:

actually returns


This can be used with document() and AXL variables (which don’t have any context and will generate an error if you use key() in them).
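A minimal sketch of the context switch, using document('') (the stylesheet itself) as the other document so the example is self-contained. key() only searches the document that contains the current node, so the lookup has to happen inside xsl:for-each select="document('')". The data: namespace, the embedded state table and the state-key name are hypothetical, and document('') only works when the stylesheet was loaded with a base URI, such as from a file.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:data="urn:example:lookup"
    exclude-result-prefixes="data">
  <xsl:output method="text"/>

  <!-- Hypothetical lookup table embedded in the stylesheet. Top-level
       elements in a non-XSLT namespace are ignored by the processor. -->
  <data:states>
    <data:state code="PA">Pennsylvania</data:state>
    <data:state code="OH">Ohio</data:state>
  </data:states>

  <xsl:key name="state-key" match="data:state" use="@code"/>

  <xsl:template match="/">
    <!-- Make the stylesheet document the context, so key() searches it
         instead of the incoming XML. -->
    <xsl:for-each select="document('')">
      <xsl:value-of select="key('state-key', 'PA')"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Run against any input, this prints Pennsylvania; the same key() call outside the for-each would search the incoming XML and find nothing.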

Note that running it in an exsl:node-set() context, or for that matter in any context you don’t really know about (i.e., anything other than / and document()), can generate weird results:

actually returns


Grouping – The Muenchian Method

Suppose you have the following XML:

To list each city grouped by state, use the following xsl:

  • The first line produces a key called location-key for each location using the value of state.
  • The outer for-each loop scans through only those location nodes that are the first in their group of locations with the same state value. In other words, it loops once for each state.
  • The inner for-each loop scans through all location nodes whose state node has the value of the current state node. In other words, it loops once for each city in the current state.
The above XSL will produce something like this:

The method described above is especially great for grouping large amounts of data as it is orders of magnitude faster than manually scanning through every node and comparing it with siblings.
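A sketch of a stylesheet matching the three bullets above, assuming location elements that each have state and city children (the element names besides location and state are hypothetical, and this is not necessarily the exact original). The generate-id() comparison is the standard Muenchian test for “first node in its key group”.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Index each location by the text of its state child. -->
  <xsl:key name="location-key" match="location" use="state"/>

  <xsl:template match="/">
    <!-- Outer loop: only the first location in each state's group. -->
    <xsl:for-each select="//location[generate-id() =
                          generate-id(key('location-key', state)[1])]">
      <xsl:value-of select="state"/>
      <xsl:text>&#10;</xsl:text>
      <!-- Inner loop: every location that shares the current state. -->
      <xsl:for-each select="key('location-key', state)">
        <xsl:text>  </xsl:text>
        <xsl:value-of select="city"/>
        <xsl:text>&#10;</xsl:text>
      </xsl:for-each>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

For locations Pittsburgh/PA, Columbus/OH and Philadelphia/PA in that order, it prints PA followed by Pittsburgh and Philadelphia, then OH followed by Columbus.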

One place I found a particularly crazy use for the Muenchian Method was in a converter. I wanted to output the sum of the numeric values in the 2nd column of an HTML table for groups of rows, where a group was all rows having the same values in their 6th, 3rd and 1st columns. To do that, I made a key for each row in the table using the concatenation of the 6th, 3rd, and 1st columns in that row.

Next, I used a for-each loop to get one row from each group.


Inside that loop, I found the sum of all the numeric values in column 2 of each row in the current group.
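A sketch of that converter logic, assuming the table reaches the stylesheet as plain tr/td elements in no namespace; row-key is a hypothetical name. A | separator is added between the columns here so that, for example, “ab” + “c” and “a” + “bc” don’t fall into the same group; this assumes | never appears in those cells. The numeric filter relies on NaN never being equal to itself.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Composite key over columns 6, 3 and 1 of every data row. -->
  <xsl:key name="row-key" match="tr[td]"
           use="concat(td[6], '|', td[3], '|', td[1])"/>

  <xsl:template match="/">
    <!-- One representative row per group (Muenchian test). -->
    <xsl:for-each select="//tr[td][generate-id() = generate-id(
        key('row-key', concat(td[6], '|', td[3], '|', td[1]))[1])]">
      <!-- Every row in the current group. -->
      <xsl:variable name="group"
          select="key('row-key', concat(td[6], '|', td[3], '|', td[1]))"/>
      <xsl:value-of select="concat(td[6], ' / ', td[3], ' / ', td[1], ': ')"/>
      <!-- Sum column 2, skipping cells that aren't numbers (NaN != NaN). -->
      <xsl:value-of select="sum($group/td[2][number(.) = number(.)])"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```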


If your head isn’t hurting yet, a more detailed explanation of the Muenchian Method is available here


How to Enqueue Javascript Links

This question comes up a lot from customers: how do I enqueue links on web pages that use Javascript?
It’s actually a pretty simple process, but a converter doesn’t interpret Javascript links the way a browser does, so they aren’t picked up automatically. Below is a sample XHTML file with a few Javascript links that we’d like to convert and enqueue.

Below is the stylesheet we would use in the converter to extract the links for this HTML file.


This is a simple example, but it can be used as a good starting point for customers.
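A minimal sketch, assuming links of the form javascript:openPage('/docs/page1.html'), with the real URL as the first single-quoted argument; the original sample file and stylesheet aren’t shown here, and openPage is a hypothetical function name. local-name() is used so the match works whether or not the XHTML declares its namespace. The stylesheet just prints each URL; replace that output with whatever your converter uses to enqueue links.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- A single apostrophe, kept in a variable to avoid quoting headaches. -->
  <xsl:variable name="apos">'</xsl:variable>

  <xsl:template match="/">
    <!-- Anchors (namespaced or not) whose href runs Javascript. -->
    <xsl:for-each select="//*[local-name() = 'a']
                          [starts-with(@href, 'javascript:')]">
      <!-- Text between the first pair of single quotes. -->
      <xsl:variable name="url"
          select="substring-before(substring-after(@href, $apos), $apos)"/>
      <xsl:if test="$url != ''">
        <!-- Placeholder output: one extracted URL per line. -->
        <xsl:value-of select="$url"/>
        <xsl:text>&#10;</xsl:text>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```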


Directory, basename, and extension extraction

Sometimes it’s quite useful to extract the directory paths, the basename, and the extension from a given path. These three templates will do that.

 

Observe that get-basename and get-ext are basically the same thing, the only difference being the separator. These two templates could be combined into a single after-last template that takes the string and the separator as params.
Note that the behavior when an extension is not present is not defined. A better recursion termination condition would fix that.
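A sketch of the combined after-last template suggested above, plus a better termination rule: the extension is only computed when the basename actually contains a dot, so a file without an extension yields an empty string. Here the directory is derived by trimming the basename and the slash before it off the end of the path, rather than with a third recursive template. The $path value is a hypothetical example.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical input path. -->
  <xsl:param name="path" select="'/data/reports/summary.tar.gz'"/>

  <!-- Everything after the last $sep; the whole string if $sep is absent. -->
  <xsl:template name="after-last">
    <xsl:param name="string"/>
    <xsl:param name="sep"/>
    <xsl:choose>
      <xsl:when test="contains($string, $sep)">
        <xsl:call-template name="after-last">
          <xsl:with-param name="string" select="substring-after($string, $sep)"/>
          <xsl:with-param name="sep" select="$sep"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$string"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <xsl:template match="/">
    <xsl:variable name="basename">
      <xsl:call-template name="after-last">
        <xsl:with-param name="string" select="$path"/>
        <xsl:with-param name="sep" select="'/'"/>
      </xsl:call-template>
    </xsl:variable>

    <!-- Directory: the path minus the basename and the slash before it. -->
    <xsl:variable name="dir" select="substring($path, 1,
        string-length($path) - string-length($basename) - 1)"/>

    <!-- Extension: empty when the basename contains no dot. -->
    <xsl:variable name="ext">
      <xsl:if test="contains($basename, '.')">
        <xsl:call-template name="after-last">
          <xsl:with-param name="string" select="$basename"/>
          <xsl:with-param name="sep" select="'.'"/>
        </xsl:call-template>
      </xsl:if>
    </xsl:variable>

    <xsl:value-of select="concat('dir=', $dir, ' base=', $basename, ' ext=', $ext)"/>
  </xsl:template>
</xsl:stylesheet>
```

With the default path, this prints dir=/data/reports base=summary.tar.gz ext=gz.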

Empty content remover

Database seeds sometimes generate empty content elements, which is bad form in the finished product. This custom XSL template removes empty contents, i.e. contents without child nodes.
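A minimal sketch, assuming the empty elements are named content, and not necessarily the original template. The identity template copies everything through, and the empty template for content[not(node())] outputs nothing, so contents with no child nodes disappear. A content holding only whitespace still has a text child and is kept.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Copy everything through unchanged by default. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Empty template: content elements with no child nodes produce
       no output, so they are dropped. -->
  <xsl:template match="content[not(node())]"/>

</xsl:stylesheet>
```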

Parse HTML in XML

Sometimes people put HTML into RSS feeds and the like, and it needs to be parsed. This is not a straightforward task, but this code should help you out. It is taken from a custom RSS feed parser Colin Dean wrote for NIH. See VO #1024 for a little discussion and a full example source parser.
You may be inclined to follow some online tutorials about using two passes to parse: one outputs the HTML with output escaping disabled, and the second pass actually performs the intended parsing. One may think to use Velocity’s secondary parser in a source to handle this. This is incorrect! It can be done in one pass with some magical viv XSL extensions.


Uglify Text for Content Name Attribute

This template first normalizes (trims) whitespace, then converts spaces to dashes, lowercases the text, and strips every character that is not alphanumeric or a dash.
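A sketch in plain XSLT 1.0, using a hypothetical template name uglify; the original template isn’t shown here. translate() handles the space-to-dash and lowercase steps, and a nested translate() strips disallowed characters: the inner call collects every character that isn’t allowed, and the outer call deletes those. Only ASCII letters are lowercased, so accented characters are removed rather than converted.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:variable name="upper" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
  <xsl:variable name="lower" select="'abcdefghijklmnopqrstuvwxyz'"/>
  <xsl:variable name="allowed" select="concat($lower, '0123456789-')"/>

  <xsl:template name="uglify">
    <xsl:param name="text"/>
    <!-- Trim and collapse whitespace, spaces to dashes, then lowercase. -->
    <xsl:variable name="s"
        select="translate(translate(normalize-space($text), ' ', '-'),
                          $upper, $lower)"/>
    <!-- Inner translate leaves only disallowed characters;
         the outer translate deletes them from $s. -->
    <xsl:value-of select="translate($s, translate($s, $allowed, ''), '')"/>
  </xsl:template>

  <xsl:template match="/">
    <xsl:call-template name="uglify">
      <xsl:with-param name="text" select="'  Hello World! 2024 '"/>
    </xsl:call-template>
  </xsl:template>
</xsl:stylesheet>
```

With the sample input, this prints hello-world-2024.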

 

Published by

John Ward

Hi, I'm John. By day I'm an IBM Watson Explorer Consultant with several years of experience deploying and customizing Watson Explorer solutions. I'm also a pretty experienced web developer, and I like to write tutorials as well as articles about other things like business and life experiences.