Watson Explorer XSL Tips and Tricks

Using the Chico Application to Test XSLT

Here’s an example use of Chico! I was testing something to do with evaluating XSLT and wanted to run it quickly:
<vce>
 <process-xsl><![CDATA[
    <xsl:template match="/">
      <xsl:variable name="params">
        <param name="jean" value="bar" />
      </xsl:variable>
      <foo2>
        [<xsl:value-of select="exsl:node-set($params)//param[@name = 'jean']/@value" />]
      </foo2>
    </xsl:template>
  ]]></process-xsl>
</vce>
Get to Chico by loading your Velocity ‘velocity’ script and adding v.app=chico to its query string. Enter AXL in the box on the left, then submit it to see the processed results on the right.

Match &nbsp; within strings

Use &#160; (with its closing semicolon) to match non-breaking spaces within strings in XSL:
<xsl:if test="contains($text, '&#160;')"><!-- $text contains a non-breaking space --></xsl:if>
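One place this matters is cleanup: XPath’s normalize-space() only treats ordinary spaces, tabs, carriage returns and line feeds as whitespace, so non-breaking spaces survive it. A minimal sketch that maps them to plain spaces first; it assumes the text to clean is the string value of the input’s root element.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Assumed input: the text lives in the root element -->
    <xsl:variable name="text" select="string(/*)"/>
    <xsl:if test="contains($text, '&#160;')">
      <xsl:text>Found a non-breaking space&#10;</xsl:text>
    </xsl:if>
    <!-- normalize-space() ignores U+00A0, so translate it to a plain space first -->
    <xsl:value-of select="normalize-space(translate($text, '&#160;', ' '))"/>
  </xsl:template>
</xsl:stylesheet>
```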

For loop with XSL

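You can emulate a for loop in XSL by creating a string of some length and tokenizing it. str:padding(5000) builds a string of 5000 spaces, and str:tokenize() with an empty delimiter returns one token per character, so the body below runs 5000 times: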
<xsl:for-each select="str:tokenize(str:padding(5000), '')">
 <xsl:value-of select="position()" />
</xsl:for-each>
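If EXSLT isn’t available, the classic XSLT 1.0 way to loop a fixed number of times is a recursive named template. That is a different technique from the tokenize trick; here is a minimal sketch, where for-loop is a hypothetical template name. Processors cap recursion depth (xsltproc’s limit is a few thousand levels by default), which is one reason the tokenize approach is handy for large counts.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical helper: prints $i through $to, one number per line -->
  <xsl:template name="for-loop">
    <xsl:param name="i" select="1"/>
    <xsl:param name="to" select="10"/>
    <xsl:if test="$i &lt;= $to">
      <xsl:value-of select="$i"/>
      <xsl:text>&#10;</xsl:text>
      <!-- Recurse with the counter incremented -->
      <xsl:call-template name="for-loop">
        <xsl:with-param name="i" select="$i + 1"/>
        <xsl:with-param name="to" select="$to"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

  <xsl:template match="/">
    <xsl:call-template name="for-loop">
      <xsl:with-param name="to" select="5"/>
    </xsl:call-template>
  </xsl:template>
</xsl:stylesheet>
```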

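If you intend to use this “for-loop” in a converter and you are taking data from a web page, then you must do two things to make it work.

  • Save the entire page in an XSL variable outside the loop. Inside the loop, the context node is one of the generated tokens rather than the web page, so paths like //p no longer reach the page.
  • If you need an index in your XPaths, save position() in a variable first. Inside a predicate, position() refers to the node being tested rather than the loop iteration, so calling it directly in your XPaths will not work correctly.
<xsl:variable name="page" select="."/>
<xsl:for-each select="str:tokenize(str:padding(5000), '')">
 <xsl:variable name="i" select="position()" />
 <content name="something">
   <xsl:value-of select="($page//p[@class='data-1'])[$i]"/>
 </content>
</xsl:for-each>

The parentheses around the path matter: $page//p[@class='data-1'][$i] picks the i-th matching p within each parent element, whereas ($page//p[@class='data-1'])[$i] picks the i-th match in the whole page.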
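One case where the saved index really pays off is walking two parallel lists in step, such as labels and values scraped from the same page. This sketch also sizes the loop with count() instead of a fixed 5000; the span class names are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:str="http://exslt.org/strings"
  exclude-result-prefixes="str">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <!-- Save the page before the loop changes the context node -->
    <xsl:variable name="page" select="."/>
    <!-- Hypothetical parallel lists: labels and values in the same order -->
    <xsl:variable name="labels" select="$page//span[@class = 'label']"/>
    <xsl:variable name="values" select="$page//span[@class = 'value']"/>
    <pairs>
      <!-- One token per label, so the loop runs exactly count($labels) times -->
      <xsl:for-each select="str:tokenize(str:padding(count($labels)), '')">
        <xsl:variable name="i" select="position()"/>
        <content name="{normalize-space($labels[$i])}">
          <xsl:value-of select="$values[$i]"/>
        </content>
      </xsl:for-each>
    </pairs>
  </xsl:template>
</xsl:stylesheet>
```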

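Get a random number within a specific range

math:random() returns a number between 0 and 1; scaling it by count($nodes) and adding 1 gives a whole number you can use as a position, for example to pick a random node with $nodes[$rand]. Compute it once in a variable like this: inside a predicate, math:random() would be re-evaluated for every node tested.

<xsl:variable name="num" select="count($nodes)" />
<xsl:variable name="rand" select="floor(math:random() * $num) + 1" />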
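To get a number in an arbitrary range instead of 1 to count($nodes), scale by the size of the range and shift by the minimum. A minimal sketch; the min and max parameter values are placeholders. The EXSLT description of math:random() only promises a number from 0 to 1, so the sketch clamps the edge case where it returns exactly 1.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:math="http://exslt.org/math"
  exclude-result-prefixes="math">
  <xsl:output method="text"/>

  <!-- Placeholder bounds, inclusive on both ends -->
  <xsl:param name="min" select="5"/>
  <xsl:param name="max" select="10"/>

  <xsl:template match="/">
    <!-- floor(r * (max - min + 1)) lands in 0..(max - min) for r below 1 -->
    <xsl:variable name="r" select="floor(math:random() * ($max - $min + 1)) + $min"/>
    <xsl:choose>
      <!-- Guard against math:random() returning exactly 1 -->
      <xsl:when test="$r &gt; $max">
        <xsl:value-of select="$max"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$r"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```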

Copy a nodeset with special processing
  • Use the following XSL to copy a nodeset verbatim. Put any special processing templates between the comments as indicated.
<!-- Match the root, recur -->
<xsl:template match="/">
 <xsl:apply-templates select="." mode="copy" />
</xsl:template>


<!-- Specialty nodes go here -->


<!-- End specialty nodes -->


<xsl:template match="@* | text() | comment()" mode="copy">
 <xsl:copy />
</xsl:template>


<!-- Default action, keep recurring and copying -->
<xsl:template match="*" mode="copy">
 <xsl:copy>
   <xsl:apply-templates select="@*" mode="copy" />
   <xsl:apply-templates mode="copy" />
 </xsl:copy>
</xsl:template>
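As an illustration, here is the same copy stylesheet with a few specialty templates filled in; the script, style and b names are hypothetical examples. The specialty templates win over the defaults because a pattern naming a specific element or attribute has a higher default priority (0) than * or @* (-0.5).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <xsl:apply-templates select="." mode="copy"/>
  </xsl:template>

  <!-- Specialty nodes go here (hypothetical examples) -->
  <!-- An empty template drops the matched element and everything under it -->
  <xsl:template match="script" mode="copy"/>
  <!-- Drop one attribute wherever it appears -->
  <xsl:template match="@style" mode="copy"/>
  <!-- Rename an element but keep its attributes and children -->
  <xsl:template match="b" mode="copy">
    <strong>
      <xsl:apply-templates select="@* | node()" mode="copy"/>
    </strong>
  </xsl:template>
  <!-- End specialty nodes -->

  <xsl:template match="@* | text() | comment()" mode="copy">
    <xsl:copy/>
  </xsl:template>

  <xsl:template match="*" mode="copy">
    <xsl:copy>
      <xsl:apply-templates select="@*" mode="copy"/>
      <xsl:apply-templates mode="copy"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```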

Boost parser

  • Based on the above, this is a generic recipe for a parser you can add to a source whose results you want to boost. You’ll probably want to edit the values in uppercase. Note that it throws away binning information and marks the results as boost-only.
PROTIP: If you are adding this boost parser to a source that accesses a Velocity Search Engine collection, you **must** set the parser type to xsl and **not** html-xsl, or you may spend hours debugging your parser. You’ll know you made this mistake when the content nodes are empty.
<xsl:template match="scope">
 <xsl:copy>
   <xsl:apply-templates select="@*" mode="copy" />
   <boost name="BOOSTNAME" display-name="DISPLAYNAME" />
   <xsl:apply-templates select="* | text() | comment()"
     mode="copy"
    />
 </xsl:copy>
</xsl:template>


<xsl:template match="document" mode="copy">
 <xsl:copy>
   <xsl:apply-templates select="@*" mode="copy" />
   <xsl:attribute name="boost-name">BOOSTNAME</xsl:attribute>
   <xsl:attribute name="boost-display">boost-only</xsl:attribute>
   <xsl:apply-templates select="* | text() | comment()" mode="copy" />
 </xsl:copy>
</xsl:template>


<xsl:template match="binning-set" mode="copy" />


<xsl:template match="@* | text() | comment()" mode="copy">
 <xsl:copy />
</xsl:template>


<xsl:template match="*" mode="copy">
 <xsl:copy>
   <xsl:apply-templates select="@* | * | text() | comment()" mode="copy" />
 </xsl:copy>
</xsl:template>

Parse XML that uses an XML namespace

Define a new prefix with the xmlns attribute in each xsl:template and then prepend your new prefix to each XPath (in this case, I’ve set the namespace to the ‘a’ prefix).
<xsl:template match="/" xmlns:a="urn:yahoo:srch">
 <scope>
   <xsl:variable name="total" select="a:ResultSet/@totalResultsAvailable" />
   <attribute name="total-results" value="{viv:if-else($total, $total, 0)}" />
   <xsl:apply-templates select="a:ResultSet/a:Result" />
 </scope>
</xsl:template>


<xsl:template match="a:Result" xmlns:a="urn:yahoo:srch">
 <document url="{a:ClickUrl}" key="{viv:url-key(a:Url)}" display-url="{a:Url}">
   <xsl:variable name="cache" select="a:Cache/a:Url" />
   <xsl:if test="$cache">
     <xsl:attribute name="cache">
       <xsl:value-of select="$cache" />
     </xsl:attribute>
   </xsl:if>
   <content name="title" weight="3" output-action="bold">
     <xsl:value-of select="a:Title" />
   </content>
   <content name="snippet" output-action="summarize">
     <xsl:value-of select="a:Summary" />
   </content>
 </document>
</xsl:template>
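If you’d rather not repeat the xmlns attribute on every template, you can declare the prefix once on xsl:stylesheet. A minimal sketch that keeps only the title content from above; listing the prefix in exclude-result-prefixes also stops xmlns:a from being copied onto output elements such as scope and document.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:a="urn:yahoo:srch"
  exclude-result-prefixes="a">
  <xsl:output method="xml" indent="yes"/>

  <!-- The a: prefix is in scope for every template below -->
  <xsl:template match="/">
    <scope>
      <xsl:apply-templates select="a:ResultSet/a:Result"/>
    </scope>
  </xsl:template>

  <xsl:template match="a:Result">
    <document url="{a:ClickUrl}" display-url="{a:Url}">
      <content name="title">
        <xsl:value-of select="a:Title"/>
      </content>
    </document>
  </xsl:template>
</xsl:stylesheet>
```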

Parse an XML file from the command line
  • Use transform (from your checkout/vivisimo/util directory):
$ [checkout]/vivisimo/util/transform -xsl get-sources.xsl repository.xml
On apps1:
$ /usr/local/vivisimo-[yourname]/bin/converters/transform -xsl get-sources.xsl repository.xml

You can set up an alias in your .bash_profile if you don’t want to type the whole path every time.

Alternatively, cd to your installation directory and run transform from there so that the software can find your vivisimo.conf.
  • Stub XSL for a starting point:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
 xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
 xmlns:math='http://exslt.org/math'
 xmlns:str='http://exslt.org/strings'
 xmlns:dyn='http://exslt.org/dynamic'
 xmlns:exsl='http://exslt.org/common'
 xmlns:set='http://exslt.org/sets'
 xmlns:date='http://exslt.org/dates-and-times'
 xmlns:func='http://exslt.org/functions'
 xmlns:lib='http://xmlsoft.org/XSLT/namespace'
 xmlns:viv='http://vivisimo.com/exslt'
 xmlns:disp='http://vivisimo.com/disp'
 extension-element-prefixes='math str dyn exsl set date func lib viv disp'
>
 <xsl:output method="xml" indent="yes"/>


 <xsl:template match="/">
 </xsl:template>
</xsl:stylesheet>

Using xsl:key

For nodesets that are frequently accessed, creating a hash lookup with xsl:key can significantly improve performance (in my experience, accessing a value using a key takes about 0.07ms, or about as long as reading a variable). Here’s a simple example of creating a key and using it on the /*/settings node in the display.
Given this input XML:
<vce>
 <settings>
   <setting name="application-name-text" section="Simple" tab="Theme" type="text">Vivísimo Velocity</setting>
   <setting name="front-logo-image" section="Simple" tab="Theme" type="image">viv-logo-velocity.gif</setting>
 </settings>
</vce>

You can create a key that compares against the @name attribute of each setting node:

<xsl:key name="settings" match="/*/settings/setting" use="@name" />

Then you can access the value of that setting node with a call to key(). The first parameter is the name of the key and the second parameter is the value to compare against the @use XPath.

<xsl:value-of select="key('settings', 'application-name-text')" />
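Putting the pieces together, here is a complete stylesheet you could run against the sample input above with any XSLT 1.0 processor; the second lookup, reading the @type attribute, is an illustration I’ve added. If a name isn’t in the key, key() returns an empty node-set and value-of prints nothing. With the sample input it should print Vivísimo Velocity and then image.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="UTF-8"/>

  <!-- Index every setting by its @name attribute -->
  <xsl:key name="settings" match="/*/settings/setting" use="@name"/>

  <xsl:template match="/">
    <!-- value-of prints the string value of the first matching setting -->
    <xsl:value-of select="key('settings', 'application-name-text')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- key() returns nodes, so you can keep navigating from the result -->
    <xsl:value-of select="key('settings', 'front-logo-image')/@type"/>
  </xsl:template>
</xsl:stylesheet>
```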

Using a nodeset/result-tree as the scope and context

Normally in XSL, the global context for XPaths (the document that absolute paths and key() search) is the incoming XML, and the scope is the current node.
The easiest way to change the scope is to use <xsl:apply-templates />; the new scope is whatever node the template matches. However, the context is still the incoming XML.
To use a completely different context (and, naturally, scope), you can use xsl:for-each over a constructed nodeset.
Example:
<process-xsl>
  <xml-to-text>
    <xsl:variable name="a">
      <a>
        <b name="a"/>
        <b name="b"/>
      </a>
    </xsl:variable>
    <xsl:key name="k" match="b" use="@name"/>
    <xsl:template match="/">
      <xsl:variable name="gc" select="."/>
      GLOBAL CONTEXT:<xsl:copy-of select="key('k','a')"/>
      LOCAL CONTEXT:
      <xsl:for-each select="exsl:node-set($a)">
        <xsl:copy-of select="key('k','a')"/>
      </xsl:for-each>
      RETURN TO GLOBAL CONTEXT:
      <xsl:for-each select="exsl:node-set($a)">
        <xsl:for-each select="$gc">
          <xsl:copy-of select="key('k','a')"/>
        </xsl:for-each>
      </xsl:for-each>
    </xsl:template>
  </xml-to-text>
</process-xsl>

actually returns

GLOBAL CONTEXT:
LOCAL CONTEXT:<b name="a"/>
RETURN TO GLOBAL CONTEXT:

This can be used with document() and AXL variables (which don’t have any context and will generate an error if you use key() in them).
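For example, to look up nodes in a second file with the same key, switch the context to that file with xsl:for-each; in XSLT 1.0, key() only searches the document that contains the context node. The file name lookup.xml is hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:key name="k" match="b" use="@name"/>

  <xsl:template match="/">
    <results>
      <!-- Hypothetical file; inside the for-each, key() searches lookup.xml -->
      <xsl:for-each select="document('lookup.xml')">
        <xsl:copy-of select="key('k', 'a')"/>
      </xsl:for-each>
    </results>
  </xsl:template>
</xsl:stylesheet>
```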

Note that running it in an exsl:node-set() context (or, for that matter, any context you don’t really know about, i.e., anything other than / and document()) can generate weird results:
<process-xsl>
 <xml-to-text>
 <xsl:variable name="a">
   <a>
     <b name="a">aa</b>
     <b name="b">ab</b>
   </a>
 </xsl:variable>


 <xsl:variable name="b">
   <a>
     <b name="a">ba</b>
     <b name="b">bb</b>
   </a>
 </xsl:variable>




 <xsl:key name="k" match="b" use="@name"/>


 <xsl:template match="/">
   <xsl:variable name="gc" select="."/>
    GLOBAL CONTEXT: <xsl:copy-of select="key('k','a')"/>
    LOCAL CONTEXT:
     <xsl:for-each select="exsl:node-set($a)">
       <xsl:copy-of select="key('k','a')"/>
     </xsl:for-each>
    RETURN TO GLOBAL CONTEXT:
     <xsl:for-each select="exsl:node-set($a)">
       <xsl:for-each select="$gc">
         <xsl:copy-of select="key('k','a')"/>
       </xsl:for-each>
     </xsl:for-each>
 </xsl:template>
 </xml-to-text>
</process-xsl>

actually returns

GLOBAL CONTEXT:
LOCAL CONTEXT:<b name="a">aa</b><b name="a">ba</b>
RETURN TO GLOBAL CONTEXT:

Grouping – The Muenchian Method

Suppose you have the following xml:
<locationlist>
 <location>
   <state>
      Pennsylvania
   </state>
   <city>
      Pittsburgh
   </city>
 </location>
 <location>
   <state>
      Pennsylvania
   </state>
   <city>
      Philadelphia
   </city>
 </location>
 <location>
   <state>
      Ohio
   </state>
   <city>
      Cleveland
   </city>
 </location>
</locationlist>

To list each city grouped by state, use the following xsl:

<xsl:key name="location-key" match="location" use="state" />


<xsl:template match="locationlist">
 <xsl:for-each select="location[count(. | key('location-key', state)[1]) = 1]">
   <xsl:sort select="state" />
   <xsl:value-of select="state" />:
   <ul>
   <xsl:for-each select="key('location-key', state)">
     <xsl:sort select="city" />
     <li><xsl:value-of select="city" /></li>
   </xsl:for-each>
   </ul>
 </xsl:for-each>
</xsl:template>
  • The xsl:key declaration creates a key called location-key that indexes each location by the value of its state child.
  • The outer for-each loop selects only the location nodes that come first in their key group: count(. | key('location-key', state)[1]) is 1 only when the current node and the first location with the same state are the same node. In other words, it loops once for each state.
  • The inner for-each loop visits every location returned by key('location-key', state), i.e. every location with the current state value. In other words, it loops once for each city in the current state.
The above XSL will produce something like this:
Ohio:
<ul>
<li>Cleveland</li>
</ul>
Pennsylvania:
<ul>
<li>Philadelphia</li>
<li>Pittsburgh</li>
</ul>

The method described above is especially good for grouping large amounts of data: key() is an indexed lookup, so it is typically much faster than manually scanning through every node and comparing it with its siblings.
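A common equivalent way to write the “first in its group” test compares generate-id() values instead of using count() on a union. This sketch also keys on normalize-space(state), an addition of mine, so the padding in the sample XML doesn’t leak into the output or split groups whose text is indented differently.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" indent="yes"/>

  <!-- Key on the trimmed state name -->
  <xsl:key name="location-key" match="location" use="normalize-space(state)"/>

  <xsl:template match="locationlist">
    <!-- A location starts a group when it is the first node key() returns for its state -->
    <xsl:for-each select="location[generate-id() =
        generate-id(key('location-key', normalize-space(state))[1])]">
      <xsl:sort select="normalize-space(state)"/>
      <xsl:value-of select="normalize-space(state)"/>
      <xsl:text>:</xsl:text>
      <ul>
        <!-- Every location that shares the current state -->
        <xsl:for-each select="key('location-key', normalize-space(state))">
          <xsl:sort select="normalize-space(city)"/>
          <li><xsl:value-of select="normalize-space(city)"/></li>
        </xsl:for-each>
      </ul>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```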

One place that I found a particularly crazy use for the Muenchian Method was in a converter. I wanted to output the sum of numeric values in the 2nd column of an html table for groups of rows. A group of rows in the table was defined by all rows having the same values in their 6th, 3rd and 1st columns. In that case, I wanted to make a key for each row in the table using a value of the concatenation of the 6th, 3rd, and 1st columns in the row.
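<xsl:key name="termkey" match="TR" use="concat(TD[6], '|', TD[3], '|', TD[1])" />

The '|' separator keeps neighboring column values from running together; without it, cells "ab" + "c" and "a" + "bc" would produce the same key. Pick a character that never appears in the cells.

Next, I used a for-each loop to get one row from each group.

<xsl:for-each select="TR[count(. | key('termkey', concat(TD[6], '|', TD[3], '|', TD[1]))[1]) = 1]">

Inside that loop, I found the sum of all the numeric values in column 2 of each row in the current group.

<xsl:value-of select="sum(key('termkey', concat(TD[6], '|', TD[3], '|', TD[1]))/TD[2])" />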
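Putting the three pieces together, a standalone sketch might look like this. It assumes the rows are upper-case TR elements with TD cells, as in the snippets above; the groups, group, key and rows output names are made up for illustration. Header rows or any non-numeric cell in column 2 would make sum() return NaN.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Group rows by columns 6, 3 and 1, joined with a separator -->
  <xsl:key name="termkey" match="TR" use="concat(TD[6], '|', TD[3], '|', TD[1])"/>

  <xsl:template match="/">
    <groups>
      <!-- One representative row per group -->
      <xsl:for-each select="//TR[count(. | key('termkey', concat(TD[6], '|', TD[3], '|', TD[1]))[1]) = 1]">
        <!-- All rows that share this row's key -->
        <xsl:variable name="group" select="key('termkey', concat(TD[6], '|', TD[3], '|', TD[1]))"/>
        <group key="{concat(TD[6], '|', TD[3], '|', TD[1])}" rows="{count($group)}">
          <!-- Total of column 2 across the group -->
          <xsl:value-of select="sum($group/TD[2])"/>
        </group>
      </xsl:for-each>
    </groups>
  </xsl:template>
</xsl:stylesheet>
```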

If your head isn’t hurting yet, a more detailed explanation of the Muenchian Method is available here


How to Enqueue Javascript Links

This is a question that comes up a lot from customers: how do I enqueue links on web pages that use Javascript?
It’s actually a pretty simple process. A converter does not run the page’s Javascript the way a browser does, so the converter has to pull the URL out of the href string itself. Below is a sample XHTML file with a few Javascript links in it that we’d like to convert and enqueue.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
 <head>
   <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
   <title>Javascript Link Extractor Test Page</title>
   <script type="text/javascript">
      function aLink(url) {
        document.location.replace('http://www.vivisimo.com' + url);
      }
   </script>
 </head>
 <body>
   <h1>Header1</h1>
   <p>This is a <strong><em>paragraph</em></strong>.</p>
   <p>Some links to <a href="javascript:aLink('/html/careers');">careers</a> and <a href="javascript:aLink('/html/support');">support</a></p>
 </body>
</html>

Below is the stylesheet we would use in the converter to extract the links for this HTML file.

<xsl:variable name="apos">'</xsl:variable>
<xsl:template match="/">
 <!-- Apply the aLink-extractor template for all links that contain 'javascript:aLink' in their @href. -->
 <xsl:apply-templates select="//a[@href[contains(., 'javascript:aLink')]]" mode="aLink-extractor" />
</xsl:template>


<xsl:template match="a" mode="aLink-extractor">
 <!-- Rebuild the complete URL: the prefix the Javascript function prepends, plus the path in the link. -->
 <xsl:variable name="url">
   <xsl:text>http://www.vivisimo.com</xsl:text>
   <!-- Grab everything between the two apostrophes ('), this is our URL.  -->
   <xsl:value-of select="substring-before(substring-after(@href, $apos), $apos)" />
 </xsl:variable>
 <xsl:value-of select="viv:crawl-enqueue-url($url)" />
</xsl:template>

This is a simple example, but it can be used as a good starting point for customers.
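Because viv:crawl-enqueue-url() presumably only does something useful inside a crawl, it can help to check the extraction first with plain XSLT. This sketch swaps the enqueue call for printing each URL on its own line and assumes the http://www.vivisimo.com prefix from the sample page’s aLink() function. Run against the sample XHTML above, it should print http://www.vivisimo.com/html/careers and http://www.vivisimo.com/html/support.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:variable name="apos">'</xsl:variable>

  <xsl:template match="/">
    <xsl:apply-templates select="//a[contains(@href, 'javascript:aLink')]" mode="aLink-extractor"/>
  </xsl:template>

  <!-- Same extraction as the converter version, but print instead of enqueue -->
  <xsl:template match="a" mode="aLink-extractor">
    <xsl:value-of select="concat('http://www.vivisimo.com',
        substring-before(substring-after(@href, $apos), $apos))"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```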


Directory, basename, and extension extraction

Sometimes it’s quite useful to extract the directory paths, the basename, and the extension from a given path. These three templates will do that.
<!-- the templates -->
<xsl:template name="get-directory">
  <xsl:param name="uri"/>
  <xsl:if test="contains($uri, '/')">
    <xsl:value-of select="concat(substring-before($uri, '/'), '/')"/>
    <xsl:call-template name="get-directory">
      <xsl:with-param name="uri" select="substring-after($uri, '/')"/>
    </xsl:call-template>
  </xsl:if>
</xsl:template>

<xsl:template name="get-basename">
  <xsl:param name="uri"/>
  <xsl:choose>
    <xsl:when test="contains($uri, '/')">
      <xsl:call-template name="get-basename">
        <xsl:with-param name="uri" select="substring-after($uri, '/')"/>
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="$uri"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

<xsl:template name="get-ext">
  <xsl:param name="uri"/>
  <xsl:choose>
    <xsl:when test="contains($uri, '.')">
      <xsl:call-template name="get-ext">
        <xsl:with-param name="uri" select="substring-after($uri, '.')"/>
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="$uri"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

<!-- USAGE showing how to get the extension -->
<xsl:variable name="pathpath" select="viv:url-decompose($the-url)"/>
<xsl:variable name="basename">
  <xsl:call-template name="get-basename">
    <xsl:with-param name="uri" select="$pathpath/path"/>
  </xsl:call-template>
</xsl:variable>
<xsl:variable name="ext">
  <xsl:call-template name="get-ext">
    <xsl:with-param name="uri" select="$basename"/>
  </xsl:call-template>
</xsl:variable>
Observe that get-basename and get-ext are basically the same thing; the only difference is the separator. These two templates could be combined into a single after-last template that takes the string and the separator as params.
Note that when no extension is present, get-ext returns the whole basename (for example, README gives README), which is probably not what you want. A better recursion termination condition would fix that.
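Here is a sketch of that combined after-last template (the name is the one suggested above), with the extension lookup guarded so a name without a dot yields an empty extension. The sample path is made up. The separator must not be empty: contains() with an empty string is always true, so the recursion would never end. Run as is, it should print report.tar.gz / gz.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Returns the part of $string after the last $separator,
       or the whole string if $separator does not occur -->
  <xsl:template name="after-last">
    <xsl:param name="string"/>
    <xsl:param name="separator"/>
    <xsl:choose>
      <xsl:when test="contains($string, $separator)">
        <!-- Drop everything up to the first separator and look again -->
        <xsl:call-template name="after-last">
          <xsl:with-param name="string" select="substring-after($string, $separator)"/>
          <xsl:with-param name="separator" select="$separator"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$string"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <xsl:template match="/">
    <!-- Hypothetical sample path -->
    <xsl:variable name="path" select="'/docs/2015/report.tar.gz'"/>
    <xsl:variable name="basename">
      <xsl:call-template name="after-last">
        <xsl:with-param name="string" select="$path"/>
        <xsl:with-param name="separator" select="'/'"/>
      </xsl:call-template>
    </xsl:variable>
    <xsl:variable name="ext">
      <!-- Only look for an extension when the basename has a dot -->
      <xsl:if test="contains($basename, '.')">
        <xsl:call-template name="after-last">
          <xsl:with-param name="string" select="$basename"/>
          <xsl:with-param name="separator" select="'.'"/>
        </xsl:call-template>
      </xsl:if>
    </xsl:variable>
    <xsl:value-of select="concat($basename, ' / ', $ext)"/>
  </xsl:template>
</xsl:stylesheet>
```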

Empty content remover

Database seeds sometimes generate empty content elements, which is bad form in the finished product. This custom XSL template removes empty contents, i.e. content elements without any text child nodes.
<!-- Match the root, recur -->
<xsl:template match="/">
 <xsl:apply-templates select="." mode="copy" />
</xsl:template>

<!-- Specialty nodes go here -->
<xsl:template match="content" mode="copy">
 <xsl:choose>
   <xsl:when test="count(./text()) > 0">
     <xsl:copy>
       <xsl:apply-templates select="@*" mode="copy" />
       <xsl:apply-templates select="* | text() | comment()" mode="copy" />
     </xsl:copy>
   </xsl:when>
 </xsl:choose>
</xsl:template>
<!-- End specialty nodes -->

<xsl:template match="@* | text() | comment()" mode="copy">
 <xsl:copy />
</xsl:template>

<!-- Default action, keep recurring and copying -->
<xsl:template match="*" mode="copy">
 <xsl:copy>
   <xsl:apply-templates select="@*" mode="copy" />
   <xsl:apply-templates select="* | text() | comment()" mode="copy" />
 </xsl:copy>
</xsl:template>
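If you also want to drop contents that hold only whitespace, an alternative is a single empty template whose pattern does the test. In this sketch, normalize-space() of the whole element is empty when it has no text or only whitespace, so those content elements produce no output. Unlike the version above, it also counts text inside child elements, which is a judgment call for your data.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <xsl:apply-templates select="." mode="copy"/>
  </xsl:template>

  <!-- Drop content elements whose text is empty or whitespace-only -->
  <xsl:template match="content[not(normalize-space())]" mode="copy"/>

  <xsl:template match="@* | text() | comment()" mode="copy">
    <xsl:copy/>
  </xsl:template>

  <xsl:template match="*" mode="copy">
    <xsl:copy>
      <xsl:apply-templates select="@* | * | text() | comment()" mode="copy"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```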

Parse HTML in XML

Sometimes people put HTML into RSS feeds and the like, and it needs to be parsed. This is not a straightforward task, but this code should help you out. It is taken from a custom RSS feed parser Colin Dean wrote for NIH. See VO #1024 for a little discussion and a full source parser example.
You may be inclined to follow some online tutorials that parse in two passes: the first outputs the HTML with output escaping disabled, and the second actually performs the intended parsing. You might then think to use Velocity’s secondary parser in a source to handle this. You don’t need to! It can be done in one pass with some magical viv XSL extensions.
<xsl:template match="/">
 <vce>
   <xsl:apply-templates select="/rss/channel/item" />
 </vce>
</xsl:template>
<xsl:template match="*">
 <document url="{link}" key="{viv:url-key(link)}">
   <content name="title" type="html" action="cluster" weight="3.000000">
     <xsl:value-of select="title" />
   </content>


   <xsl:apply-templates select="description" />


   <content name="who-may-apply" action="none">
     <xsl:text>All Applicants</xsl:text>
   </content>
   <content name="source" action="none">
     <xsl:value-of select="'USAJOBS U.S. Citizens'" />
   </content>
   <content name="date" type="html" action="none" weight="0.000000">
     <xsl:value-of select="pubDate" />
   </content>
 </document>
</xsl:template>
<xsl:template match="*" mode="stn">
 <vce>
   <xsl:copy-of select="." />
 </vce>
</xsl:template>
<xsl:template match="description">


 <xsl:variable name="desc-string">
   <xsl:value-of select="concat('&lt;description>',text(),'&lt;/description>')" />
 </xsl:variable>


 <xsl:variable name="desc-node">
   <xsl:variable name="stn">
     <xsl:apply-templates select="viv:string-to-node($desc-string, true())"
       mode="stn"
      />
   </xsl:variable>
   <xsl:copy-of select="exsl:node-set($stn)" />
 </xsl:variable>


 <!-- debugging -->
 <!--
<content name="desc-text">
    <xsl:value-of select="text()" />
  </content>
<content name="desc-string">
    <xsl:value-of select="$desc-string" />
  </content>
  <content name="desc-node-count">
    <xsl:value-of select="count($desc-node)" />
  </content>


  <content name="desc-node-string">
    <xsl:value-of select="viv:node-to-string($desc-node, true())" />
  </content>
-->


 <xsl:variable name="nobr">
   <xsl:apply-templates select="exsl:node-set($desc-node)//description"
     mode="br-remover"
    />
 </xsl:variable>
 <xsl:for-each select="str:tokenize($nobr, '&#9;')">
   <xsl:variable name="toks" select="str:tokenize(., '¤')" />
   <xsl:variable name="name">
     <xsl:call-template name="clean-content-name">
       <xsl:with-param name="text" select="$toks[position()=1]" />
     </xsl:call-template>
   </xsl:variable>
   <content name="{$name}" position="{position()}">
     <xsl:value-of select="$toks[position()=2]" />
   </content>
 </xsl:for-each>
</xsl:template>


<xsl:template match="description" mode="br-remover">
 <xsl:copy>
   <xsl:apply-templates select="font|text()" mode="br-remover" />
 </xsl:copy>
</xsl:template>


<xsl:template match="font" mode="br-remover">
 <xsl:value-of select="concat(., '¤')" />
</xsl:template>


<xsl:template match="text()" mode="br-remover">
 <xsl:if test="string-length(normalize-space(.)) > 0">
   <xsl:value-of select="concat(., '&#9;')" />
 </xsl:if>
</xsl:template>


<xsl:template name="clean-content-name" xmlns:str="http://exslt.org/strings">
 <xsl:param name="text" />
 <xsl:variable name="removed-colon" select="str:replace($text, ':','')" />
 <xsl:variable name="removed-left-paren"
   select="str:replace($removed-colon, '(', '')"
  />
 <xsl:variable name="removed-right-paren"
   select="str:replace($removed-left-paren, ')', '')"
  />
 <xsl:variable name="normalized-space"
   select="normalize-space($removed-right-paren)"
  />
 <xsl:variable name="converted-space-to-dash"
   select="str:replace($normalized-space, ' ', '-')"
  />
 <xsl:value-of select="viv:str-to-lower($converted-space-to-dash)" />
</xsl:template>

Uglify Text for Content Name Attribute

This template first normalizes (trims) whitespace, converts spaces to dashes, lowercases the result, and strips every character that isn’t alphanumeric or a dash.
<xsl:template name="format-content-name">
 <xsl:param name="unformatted-name" select="'unspecified-content'" />
 <xsl:value-of
   select="viv:replace(viv:str-to-lower(str:replace(normalize-space($unformatted-name),' ','-')),'[^a-z0-9\-]','','gi')" />
</xsl:template>
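If viv:replace() isn’t available, for example when testing with a stock XSLT processor, a similar result can be had with translate(). In this sketch the lowercasing only covers ASCII letters, and the double translate() trick deletes every character that isn’t in the allowed list. With the sample input it should print who-may-apply-us.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:variable name="upper" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
  <xsl:variable name="lower" select="'abcdefghijklmnopqrstuvwxyz'"/>
  <xsl:variable name="allowed" select="concat($lower, '0123456789-')"/>

  <xsl:template name="format-content-name">
    <xsl:param name="unformatted-name" select="'unspecified-content'"/>
    <!-- Trim and collapse whitespace, then turn the remaining spaces into dashes -->
    <xsl:variable name="dashed" select="translate(normalize-space($unformatted-name), ' ', '-')"/>
    <!-- ASCII-only lowercasing -->
    <xsl:variable name="lowered" select="translate($dashed, $upper, $lower)"/>
    <!-- Inner translate() leaves only disallowed characters; outer one deletes them -->
    <xsl:value-of select="translate($lowered, translate($lowered, $allowed, ''), '')"/>
  </xsl:template>

  <xsl:template match="/">
    <xsl:call-template name="format-content-name">
      <xsl:with-param name="unformatted-name" select="'  Who May Apply (U.S.):  '"/>
    </xsl:call-template>
  </xsl:template>
</xsl:stylesheet>
```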

Published by

John Ward

I've been working in the tech space since about 2004. I've spent time working with Artificial Intelligence, Machine Learning, Natural Language Processing, and Advertising technology.