Using the Chico Application to Test XSLT
<vce> <process-xsl><![CDATA[ <xsl:template match="/"> <xsl:variable name="params"> <param name="jean" value="bar" /> </xsl:variable> <foo2> [<xsl:value-of select="exsl:node-set($params)//param[@name = 'jean']/@value" />] </foo2> </xsl:template> ]]></process-xsl> </vce>
Match within strings
<xsl:if test="contains($text, ' ')"/>
For loop with XSL
<xsl:for-each select="str:tokenize(str:padding(5000), '')"> <xsl:value-of select="position()" /> </xsl:for-each>
If you intend to use this “for-loop” in a converter and you are taking data from a web page, you must do two things to make it work:
- Save the entire page in an XSL variable outside the loop. Once inside the loop, the context node is the current token, so you lose access to the web page.
- If you need to iterate in your XPaths, save position() in a variable. If you call it directly in your XPaths, they will not work correctly.
<xsl:variable name="page" select="."/> <xsl:for-each select="str:tokenize(str:padding(5000), '')"> <xsl:variable name="i" select="position()" /> <content name="something"> <xsl:value-of select="$page//p[@class='data-1'][$i]"/> </content> </xsl:for-each>
Get a random number within a specific range
<xsl:variable name="num" select="count($nodes)" /> <xsl:variable name="rand" select="floor(math:random() * $num) + 1" />
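As a minimal sketch of how the random index above might be used, the stylesheet below picks one node at random from a node-set. The `item` element name and the input shape are assumptions made for illustration only; `math:random()` is the EXSLT function already used above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:math="http://exslt.org/math"
    extension-element-prefixes="math">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- Assumption: the input looks like <list><item>..</item>..</list> -->
    <xsl:variable name="nodes" select="/*/item"/>
    <xsl:variable name="num" select="count($nodes)"/>
    <!-- Guard against an empty node-set -->
    <xsl:if test="$num &gt; 0">
      <!-- math:random() is in [0, 1), so $rand falls in 1..$num -->
      <xsl:variable name="rand" select="floor(math:random() * $num) + 1"/>
      <!-- $rand is a number, so the predicate selects by position -->
      <xsl:value-of select="$nodes[$rand]"/>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

Because `$rand` holds a number, `$nodes[$rand]` is a positional predicate and returns a single node.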
- Use the following XSL to copy a nodeset verbatim. Enter any special processing templates between the comments as indicated.
<!-- Match the root, recur --> <xsl:template match="/"> <xsl:apply-templates select="." mode="copy" /> </xsl:template> <!-- Specialty nodes go here --> <!-- End specialty nodes --> <xsl:template match="@* | text() | comment()" mode="copy"> <xsl:copy /> </xsl:template> <!-- Default action, keep recurring and copying --> <xsl:template match="*" mode="copy"> <xsl:copy> <xsl:apply-templates select="@*" mode="copy" /> <xsl:apply-templates mode="copy" /> </xsl:copy> </xsl:template>
Boost parser
- Based on the above, this is a generic recipe for a parser you can add to a source whose results you want to boost. You’ll probably want to edit the values in uppercase. Note that we’re throwing away binning information and marking the results as boost-only.
<xsl:template match="scope"> <xsl:copy> <xsl:apply-templates select="@*" mode="copy" /> <boost name="BOOSTNAME" display-name="DISPLAYNAME" /> <xsl:apply-templates select="* | text() | comment()" mode="copy" /> </xsl:copy> </xsl:template> <xsl:template match="document" mode="copy"> <xsl:copy> <xsl:apply-templates select="@*" mode="copy" /> <xsl:attribute name="boost-name">BOOSTNAME</xsl:attribute> <xsl:attribute name="boost-display">boost-only</xsl:attribute> <xsl:apply-templates select="* | text() | comment()" mode="copy" /> </xsl:copy> </xsl:template> <xsl:template match="binning-set" mode="copy" /> <xsl:template match="@* | text() | comment()" mode="copy"> <xsl:copy /> </xsl:template> <xsl:template match="*" mode="copy"> <xsl:copy> <xsl:apply-templates select="@* | * | text() | comment()" mode="copy" /> </xsl:copy> </xsl:template>
Parse XML that uses an XML namespace
<xsl:template match="/" xmlns:a="urn:yahoo:srch"> <scope> <xsl:variable name="total" select="a:ResultSet/@totalResultsAvailable" /> <attribute name="total-results" value="{viv:if-else($total, $total, 0)}" /> <xsl:apply-templates select="a:ResultSet/a:Result" /> </scope> </xsl:template> <xsl:template match="a:Result" xmlns:a="urn:yahoo:srch"> <document url="{a:ClickUrl}" key="{viv:url-key(a:Url)}" display-url="{a:Url}"> <xsl:variable name="cache" select="a:Cache/a:Url" /> <xsl:if test="$cache"> <xsl:attribute name="cache"> <xsl:value-of select="$cache" /> </xsl:attribute> </xsl:if> <content name="title" weight="3" output-action="bold"> <xsl:value-of select="a:Title" /> </content> <content name="snippet" output-action="summarize"> <xsl:value-of select="a:Summary" /> </content> </document> </xsl:template>
- Use transform (from your checkout/vivisimo/util directory):
$ [checkout]/vivisimo/util/transform -xsl get-sources.xsl repository.xml
On apps1:
$ /usr/local/vivisimo-[yourname]/bin/converters/transform -xsl get-sources.xsl repository.xml
You can set up an alias in your .bash_profile if you don’t want to type the whole path every time.
- Stub XSL for a starting point:
<?xml version="1.0" encoding="UTF-8"?> <xsl:stylesheet version="1.0" xmlns:xsl='http://www.w3.org/1999/XSL/Transform' xmlns:math='http://exslt.org/math' xmlns:str='http://exslt.org/strings' xmlns:dyn='http://exslt.org/dynamic' xmlns:exsl='http://exslt.org/common' xmlns:set='http://exslt.org/sets' xmlns:date='http://exslt.org/dates-and-times' xmlns:func='http://exslt.org/functions' xmlns:lib='http://xmlsoft.org/XSLT/namespace' xmlns:viv='http://vivisimo.com/exslt' xmlns:disp='http://vivisimo.com/disp' extension-element-prefixes='math str dyn exsl set date func lib viv disp' > <xsl:output method="xml" indent="yes"/> <xsl:template match="/"> </xsl:template> </xsl:stylesheet>
Using xsl:key
<vce> <settings> <setting name="application-name-text" section="Simple" tab="Theme" type="text">Vivísimo Velocity</setting> <setting name="front-logo-image" section="Simple" tab="Theme" type="image">viv-logo-velocity.gif</setting> </settings> </vce>
You can create a key that compares against the @name attribute of each setting node:
<xsl:key name="settings" match="/*/settings/setting" use="@name" />
Then you can access the value of that setting node with a call to key(). The first parameter is the name of the key and the second parameter is the value to compare against the @use XPath.
<xsl:value-of select="key('settings', 'application-name-text')" />
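For reference, the key definition and lookup above could be combined into one stylesheet along these lines. This is only a sketch run against the sample `<vce>` settings document shown earlier; the text output method and the second lookup are choices made for the illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <!-- xsl:key must be a top-level element, outside any template -->
  <xsl:key name="settings" match="/*/settings/setting" use="@name"/>
  <xsl:template match="/">
    <!-- Look up each setting by the value of its @name attribute -->
    <xsl:value-of select="key('settings', 'application-name-text')"/>
    <xsl:text>&#10;</xsl:text>
    <xsl:value-of select="key('settings', 'front-logo-image')"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Against the sample settings document this should print the application name on one line and the logo file name on the next.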
Using a nodeset/result-tree as the scope and context
<process-xsl> <xml-to-text> <xsl:variable name="a"> <a> <b name="a"/> <b name="b"/> </a> </xsl:variable> <xsl:key name="k" match="b" use="@name"/> <xsl:template match="/"> <xsl:variable name="gc" select="."/> GLOBAL CONTEXT: <xsl:copy-of select="key('k','a')"/> LOCAL CONTEXT: <xsl:for-each select="exsl:node-set($a)"> <xsl:copy-of select="key('k','a')"/> </xsl:for-each> RETURN TO GLOBAL CONTEXT: <xsl:for-each select="exsl:node-set($a)"> <xsl:for-each select="$gc"> <xsl:copy-of select="key('k','a')"/> </xsl:for-each> </xsl:for-each> </xsl:template> </xml-to-text> </process-xsl>
actually returns
GLOBAL CONTEXT: LOCAL CONTEXT: <b name="a"/> RETURN TO GLOBAL CONTEXT:
This can be used with document() and AXL variables (which don’t have any context and will generate an error if you use a key() in them).
Having more than one result tree in play (for example, a second variable or document()) can generate weird results:
<process-xsl> <xml-to-text> <xsl:variable name="a"> <a> <b name="a">aa</b> <b name="b">ab</b> </a> </xsl:variable> <xsl:variable name="b"> <a> <b name="a">ba</b> <b name="b">bb</b> </a> </xsl:variable> <xsl:key name="k" match="b" use="@name"/> <xsl:template match="/"> <xsl:variable name="gc" select="."/> GLOBAL CONTEXT: <xsl:copy-of select="key('k','a')"/> LOCAL CONTEXT: <xsl:for-each select="exsl:node-set($a)"> <xsl:copy-of select="key('k','a')"/> </xsl:for-each> RETURN TO GLOBAL CONTEXT: <xsl:for-each select="exsl:node-set($a)"> <xsl:for-each select="$gc"> <xsl:copy-of select="key('k','a')"/> </xsl:for-each> </xsl:for-each> </xsl:template> </xml-to-text> </process-xsl>
actually returns
GLOBAL CONTEXT: LOCAL CONTEXT: <b name="a">aa</b><b name="a">ba</b> RETURN TO GLOBAL CONTEXT:
Grouping – The Muenchian Method
<locationlist> <location> <state> Pennsylvania </state> <city> Pittsburgh </city> </location> <location> <state> Pennsylvania </state> <city> Philadelphia </city> </location> <location> <state> Ohio </state> <city> Cleveland </city> </location> </locationlist>
To list each city grouped by state, use the following XSL:
<xsl:key name="location-key" match="location" use="state" /> <xsl:template match="locationlist"> <xsl:for-each select="location[count(. | key('location-key', state)[1]) = 1]"> <xsl:sort select="state" /> <xsl:value-of select="state" />: <ul> <xsl:for-each select="key('location-key', state)"> <xsl:sort select="city" /> <li><xsl:value-of select="city" /></li> </xsl:for-each> </ul> </xsl:for-each> </xsl:template>
- The first line defines a key called location-key that indexes each location by the value of its state.
- The outer for-each loop scans through all location nodes whose state node is the first in the group of locations with that state value. The test count(. | key('location-key', state)[1]) = 1 is true only when the current location is itself the first node in its group, because the union of a node with itself contains one node. In other words, it loops one time for each state.
- The inner for-each loop scans through all location nodes whose state node has the value of the current state node. In other words, it loops once for each city in the current state.
Ohio: <ul> <li>Cleveland</li> </ul> Pennsylvania: <ul> <li>Philadelphia</li> <li>Pittsburgh</li> </ul>
The method described above is especially great for grouping large amounts of data as it is orders of magnitude faster than manually scanning through every node and comparing it with siblings.
<xsl:key name="termkey" match="TR" use="concat(TD[6],TD[3],TD[1])" />
Next, I used a for-each loop to get one row from each group.
<xsl:for-each select="TR[count(. | key('termkey', concat(TD[6],TD[3],TD[1]))[1]) = 1]">
Inside that loop, I found the sum of all the numeric values in column 2 of each row in the current group.
<xsl:value-of select="sum(key('termkey', concat(TD[6], TD[3], TD[1]))/TD[2])" />
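The per-group sum above can be sketched as a complete stylesheet. To keep it short, this version groups on the first column only rather than on three concatenated columns, and it assumes a hypothetical input of the form `<TABLE><TR><TD>a</TD><TD>1</TD></TR>...</TABLE>`; neither the element layout nor the simplified key comes from the original data.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <!-- Simplified key (assumption): group rows by column 1 only -->
  <xsl:key name="termkey" match="TR" use="TD[1]"/>
  <xsl:template match="/TABLE">
    <!-- Visit only the first row of each group -->
    <xsl:for-each select="TR[count(. | key('termkey', TD[1])[1]) = 1]">
      <xsl:value-of select="TD[1]"/>
      <xsl:text>: </xsl:text>
      <!-- Sum column 2 across every row in the current group -->
      <xsl:value-of select="sum(key('termkey', TD[1])/TD[2])"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

With rows (a, 1), (b, 2), (a, 3) as input, this should print `a: 4` and `b: 2` on separate lines.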
If your head isn’t hurting yet, a more detailed explanation of the Muenchian Method is available here
How to Enqueue Javascript Links
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> <html> <head> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> <title>Javascript Link Extractor Test Page</title> <script type="text/javascript"> function aLink(url) { document.location.replace('http://www.vivisimo.com' + url); } </script> </head> <body> <h1>Header1</h1> <p>This is a <strong><em>paragraph</em></strong>.</p> <p>Some links to <a href="javascript:aLink('/html/careers');">careers</a> and <a href="javascript:aLink('/html/support');">support</a></p> </body> </html>
Below is the stylesheet we would use in the converter to extract the links for this HTML file.
<xsl:variable name="apos">'</xsl:variable> <xsl:template match="/"> <!-- Apply the aLink-extractor template for all links that contain 'javascript:aLink' in their @href. --> <xsl:apply-templates select="//a[@href[contains(., 'javascript:aLink')]]" mode="aLink-extractor" /> </xsl:template> <xsl:template match="a" mode="aLink-extractor"> <!-- Build the complete URL from what is in the link and is prepended using Javascript. --> <xsl:variable name="url"> <xsl:text>http://www.vivisimo.com</xsl:text> <!-- Grab everything between the two apostrophes ('), this is our URL. --> <xsl:value-of select="substring-before(substring-after(@href, $apos), $apos)" /> </xsl:variable> <xsl:value-of select="viv:crawl-enqueue-url($url)" /> </xsl:template>
This is a simple example, but it can be used as a good starting point for customers.
Directory, basename, and extension extraction
<!-- the templates --> <xsl:template name="get-directory"> <xsl:param name="uri"/> <xsl:if test="contains($uri, '/')"> <xsl:value-of select="concat(substring-before($uri, '/'), '/')"/> <xsl:call-template name="get-directory"> <xsl:with-param name="uri" select="substring-after($uri, '/')"/> </xsl:call-template> </xsl:if> </xsl:template> <xsl:template name="get-basename"> <xsl:param name="uri"/> <xsl:choose> <xsl:when test="contains($uri, '/')"> <xsl:call-template name="get-basename"> <xsl:with-param name="uri" select="substring-after($uri, '/')"/> </xsl:call-template> </xsl:when> <xsl:otherwise> <xsl:value-of select="$uri"/> </xsl:otherwise> </xsl:choose> </xsl:template> <xsl:template name="get-ext"> <xsl:param name="uri"/> <xsl:choose> <xsl:when test="contains($uri, '.')"> <xsl:call-template name="get-ext"> <xsl:with-param name="uri" select="substring-after($uri, '.')"/> </xsl:call-template> </xsl:when> <xsl:otherwise> <xsl:value-of select="$uri"/> </xsl:otherwise> </xsl:choose> </xsl:template> <!-- USAGE showing how to get the extension --> <xsl:variable name="pathpath" select="viv:url-decompose($the-url)"/> <xsl:variable name="basename"> <xsl:call-template name="get-basename"> <xsl:with-param name="uri" select="$pathpath/path"/> </xsl:call-template> </xsl:variable> <xsl:variable name="ext"> <xsl:call-template name="get-ext"> <xsl:with-param name="uri" select="$basename"/> </xsl:call-template> </xsl:variable>
Empty content remover
<!-- Match the root, recur --> <xsl:template match="/"> <xsl:apply-templates select="." mode="copy"/> </xsl:template> <!-- Specialty nodes go here --> <xsl:template match="content" mode="copy"> <xsl:choose> <xsl:when test="count(./text()) > 0"> <xsl:copy> <xsl:apply-templates select="@*" mode="copy"/> <xsl:apply-templates select="* | text() | comment()" mode="copy"/> </xsl:copy> </xsl:when> </xsl:choose> </xsl:template> <!-- End specialty nodes --> <xsl:template match="@* | text() | comment()" mode="copy"> <xsl:copy /> </xsl:template> <!-- Default action, keep recurring and copying --> <xsl:template match="*" mode="copy"> <xsl:copy> <xsl:apply-templates select="@*" mode="copy"/> <xsl:apply-templates select="* | text() | comment()" mode="copy"/> </xsl:copy> </xsl:template>
Parse HTML in XML
<xsl:template match="/"> <vce> <xsl:apply-templates select="/rss/channel/item" /> </vce> </xsl:template> <xsl:template match="*"> <document url="{link}" key="{viv:url-key(link)}"> <content name="title" type="html" action="cluster" weight="3.000000"> <xsl:value-of select="title" /> </content> <xsl:apply-templates select="description" /> <content name="who-may-apply" action="none"> <xsl:text>All Applicants</xsl:text> </content> <content name="source" action="none"> <xsl:value-of select="'USAJOBS U.S. Citizens'" /> </content> <content name="date" type="html" action="none" weight="0.000000"> <xsl:value-of select="pubDate" /> </content> </document> </xsl:template> <xsl:template match="*" mode="stn"> <vce> <xsl:copy-of select="." /> </vce> </xsl:template> <xsl:template match="description"> <xsl:variable name="desc-string"> <xsl:value-of select="concat('&lt;description&gt;',text(),'&lt;/description&gt;')" /> </xsl:variable> <xsl:variable name="desc-node"> <xsl:variable name="stn"> <xsl:apply-templates select="viv:string-to-node($desc-string, true())" mode="stn" /> </xsl:variable> <xsl:copy-of select="exsl:node-set($stn)" /> </xsl:variable> <!-- debugging --> <!-- <content name="desc-text"> <xsl:value-of select="text()" /> </content> <content name="desc-string"> <xsl:value-of select="$desc-string" /> </content> <content name="desc-node-count"> <xsl:value-of select="count($desc-node)" /> </content> <content name="desc-node-string"> <xsl:value-of select="viv:node-to-string($desc-node, true())" /> </content> --> <xsl:variable name="nobr"> <xsl:apply-templates select="exsl:node-set($desc-node)//description" mode="br-remover" /> </xsl:variable> <xsl:for-each select="str:tokenize($nobr, '&#9;')"> <xsl:variable name="toks" select="str:tokenize(., '¤')" /> <xsl:variable name="name"> <xsl:call-template name="clean-content-name"> <xsl:with-param name="text" select="$toks[position()=1]" /> </xsl:call-template> </xsl:variable> <content name="{$name}" position="{position()}"> <xsl:value-of select="$toks[position()=2]" /> </content> </xsl:for-each> </xsl:template> <xsl:template match="description" mode="br-remover"> <xsl:copy> <xsl:apply-templates select="font|text()" mode="br-remover" /> </xsl:copy> </xsl:template> <xsl:template match="font" mode="br-remover"> <xsl:value-of select="concat(., '¤')" /> </xsl:template> <xsl:template match="text()" mode="br-remover"> <xsl:if test="string-length(normalize-space(.)) > 0"> <xsl:value-of select="concat(., '&#9;')" /> </xsl:if> </xsl:template> <xsl:template name="clean-content-name" xmlns:str="http://exslt.org/strings"> <xsl:param name="text" /> <xsl:variable name="removed-colon" select="str:replace($text, ':','')" /> <xsl:variable name="removed-left-paren" select="str:replace($removed-colon, '(', '')" /> <xsl:variable name="removed-right-paren" select="str:replace($removed-left-paren, ')', '')" /> <xsl:variable name="normalized-space" select="normalize-space($removed-right-paren)" /> <xsl:variable name="converted-space-to-dash" select="str:replace($normalized-space, ' ', '-')" /> <xsl:value-of select="viv:str-to-lower($converted-space-to-dash)" /> </xsl:template>
Uglify Text for Content Name Attribute
<xsl:template name="format-content-name"> <xsl:param name="unformatted-name" select="'unspecified-content'" /> <xsl:value-of select="viv:replace(viv:str-to-lower(str:replace(normalize-space($unformatted-name),' ','-')),'[^a-z0-9\-]','','gi')" /> </xsl:template>