Copy and Modify Documents with a Watson Explorer Converter

A common task when crawling and indexing in Watson Explorer Engine (WEX) is changing a document during the conversion process. Most often you need to copy everything in the application/vxml document while changing just one or a few of its contents. A recursive copy template handles this, and I’ll show you how to apply it.

First, I’m going to use the out-of-the-box “example-metadata” collection. Navigate to that collection and click the test-it button.

wex collection screenshot test it

After clicking test-it you will see a listing of documents. Click on the test-it button for the “blowout” record.

watson explorer test-it results

On the resulting page, scroll down and look at the conversion trace. There is a converter called “Create Metadata from Content”. This is the converter that ships with WEX to convert the HTML files into V:XML documents. Each of the links on the left side represents the input or output of that conversion step. Click on the output of this converter to see what the document looks like.

watson explorer conversion trace

You will see the output of your current V:XML document. Note that I have a Google Chrome plugin that is converting my XML output for display.

watson explorer converter output
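
Since the screenshot may not come through, here is a rough sketch of the shape of that output. It is a simplified assumption on my part: the real record carries more attributes and content nodes, and only the content names "title" and "author" and their values are taken from this walkthrough.

```xml
<!-- Hypothetical, trimmed-down V:XML for the "blowout" record.
     Attributes and any other content nodes are omitted. -->
<document>
  <content name="title">Blowout</content>
  <content name="author">Lucy Spring</content>
</document>
```

The point to notice is that each field is a content element identified by its name attribute, which is what the XSL later in this post matches on.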

For the sake of this exercise, let’s change the title field to contain both the actual title and the author, like this: Blowout - Lucy Spring. To do this, go back to the previous page and click “add new converter” further down the page.
watson explorer add converter

We want a custom converter.

watson explorer add custom converter

Now you will see the configuration screen for a custom converter.

Set both the type-in and type-out to application/vxml-unnormalized: the template takes application/vxml-unnormalized as input and produces application/vxml-unnormalized as output. I use “unnormalized” because I want the normal WEX normalization functions to still apply after this transformation. Also give your converter a name.

wex custom converter configuration

The next section is the conditional setting. This is where you define the matches that cause the converter to apply. In this case we want to match everything, so I just add a wildcard (*).

wex converter conditional settings

You can skip the advanced section and focus on the Action section. First, the action type needs to be set to XSL, since we’re applying an XSL template to an XML document.

watson explorer custom converter action

Now we’ll use a standard template that allows you to copy nodes with special processing.

<!-- Match the root, recur -->
<xsl:template match="/">
  <xsl:apply-templates select="." mode="copy" />
</xsl:template>

<!-- Specialty nodes go here -->

<!-- End specialty nodes -->

<xsl:template match="@* | text() | comment()" mode="copy">
  <xsl:copy />
</xsl:template>

<!-- Default action, keep recurring and copying -->
<xsl:template match="*" mode="copy">
  <xsl:copy>
    <xsl:apply-templates select="@*" mode="copy" />
    <xsl:apply-templates mode="copy" />
  </xsl:copy>
</xsl:template>
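
If you want to try the copy template outside of WEX first, one option is to wrap it in a stylesheet element and run it with any XSLT 1.0 processor. The sketch below is only that: the wrapper is my addition for standalone testing, and I'm not assuming the converter's XSL field needs it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Standalone wrapper for local testing only (an assumption, not a WEX requirement) -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Match the root and hand off to the "copy" mode -->
  <xsl:template match="/">
    <xsl:apply-templates select="." mode="copy" />
  </xsl:template>

  <!-- Specialty nodes would go here -->

  <!-- Attributes, text, and comments are copied as-is -->
  <xsl:template match="@* | text() | comment()" mode="copy">
    <xsl:copy />
  </xsl:template>

  <!-- Every other element: copy it, then recurse into attributes and children -->
  <xsl:template match="*" mode="copy">
    <xsl:copy>
      <xsl:apply-templates select="@*" mode="copy" />
      <xsl:apply-templates mode="copy" />
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

With a command-line processor such as xsltproc, running this against a saved copy of the converter input should give back the same elements, attributes, text, and comments. There is no rule for the root node in copy mode, so XSLT's built-in rule takes over and passes its children along in the same mode, which is why the recursion starts correctly.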

Run as-is, the template above simply copies the document unchanged. We want to modify it to merge our title and author, by adding a template that matches the title content and builds the new value.

<!-- Match the root, recur -->
<xsl:template match="/">
  <xsl:apply-templates select="." mode="copy" />
</xsl:template>

<!-- Specialty nodes go here -->

<!-- match the title content -->
<xsl:template match="content[@name='title']" mode="copy">

  <!-- create a new content node -->
  <content>
    <!-- copy all the attributes -->
    <xsl:copy-of select="@*" />

    <!-- copy the value of the current node (title) and add a dash and the author content value-->

    <xsl:value-of select="concat(.,' - ',//content[@name='author'])" />


  </content>

</xsl:template>

<!-- End specialty nodes -->

<xsl:template match="@* | text() | comment()" mode="copy">
  <xsl:copy />
</xsl:template>

<!-- Default action, keep recurring and copying -->
<xsl:template match="*" mode="copy">
  <xsl:copy>
    <xsl:apply-templates select="@*" mode="copy" />
    <xsl:apply-templates mode="copy" />
  </xsl:copy>
</xsl:template>

As you can see, I’ve added comments in the code above. The important thing to note is that to modify the title content I match it with its own template, and the mode is always copy because of the way this template works. Then I just copy the attributes and concat the two values I want.
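
One thing to be aware of is that //content[@name='author'] searches the whole input and concat uses the first match, and that a missing author would leave a dangling " - " on the title. Here is a hedged variant of the specialty node that looks only at sibling content nodes and leaves the title alone when there is no author. It assumes the title and author content nodes share the same parent element, which I have not verified for every collection.

```xml
<!-- Variant specialty node: scoped author lookup with a fallback.
     Assumes title and author are siblings under the same parent. -->
<xsl:template match="content[@name='title']" mode="copy">
  <!-- First author content node next to this title, if any -->
  <xsl:variable name="author" select="../content[@name='author'][1]" />

  <!-- Recreate the content element and keep its attributes -->
  <xsl:copy>
    <xsl:copy-of select="@*" />
    <xsl:choose>
      <!-- Non-empty author: append it after a dash -->
      <xsl:when test="normalize-space($author)">
        <xsl:value-of select="concat(., ' - ', $author)" />
      </xsl:when>
      <!-- No author: keep the original title text -->
      <xsl:otherwise>
        <xsl:value-of select="." />
      </xsl:otherwise>
    </xsl:choose>
  </xsl:copy>
</xsl:template>
```

This would drop in between the "Specialty nodes go here" and "End specialty nodes" comments in place of the title template. The rest of the copy template stays the same.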

Save this converter and click test-it again at the top of the Watson Explorer page. You will now see your new converter in the conversion trace.

wex custom converter conversion trace

Now if we check the input and output we’ll see the difference.

The before:

wex before converter

Now the title after:

wex converter after
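
In text form, the change to the title node should look roughly like the sketch below. Attributes other than name are left out, and the values assume the "blowout" record used in this walkthrough.

```xml
<!-- Before the custom converter (other attributes omitted) -->
<content name="title">Blowout</content>

<!-- After the custom converter -->
<content name="title">Blowout - Lucy Spring</content>
```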

Now if you crawl this collection your titles will include the author name in the search results.
watson explorer search results

Published by

John Ward

I've been working in the tech space since about 2004. I've spent time working with Artificial Intelligence, Machine Learning, Natural Language Processing, and Advertising technology.