Regular Expression Converter for Watson Explorer Engine

Sometimes it’s useful to extract data from a Watson Explorer content node using regular expressions. In this post, I’ll show you how to extract data using a regular expression and create a new content node for that specific data.

To start, we will use the default example-metadata collection. To keep the regex simple, we will extract any three-digit number from the snippet content. Much more advanced regular expressions are possible if you need them.

First, go to the example-metadata collection and click “test-it”.

Then click on “Test-it” next to the first result:

Now scroll down and look at the output of the “Create Metadata from Content” converter:

In the output, you will see that the snippet content contains the number 500.

We will make a converter that extracts any three-digit number into a new content node. First, add a new converter:

Select the Regex entity extraction converter and click Add.

In the converter configuration, enter “my-regex-node” in the list of entities node names and set the target node to “snippet”. Then click OK.

Now, in the Watson Explorer (WEX) sidebar, click the + next to XML.

Enter the following names:

Now update the XML node to include your regular expression, as shown below. Note that my regex is “[0-9]{3}”, which matches three consecutive digits. Save the node.

<entities name="my-regex-node">
  <entity name="regex-rule" weight="-1">
    <regex>[0-9]{3}</regex>
    <replace>viv:str-to-mixed(viv:current-string())</replace>
  </entity>
</entities>
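
Before saving a pattern into the node, it can help to try it on some sample text outside of Watson Explorer. The sketch below does that with Python's standard re module, which is only a stand-in here: it is not the engine that the converter uses, so treat the results as a rough guide. The sample sentence and the function names are made up for illustration. It shows that “[0-9]{3}” also matches the first three digits of a longer number, and it shows one possible stricter pattern that uses lookarounds. Whether the Watson Explorer regex engine supports lookarounds is an assumption you would need to check.

```python
import re

# Same pattern as in the <regex> element above.
THREE_DIGITS = re.compile(r"[0-9]{3}")

# Stricter variant (an assumption, not from the original post): the three
# digits must not be preceded or followed by another digit.
EXACTLY_THREE_DIGITS = re.compile(r"(?<![0-9])[0-9]{3}(?![0-9])")


def extract_three_digit_runs(text):
    """Return every non-overlapping run of three digits found in text."""
    return THREE_DIGITS.findall(text)


def extract_standalone_three_digit_numbers(text):
    """Return only three-digit runs that are not part of a longer number."""
    return EXACTLY_THREE_DIGITS.findall(text)


# Hypothetical snippet text, not taken from the example-metadata collection.
sample = "Ranked in the top 500 since 1998, with 42 offices."

print(extract_three_digit_runs(sample))                # ['500', '199']
print(extract_standalone_three_digit_numbers(sample))  # ['500']
```

In the first result, “199” comes from the year 1998, which is probably not what you want in a real collection. If that matters for your data, tighten the pattern before relying on the new content node.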

Return to the collection and run test-it as we did above, down to that same first result. If you look at the converter trace, you will see that the regex converter is running.

Click on the 910 output to see your new content node:

Now you can use the new “regex-rule” content in your search application.

Published by

John Ward

I've been working in the tech space since about 2004. I've spent time working with Artificial Intelligence, Machine Learning, Natural Language Processing, and Advertising technology.