
I need a transform that finds every index at which a text pattern occurs within a node's text. For instance, in the XML below, if the text pattern for the <txt> node is "ain", the answer would be 6, 15, 26, and 41 (1-based character positions).

<root>
  <info find="ain">
    <txt>The rain in Spain falls mainly in the plain.</txt>
  </info>
</root>

Transforms to...

<find>
  <txt>The rain in Spain falls mainly in the plain.</txt>
  <hit ndx="6"/>
  <hit ndx="15"/>
  <hit ndx="26"/>
  <hit ndx="41"/>
</find>

  • What do you want it to do with the indexes it finds? Commented Apr 25, 2013 at 19:25
  • @JLRishe I need each index to be an attribute for a separate element. Commented Apr 25, 2013 at 19:44
  • Could you post your desired output? Commented Apr 25, 2013 at 19:54
  • Added output per @JLRishe request. Commented Apr 25, 2013 at 20:22

3 Answers

EDIT: Here's an XSLT 2.0 solution:

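<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>
  <xsl:strip-space elements="*" />

  <xsl:template match="info[@find]">
    <find>
      <xsl:copy-of select="txt[1]" />
      <!-- Keep the literal search text: it is needed to re-join the parts below -->
      <xsl:variable name="find" select="string(@find)" />
      <!-- Escape regex metacharacters so that tokenize() matches @find literally -->
      <xsl:variable name="pattern" select="replace($find, '[-/\\^$*+?.()|\[\]{}]', '\\$0')" />
      <xsl:variable name="parts" select="tokenize(txt[1], $pattern)" />
      <xsl:for-each select="1 to count($parts) - 1">
        <!-- Text before the current match: the first N parts re-joined with the
             literal search text (not the escaped pattern, whose length may differ) -->
        <xsl:variable name="soFar" select="string-join($parts[position() &lt;= current()], $find)" />
        <hit ndx="{1 + string-length($soFar)}" />
      </xsl:for-each>
    </find>
  </xsl:template>
</xsl:stylesheet>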

And because I had already worked this up, here is an XSLT 1.0 approach.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>
  <xsl:strip-space elements="*" />

  <xsl:template match="info[@find]">
    <find>
      <xsl:copy-of select="txt[1]" />
      <xsl:call-template name="Matches">
        <xsl:with-param name="text" select="txt[1]" />
        <xsl:with-param name="pattern" select="@find" />
      </xsl:call-template>
    </find>
  </xsl:template>

  <!-- Emits one <hit> for the first occurrence of $pattern in $text,
       then recurses on the text that follows it -->
  <xsl:template name="Matches">
    <xsl:param name="text" />
    <xsl:param name="pattern" />
    <!-- 1-based position, in the original string, of the first character of $text -->
    <xsl:param name="offset" select="1" />
    <xsl:variable name="found" select="substring-before($text, $pattern)" />
    <!-- Test with contains(): substring-before() is also empty when the match
         is at the very start. The string($pattern) check stops an empty
         pattern from recursing forever. -->
    <xsl:if test="string($pattern) and contains($text, $pattern)">
      <hit ndx="{$offset + string-length($found)}" />
      <xsl:call-template name="Matches">
        <xsl:with-param name="text" select="substring-after($text, $pattern)" />
        <xsl:with-param name="pattern" select="$pattern" />
        <xsl:with-param name="offset"
                        select="$offset + string-length($found) + string-length($pattern)" />
      </xsl:call-template>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>

When either is run on your sample input, the result is:

<find>
  <txt>The rain in Spain falls mainly in the plain.</txt>
  <hit ndx="6" />
  <hit ndx="15" />
  <hit ndx="26" />
  <hit ndx="41" />
</find>

2 Comments

You probably should do <xsl:variable name="pattern" select="replace(@find, '[-/\\^$*+?.()|\[\]{}]', '\\$0')" />
@Tomalak Thanks, good point. @OP, if you want to treat @find as a regex, you can revert that variable to just <xsl:variable name="pattern" select="@find" />
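
If you do go the regex route, note that the string-join() bookkeeping in the 2.0 stylesheet above assumes every match has exactly the text of the pattern, which a real regex does not guarantee. A minimal sketch of one way around that, assuming an XSLT 2.0 processor: let xsl:analyze-string split the text and record the length of each piece in a temporary tree. The names `segments`, `m`, `n` and `len` are made up for this illustration, and this is only one possible approach, not a tested replacement for the answer's code.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="info[@find]">
    <find>
      <xsl:copy-of select="txt[1]"/>
      <!-- Hypothetical helper tree: one <m> per match and one <n> per gap,
           each recording the length of the text it stands for. -->
      <xsl:variable name="segments">
        <xsl:analyze-string select="txt[1]" regex="{@find}">
          <xsl:matching-substring>
            <m len="{string-length(.)}"/>
          </xsl:matching-substring>
          <xsl:non-matching-substring>
            <n len="{string-length(.)}"/>
          </xsl:non-matching-substring>
        </xsl:analyze-string>
      </xsl:variable>
      <!-- A match starts one past the total length of everything before it. -->
      <xsl:for-each select="$segments/m">
        <hit ndx="{1 + sum(preceding-sibling::*/@len)}"/>
      </xsl:for-each>
    </find>
  </xsl:template>
</xsl:stylesheet>
```

On the sample input with find="ain" this should give the same four hits. Here @find is used as a regex as-is, so a pattern that can match the empty string is an error in XSLT 2.0.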

This XSLT 2.0 transformation:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/*/info">
    <xsl:variable name="vSeq" select="string-to-codepoints(txt)"/>
    <xsl:variable name="vPatSeq" select="string-to-codepoints(@find)"/>

    <xsl:sequence select=
      "for $vPat in string(@find),
           $vPatLength in string-length(@find)
         return
           index-of($vSeq, $vPatSeq[1])
             [$vPat eq codepoints-to-string(subsequence($vSeq, ., $vPatLength))]
      "/>
  </xsl:template>
</xsl:stylesheet>

when applied on the provided XML document:

<root>
  <info find="ain">
    <txt>The rain in Spain falls mainly in the plain.</txt>
  </info>
</root>

produces the correct result:

 6 15 26 41 

Here is the equally short transformation that uses this to produce the wanted XML result:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="/*/info">
    <xsl:variable name="vSeq" select="string-to-codepoints(txt)"/>
    <xsl:variable name="vPatSeq" select="string-to-codepoints(@find)"/>

    <find>
      <xsl:copy-of select="txt"/>
      <xsl:for-each select=
        "for $vPat in string(@find),
             $vPatLength in string-length(@find)
           return
             index-of($vSeq, $vPatSeq[1])
               [$vPat eq codepoints-to-string(subsequence($vSeq, ., $vPatLength))]">
        <hit ndx="{.}"/>
      </xsl:for-each>
    </find>
  </xsl:template>
</xsl:stylesheet>

When this transformation is applied on the same provided XML document (above), the wanted result is produced:

<find>
  <txt>The rain in Spain falls mainly in the plain.</txt>
  <hit ndx="6"/>
  <hit ndx="15"/>
  <hit ndx="26"/>
  <hit ndx="41"/>
</find>

Alternatively, one can use:

<xsl:for-each select=
  "(1 to string-length(txt) - string-length($vPat) + 1)
     [starts-with(substring($vTxt, .), $vPat)]
  ">

And the complete transformation is:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="/*/info">
    <xsl:variable name="vTxt" select="txt"/>
    <xsl:variable name="vPat" select="string(@find)"/>

    <find>
      <xsl:copy-of select="txt"/>
      <xsl:for-each select=
        "(1 to string-length(txt) - string-length($vPat) + 1)
           [starts-with(substring($vTxt, .), $vPat)]
        ">
        <hit ndx="{.}"/>
      </xsl:for-each>
    </find>
  </xsl:template>
</xsl:stylesheet>

Do note the simplicity and directness of this solution:

  • No recursion.
  • No named templates.
  • No xsl:function.
  • No xsl:param.
  • No xsl:if.
  • No additional namespace declarations.
  • No substring-after().
  • No RegExes.
  • No replace().
  • No tokenize().
  • No regex-group().
  • No string-join().
  • No count().
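
One behavioural difference is worth keeping in mind: the starts-with() variant tests every start position, so it also reports occurrences that overlap, whereas the tokenize() and substring-after() approaches resume searching after the end of each match. A small sketch to show this, assuming an XSLT 2.0 processor; the strings 'banana' and 'ana' are made up for the illustration and the stylesheet ignores whatever input document it is given:

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Hard-coded sample values, chosen so that two occurrences overlap -->
    <xsl:variable name="vTxt" select="'banana'"/>
    <xsl:variable name="vPat" select="'ana'"/>
    <!-- Same filter as in the transformation above: keep every start
         position at which the remaining text begins with the pattern -->
    <xsl:value-of select=
      "(1 to string-length($vTxt) - string-length($vPat) + 1)
         [starts-with(substring($vTxt, .), $vPat)]"/>
  </xsl:template>
</xsl:stylesheet>
```

This outputs 2 4. An approach that skips past each match would report only 2 for the same strings, so pick whichever behaviour the use case needs.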


Use recursion. If you are using XSLT 2.0, it would be easiest to create a function:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:s="http://string-functions"
                exclude-result-prefixes="xs s"
                version="2.0">

  <!-- Two-argument entry point: wraps the hits in a <find> element -->
  <xsl:function name="s:indexes" as="element(find)">
    <xsl:param name="str1"/>
    <xsl:param name="str2"/>
    <find value="{$str2}">
      <txt><xsl:value-of select="$str1"/></txt>
      <xsl:sequence select="s:indexes($str1, $str2, 0)"/>
    </find>
  </xsl:function>

  <!-- Three-argument overload: $offset is the number of characters already consumed -->
  <xsl:function name="s:indexes" as="element(hit)*">
    <xsl:param name="str1"/>
    <xsl:param name="str2"/>
    <xsl:param name="offset"/>
    <!-- contains() rather than a non-empty substring-before(), so that a match
         at position 1 is not missed; the empty-pattern check prevents endless recursion -->
    <xsl:if test="$str2 ne '' and contains($str1, $str2)">
      <xsl:variable name="sub-before" select="substring-before($str1, $str2)"/>
      <xsl:variable name="position" select="$offset + string-length($sub-before) + 1"/>
      <!-- Continue searching after the whole match -->
      <xsl:variable name="rest" select="substring-after($str1, $str2)"/>
      <xsl:variable name="new-offset"
                    select="$offset + string-length($str1) - string-length($rest)"/>
      <hit test="{$position}"/>
      <xsl:sequence select="s:indexes($rest, $str2, $new-offset)"/>
    </xsl:if>
  </xsl:function>

  <xsl:template match="*">
    <xsl:sequence select="s:indexes('The rain in Spain falls mainly in the plain', 'ain')"/>
  </xsl:template>
</xsl:stylesheet>

=>

<find value="ain">
  <txt>The rain in Spain falls mainly in the plain</txt>
  <hit test="6"/>
  <hit test="15"/>
  <hit test="26"/>
  <hit test="41"/>
</find>

3 Comments

@dacracot I translated the answer to XSLT2. That should work.
@Tomalak I liked the question, but I don't like writing XSLT1 solutions because they tend to be really verbose.
Updated to match @dacracot's requested output (with a nice little overloaded method).
