3

How can I generate the XPath expression that leads from a given root node down to a specified node in the XML structure?

I will receive an HTML fragment of a table at runtime. I have to find the desired node based on some criteria, then form an XPath string from the table root node to that node and return it.

The HTML table structure is not known beforehand. Is there any API in Java that returns the XPath string given the root node and the child node?

1
  • Good question, +1. See my answer for a single XPath 2.0 expression that produces the wanted XPath expression. :) Commented Jan 5, 2011 at 14:31

4 Answers

1

I would recommend doing this in Groovy, which provides GPath (essentially an XPath-like path expression language built into Groovy). The Groovy syntax is very succinct and powerful, as described in my blog, and it mixes seamlessly with the Java language (Groovy is compiled down to Java class files).

As for what you are trying to achieve: the following should traverse the entire HTML DOM structure and search for every "tag" element (e.g. div) with a specific id attribute (e.g. tag_name), passing each match to the closure for processing.

    HTML.body.'**'.findAll { it.name() == 'tag' && it["@id"] == 'tag_name' }.each {
        // "it" is each node returned by findAll
        if (it.td[0].text().toString().trim().contains('Hello')) {
            def x = it.td[0].text().toString().trim()
        }
    }

1

Below is one way (that I know) to achieve this

  1. Parse the XML into a DOM.
  2. Locate the target Node by evaluating a "//" XPath expression that matches it.
  3. Once you have the Node object from step 2, walk up the hierarchy with getParentNode(), building the XPath one step at a time.
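
The three steps above could be sketched in plain Java with the JDK's DOM and XPath APIs, roughly as follows. This is only an illustration, not a standard API: the class XPathBuilder and its methods pathFromRoot, step and pathTo are names made up for this example, and it assumes well-formed markup without namespaces. A positional predicate is added whenever an element has siblings with the same name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, named for this example only.
class XPathBuilder {

    /** Builds a path such as "table/tr[2]/td" from root to target; null if target is not inside root. */
    static String pathFromRoot(Element root, Element target) {
        StringBuilder path = new StringBuilder();
        Node current = target;
        // Step 3: climb with getParentNode(), prepending one location step per level.
        while (current != null && !current.isSameNode(root)) {
            if (current.getNodeType() != Node.ELEMENT_NODE) {
                return null; // reached the Document node without meeting root
            }
            path.insert(0, "/" + step((Element) current));
            current = current.getParentNode();
        }
        return current == null ? null : root.getNodeName() + path;
    }

    /** Element name, plus a 1-based positional predicate when same-named siblings exist. */
    static String step(Element e) {
        String name = e.getNodeName();
        int position = 1;
        boolean hasTwin = false;
        for (Node s = e.getPreviousSibling(); s != null; s = s.getPreviousSibling()) {
            if (s.getNodeType() == Node.ELEMENT_NODE && s.getNodeName().equals(name)) {
                position++;
                hasTwin = true;
            }
        }
        for (Node s = e.getNextSibling(); s != null && !hasTwin; s = s.getNextSibling()) {
            if (s.getNodeType() == Node.ELEMENT_NODE && s.getNodeName().equals(name)) {
                hasTwin = true;
            }
        }
        return hasTwin ? name + "[" + position + "]" : name;
    }

    /** Steps 1 and 2: parse the markup, locate the target with a "//" expression, then build the path. */
    static String pathTo(String xml, String targetXPath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element target = (Element) XPathFactory.newInstance().newXPath()
                    .evaluate(targetXPath, doc, XPathConstants.NODE);
            return target == null ? null : pathFromRoot(doc.getDocumentElement(), target);
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        String xml = "<table><tr><td><b>11</b></td></tr>"
                   + "<tr><td><p>21</p></td><td><p>221</p><p><i>222</i></p></td></tr></table>";
        // prints table/tr[2]/td[2]/p[2]/i
        System.out.println(pathTo(xml, "//i[. = '222']"));
    }
}
```

Real-world HTML is often not well-formed XML, so the fragment may need to be cleaned up by a lenient HTML parser before a DOM can be built from it.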


1

This cannot be done in pure XPath 1.0 alone.

XPath 2.0 solution:

    if (not($vStart intersect $vTarget/ancestor::*))
      then ()
      else
        for $vPath in
          string-join(
            (for $x in
               $vTarget/ancestor-or-self::*[. >> $vStart]
                 /concat(name(.),
                         for $n in name(.),
                             $cn in count(../*[name(.) eq $n])
                           return
                             if ($cn ge 2)
                               then concat('[',
                                           count(preceding-sibling::*[name() eq $n]) + 1,
                                           ']')
                               else (),
                         '/')
             return $x),
            '')
        return
          string-join((concat(name($vStart), '/'), $vPath), '')

When this XPath 2.0 expression is evaluated against the following XML document:

    <table>
      <tr>
        <td><b>11</b></td>
        <td><i>12</i></td>
      </tr>
      <tr>
        <td><p><b>21</b></p></td>
        <td><p><b>221</b></p><p><b><i>222</i></b></p></td>
      </tr>
      <tr>
        <td><b>31</b></td>
        <td><i>32</i></td>
      </tr>
    </table>

and if the two parameters are defined as:

    <xsl:variable name="vStart" select="/*"/>
    <xsl:variable name="vTarget" select="/*/tr[2]/td[2]/p[2]/b/i"/>

then the result of the evaluation of the XPath 2.0 expression above is:

table/tr[2]/td[2]/p[2]/b/i/ 

2 Comments

+1 Good answer. I wouldn't make the positional predicate optional: think of a target that has no preceding siblings of the same name but does have following ones.
@Alejandro: Thanks, I fixed the expression and the result is still simplified when this is the only child with that name.
0

If you know the names of the root element and the child element you are trying to select, and if there is only one child element with that name, you could simply use "/root//child". But maybe I misunderstood what you were trying to achieve. Could you give an example?

2 Comments

No, it is not the only child. It may be a child, a grandchild, or many more levels down the hierarchy. The search will be based on the node's content. Once the node is identified, I need to get the XPath expression to this node.
You could use something like "/root//*[contains(.,'test')]" to test the content, but if it returns more than one node it might be wrong to create an expression like "/root/a/b/c/child" with the first one, since "/root/d/e/child" might be a solution too. In this case the only correct XPath would be one using "//"...
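
A small Java sketch can show why a content test like the one above may select more than one node. The class ContentSearch, the method matchNames and the sample document are invented for this illustration. Because contains(., 'test') looks at an element's whole string value, including text inside its descendants, the parents of a matching element match as well.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, named for this example only.
class ContentSearch {

    /** Returns the element names of every node selected by the given XPath 1.0 expression. */
    static List<String> matchNames(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                names.add(hits.item(i).getNodeName());
            }
            return names;
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        // Made-up sample document with two candidate paths: /root/a/child and /root/d/child.
        String xml = "<root><a><child>test one</child></a><d><child>test two</child></d></root>";

        // Tests the whole string value, so a and d match along with both child elements.
        System.out.println(matchNames(xml, "/root//*[contains(., 'test')]"));

        // Tests only an element's own text nodes, so just the two child elements match.
        System.out.println(matchNames(xml, "/root//*[text()[contains(., 'test')]]"));
    }
}
```

With this sample the first expression selects four elements and the second selects two, so even the narrower test leaves two equally valid paths, and the caller still has to decide which match to turn into a path.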
