Get nodes that don't have specific ancestor xml xpath

Question

I've been struggling for a few days with a fairly complex XPath expression that I can't formulate. I have a syntax tree from a parser for a C++-like language, and I would like an XPath query that selects all names that are not part of a function name.

To be specific, I have an XML document like the one below.

(The whole XML document is at the end of the question. It is quite large, so here is a simple overview of its structure.) There are four node types:
a - this element contains one node
b - contains the type of the node (e.g. "CALL_EXPRESSION")
c - contains the actual text (e.g. "printf", variable names...)
d - contains the descendants of the current node (a elements)

CALL_EXPRESSION
    DOT_EXPRESSION
        NAME_EXPRESSION
            NAME
        NAME_EXPRESSION
            NAME
    PARAMS
        NAME_EXPRESSION
            NAME
CALL_EXPRESSION
    NAME_EXPRESSION
        NAME
    PARAMS
        NAME_EXPRESSION
            NAME
ASSIGNMENT_EXPRESSION
    NAME_EXPRESSION
        NAME
    NAME_EXPRESSION
        NAME

I would like to formulate an XPath query that selects all NAMEs that are not descendants of CALL_EXPRESSION/*[1]. (In other words, I would like to select all variables but not the function names.)

To select all the function names I can use an XPath expression like this:

//a[b="CALL_EXPRESSION"]/d/a[1]

No problem here. Now, to select all nodes that are not descendants of these nodes, I would use not(ancestor::X).

But here is the problem. If I formulate the XPath expression like this:

//*[b="NAME"][not(ancestor::a[b="CALL_EXPRESSION"]/d/a[1])]

it selects only nodes that have no ancestor a with a child b="CALL_EXPRESSION" at all. In our example, it selects only the NAME nodes from the ASSIGNMENT_EXPRESSION subtree.

I suspected that the problem is that ancestor:: takes only the first step (in our case a[b="CALL_EXPRESSION"]) and restricts according to its predicate, and that the further / steps are discarded. So I modified the XPath query like this:

//*[b="NAME"][not(ancestor::a[../../b="CALL_EXPRESSION" and position()=1])]

This seems to work only on the simpler CALL_EXPRESSION (the one without the DOT_EXPRESSION). I suspected that the path in [] might be relative only to the current node, not to the potential ancestors. But when I used the query

//*[b="NAME"][not(ancestor::a[b="CALL_EXPRESSION"])]

it worked as one would expect (all NAMEs that don't have a CALL_EXPRESSION ancestor were selected).

Is there any way to formulate the query I need? And why don't the queries work?

Thanks in advance :)

The XML

<a>
  <b>CALL_EXPRESSION</b>
  <c>object.method(a)</c>
  <d>
    <a>
      <b>DOT_EXPRESSION</b>
      <c>object.method</c>
      <d>
        <a>
          <b>NAME_EXPRESSION</b>
          <c>object</c>
          <d>
            <a>
              <b>NAME</b>
              <c>object</c>
              <d> </d>
            </a>
          </d>
        </a>
        <a>
          <b>NAME_EXPRESSION</b>
          <c>method</c>
          <d>
            <a>
              <b>NAME</b>
              <c>method</c>
              <d> </d>
            </a>
          </d>
        </a>
      </d>
    </a>
    <a>
      <b>PARAMS</b>
      <c>(a)</c>
      <d>
        <a>
          <b>NAME_EXPRESSION</b>
          <c>a</c>
          <d>
            <a>
              <b>NAME</b>
              <c>a</c>
              <d> </d>
            </a>
          </d>
        </a>
      </d>
    </a>
  </d>
</a>
<a>
  <b>CALL_EXPRESSION</b>
  <c>puts(b)</c>
  <d>
    <a>
      <b>NAME_EXPRESSION</b>
      <c>puts</c>
      <d>
        <a>
          <b>NAME</b>
          <c>puts</c>
          <d> </d>
        </a>
      </d>
    </a>
    <a>
      <b>PARAMS</b>
      <c>(b)</c>
      <d>
        <a>
          <b>NAME_EXPRESSION</b>
          <c>b</c>
          <d>
            <a>
              <b>NAME</b>
              <c>b</c>
              <d> </d>
            </a>
          </d>
        </a>
      </d>
    </a>
  </d>
</a>
<a>
  <b>ASSIGNMENT_EXPRESSION</b>
  <c>c=d;</c>
  <d>
    <a>
      <b>NAME_EXPRESSION</b>
      <c>c</c>
      <d>
        <a>
          <b>NAME</b>
          <c>c</c>
          <d> </d>
        </a>
      </d>
    </a>
    <a>
      <b>NAME_EXPRESSION</b>
      <c>d</c>
      <d>
        <a>
          <b>NAME</b>
          <c>d</c>
          <d> </d>
        </a>
      </d>
    </a>
  </d>
</a>

Oooh, I'm sorry, I didn't realize that the code would lose its indentation and XML tags. I've reposted all the code here. Here is the structure: pastebin.com/VbRBG5LA and here is the XML document: pastebin.com/ajPtqprf . If somebody could fix the question, I would be grateful.
– tach, Commented May 16, 2011 at 1:31
Sorry, but it isn't clear what exactly you want to select. Please provide the smallest possible XML document (it doesn't have to be of the same kind, because your question seems general enough) with just a few levels and nodes, and define exactly which nodes in it you want selected. Please edit your question, or ask a new question with such a simpler and more precise definition.
– Dimitre Novatchev, Commented May 16, 2011 at 1:54
Good question, +1. See my answer for two XPath expressions that show how to select nodes that are not descendants of a given element in an XML document.
– Dimitre Novatchev, Commented May 16, 2011 at 2:22
I restated the question with a simpler example and no description around it. Hope this helps: stackoverflow.com/q/6012713/754982
– tach, Commented May 16, 2011 at 2:36

Michael Kay · Accepted Answer · 2011-05-16 09:06:49Z

You didn't say whether this is XPath 1.0 or 2.0. In XPath 2.0 you can use the except operator: for example

//* except //x//*

to select all elements that don't have x as an ancestor.

The except operator can also be simulated in XPath 1.0 using the equivalence

E1 except E2 ==> E1[count(.|E2)!=count(E2)]

(but taking care over the context for evaluation of E2).
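To make the equivalence concrete, here is a minimal Python sketch. It uses the standard-library ElementTree, which has neither an except operator nor an ancestor axis, so the node sets are built by hand. The sample document and the variable names are made up for illustration; e1 stands for //* and e2 for //x//*.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, not taken from the question.
DOC = "<t><x><y><z>Text 1</z></y></x><w><z>Text 2</z></w></t>"
root = ET.fromstring(DOC)

# E1: every element in the document, in document order (like //*).
e1 = list(root.iter())

# E2: every element that has an x ancestor (like //x//*).
e2 = {d for x in root.iter("x") for d in x.iter() if d is not x}

# XPath 2.0 style: "E1 except E2" is a plain set difference.
by_except = [n for n in e1 if n not in e2]

# XPath 1.0 style: keep n when the union of {n} and E2 is larger than E2,
# which mirrors E1[count(.|E2) != count(E2)].
by_count = [n for n in e1 if len({n} | e2) != len(e2)]

tags_except = [n.tag for n in by_except]
tags_count = [n.tag for n in by_count]
print(tags_except)  # ['t', 'x', 'w', 'z']
print(tags_count)   # ['t', 'x', 'w', 'z']
```

Both lists contain the same nodes: adding a node to E2 changes the size of the set only when the node was not already a member, which is exactly what the count() comparison tests.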

Dimitre Novatchev · Accepted Answer · 2011-05-16 02:20:49Z

The question is not very clear and the XML provided isn't a well-formed XML document.

Anyway, here is my attempt to answer based on my understanding of this question text.

Let's have the following simple XML document:

<t>
    <x>
        <y>
            <z>Text 1</z>
        </y>
    </x>
    <x>
        <y>
            <z> Text 2</z>
        </y>
    </x>
</t>

We want to select all z elements that are not descendants of /t/x[1].

Use either this XPath expression:

/t/z | /t/x[position() > 1]//z

or this one:

//z[not(ancestor::x[count(ancestor::*) = 1 and not(preceding-sibling::x)])]

I'd certainly recommend the first XPath expression, as it is much simpler, shorter and easier to understand.

It means: Select all z children of the top element t of the XML document, and all z descendants of any x child of the top element t that is not the first such x child (whose position among all x children of t is not 1).

The second expression means: Select all z elements in the XML document that don't have as an ancestor an element x that has only one element ancestor (is a child of the top element) and has no preceding siblings named x (in other words, that is the first x child of its parent).

Finally, here is a quick verification of the correctness of the two XPath expressions:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/">
  <xsl:copy-of select=
   "//z[not(ancestor::x
              [count(ancestor::*) = 1
             and
               not(preceding-sibling::x)
              ]
           )
       ]
   "/>
-------------------
  <xsl:copy-of select="/t/z | /t/x[position() > 1]//z"/>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the simple XML document shown above, we see that both expressions select exactly the wanted z element. The result of the transformation is:

<z> Text 2</z>
-------------------
<z> Text 2</z>
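The same not(ancestor::...) pattern can be adapted to the a/b/c/d tree from the question. One possible XPath 1.0 form (an adaptation, not taken from either answer) is //a[b="NAME"][not(ancestor::a[not(preceding-sibling::a)][../../b="CALL_EXPRESSION"])]. It tests each ancestor a for being the first a among its siblings and for sitting under a CALL_EXPRESSION, rather than using position()=1, which on the ancestor axis counts from the nearest ancestor outward. As a cross-check that does not depend on an XPath engine, here is a minimal Python sketch of the same logic using the standard-library ElementTree. The root wrapper and the helper names is_callee and ancestors are assumptions made for illustration, and the sample is a trimmed copy of the question's document.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's tree, wrapped in a hypothetical <root>
# element so that it parses as a single well-formed document.
XML = """
<root>
  <a><b>CALL_EXPRESSION</b><d>
    <a><b>DOT_EXPRESSION</b><d>
      <a><b>NAME_EXPRESSION</b><d><a><b>NAME</b><c>object</c><d/></a></d></a>
      <a><b>NAME_EXPRESSION</b><d><a><b>NAME</b><c>method</c><d/></a></d></a>
    </d></a>
    <a><b>PARAMS</b><d>
      <a><b>NAME_EXPRESSION</b><d><a><b>NAME</b><c>a</c><d/></a></d></a>
    </d></a>
  </d></a>
  <a><b>CALL_EXPRESSION</b><d>
    <a><b>NAME_EXPRESSION</b><d><a><b>NAME</b><c>puts</c><d/></a></d></a>
    <a><b>PARAMS</b><d>
      <a><b>NAME_EXPRESSION</b><d><a><b>NAME</b><c>b</c><d/></a></d></a>
    </d></a>
  </d></a>
  <a><b>ASSIGNMENT_EXPRESSION</b><d>
    <a><b>NAME_EXPRESSION</b><d><a><b>NAME</b><c>c</c><d/></a></d></a>
    <a><b>NAME_EXPRESSION</b><d><a><b>NAME</b><c>d</c><d/></a></d></a>
  </d></a>
</root>
"""

root = ET.fromstring(XML)

# ElementTree has no parent pointers, so build a child -> parent map.
parent = {child: p for p in root.iter() for child in p}


def is_callee(node):
    """True if node is the first <a> inside the <d> of a CALL_EXPRESSION."""
    d = parent.get(node)
    if d is None or d.tag != "d":
        return False
    owner = parent.get(d)
    return (owner is not None
            and owner.findtext("b") == "CALL_EXPRESSION"
            and d.find("a") is node)


def ancestors(node):
    """Yield the ancestors of node, nearest first."""
    while node in parent:
        node = parent[node]
        yield node


# NAME nodes with no ancestor that is the callee part of a call.
variables = [
    a.findtext("c")
    for a in root.iter("a")
    if a.findtext("b") == "NAME"
    and not any(is_callee(x) for x in ancestors(a))
]
print(variables)  # ['a', 'b', 'c', 'd']
```

The names object, method and puts are dropped because each has an ancestor that is the first a under a CALL_EXPRESSION, while the arguments and the assignment operands are kept.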

I'm sorry I didn't express myself well. I need the node anywhere in the document. It doesn't have to be descendant of the <t> node like in this example. Actually I don't have information about the position of the <z> element, I only know that it must not be descendant of any node that matches //t/x[1]. I restated the question, I hope that I'm more clear there :). stackoverflow.com/q/6012713/754982
@tach: t/z was added for completeness. If you are sure that z can only occur under x, then you may omit the expression /t/z
