I'm trying to use XPath to extract HTML5 microdata from a page. I'm essentially trying to say "find nested nodes with an itemprop=name attribute that are not nested inside another itemscope element (at any depth)". Given the following example, I'm trying to find the name of the product (shoes), but I don't want the brand name (Nike):
    <div itemscope itemtype="http://schema.org/Product">
      <div itemscope itemtype="http://schema.org/Brand">
        <div itemprop="name">Nike</div> <!-- don't want this -->
      </div>
      <div itemprop="name">shoes</div> <!-- do want this -->
    </div>

I can easily find the itemprop=name elements by using something like //*[@itemprop='name'], but this would also pull in the brand name. By the way, the elements shown in the example may be nested inside other tags, so I can't simply say "whose immediate parent does not have an itemscope attribute".

I believe there may be something relating to ancestors that I can use, but I don't know enough about XPath. Any ideas?
shoes is inside an itemscope, so to clarify, you want the names that have at most one itemscope ancestor, but not those that have more than one?

Or, given an itemscope element X, extract all the names that are inside X but not also inside any other itemscope?
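If the first reading in the comment above is the intended one (exactly one itemscope ancestor), one candidate XPath 1.0 expression would be //*[@itemprop='name'][count(ancestor::*[@itemscope]) = 1]. As a way to sanity-check that rule without an XPath engine, here is a minimal sketch using only Python's standard-library html.parser. The names `TopLevelNames`, `top_level_names` and `SAMPLE` are made up for illustration, and the sample markup is the example from the question with the closing quote on the first itemtype added. It tracks open elements on a stack and keeps a name only when exactly one open ancestor carries itemscope; it assumes reasonably well-nested markup.

```python
from html.parser import HTMLParser

# Void elements have no end tag and no text content, so they are skipped.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

SAMPLE = """
<div itemscope itemtype="http://schema.org/Product">
  <div itemscope itemtype="http://schema.org/Brand">
    <div itemprop="name">Nike</div> <!-- don't want this -->
  </div>
  <div itemprop="name">shoes</div> <!-- do want this -->
</div>
"""


class TopLevelNames(HTMLParser):
    """Collect text of itemprop=name elements with exactly one itemscope ancestor."""

    def __init__(self):
        super().__init__()
        self.stack = []            # one bool per open element: has itemscope?
        self.capture_depth = None  # stack depth at which the captured element opened
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attr = dict(attrs)
        # sum(self.stack) counts the open ancestors that carry itemscope.
        if (self.capture_depth is None
                and attr.get("itemprop") == "name"
                and sum(self.stack) == 1):
            self.capture_depth = len(self.stack)
            self.names.append("")
        self.stack.append("itemscope" in attr)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.stack:
            return
        self.stack.pop()
        if self.capture_depth == len(self.stack):
            self.capture_depth = None  # the captured element just closed

    def handle_data(self, data):
        if self.capture_depth is not None:
            self.names[-1] += data


def top_level_names(html):
    parser = TopLevelNames()
    parser.feed(html)
    parser.close()
    return [name.strip() for name in parser.names]


if __name__ == "__main__":
    print(top_level_names(SAMPLE))  # ['shoes']
```

Because the check counts ancestors at any depth rather than looking at the immediate parent, wrapping the name in extra non-itemscope tags does not change the result. For the second reading in the comment (names relative to a specific itemscope element X), the count would need to be compared against X's own itemscope depth instead of the constant 1.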