
I have an XML file with the following tree structure.

<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:media="http://search.yahoo.com/mrss/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">
  <channel>
    <title>Videos</title>
    <link>https://www.example.com/r/videos/</link>
    <description>A long description of the video.</description>
    <image>...</image>
    <atom:link rel="self" href="http://www.example.com/videos/.xml" type="application/rss+xml"/>
    <item>
      <title>The most used Jazz lick in history.</title>
      <link> http://www.example.com/ </link>
      <guid isPermaLink="true"> http://www.example.com/ </guid>
      <pubDate>Mon, 07 Sep 2015 14:43:34 +0000</pubDate>
      <description>
        <table>
          <tr>
            <td>
              <a href="http://www.example.com/">
                <img src="http://www.example.com/.jpg" alt="The most used Jazz lick in history." title="The most used Jazz lick in history." />
              </a>
            </td>
            <td>
              submitted by <a href="http://www.example.com/"> jcepiano </a> <br/>
              <a href="http://www.youtube.com/">[link]</a>
              <a href="http://www.example.com/"> [508 comments] </a>
            </td>
          </tr>
        </table>
      </description>
      <media:title>The most used Jazz lick in history.</media:title>
      <media:thumbnail url="http://example.jpg"/>
    </item>
  </channel>
</rss>

Here, the html table element is embedded inside XML and that's confusing me.

Now I want to pick the text node values for //channel/item/title and href value for //channel/item/description/table/tr/td[1]/a[1] (with a text node value = "[link]")

Above in 2nd case, I am looking for the value of 2nd a (with a text node value = "[link]"), inside 2nd td inside tr, table, description, item, channel.

I am using PHP DOMDocument();

I have been looking for a solution to this for two days now. Can you please tell me how to do it?

I also need to count the total number of items in the feed. Right now I am doing it like this:

...
$queryResult = $xpathvar->query('//item/title');
$total = 0;
foreach ($queryResult as $result) {
    $total++;
}
echo $total;

And I also need a reference link for XPath query selectors' rules.

Thanks in advance! :)

  • By using `` (backticks) around inline code elements like tagnames, it becomes a bit easier to read and dissect the code from the English text. I have updated your question. Commented Sep 7, 2015 at 21:04
  • Thanks for your help and information, I'll keep that in mind from now on. Commented Sep 7, 2015 at 21:07

2 Answers


You wrote that you wanted the length of the result set of the following query:

$queryResult = $xpathvar->query('//item/title'); 

I assume that $xpathvar here is of type DOMXPath. If so, its query() method returns a DOMNodeList, which has a length property. Instead of counting with foreach, simply use:

$length = $xpathvar->query('//item/title')->length; 
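As a minimal sketch of the counting approach above: the two-item feed below is a made-up stand-in for the real one, and the variable names are only illustrative. It also shows the XPath count() function through evaluate(), which is an alternative not mentioned above.

```php
<?php
// Hypothetical two-item feed, used only for illustration.
$xml = <<<'XML'
<rss version="2.0">
  <channel>
    <title>Videos</title>
    <item><title>First</title></item>
    <item><title>Second</title></item>
  </channel>
</rss>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// query() returns a DOMNodeList; its length is the number of matched nodes.
$total = $xpath->query('//item/title')->length;

// evaluate() can run count() inside XPath; the result comes back as a float.
$totalViaCount = (int) $xpath->evaluate('count(//item)');

echo $total, "\n";
echo $totalViaCount, "\n";
```

Both variables hold the number of items, so no loop is needed just to count.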

Now I want to pick the text node values for //channel/item/title

Which you can get with the expression //channel/item/title/text().
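A small sketch of reading those text nodes in PHP, assuming a simplified feed (the second item is invented for illustration). Each match of text() is a DOMText node, and nodeValue gives its string content.

```php
<?php
// Simplified, partly hypothetical feed for illustration.
$xml = <<<'XML'
<rss version="2.0">
  <channel>
    <title>Videos</title>
    <item><title>The most used Jazz lick in history.</title></item>
    <item><title>Another video</title></item>
  </channel>
</rss>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Collect the string content of every item title's text node.
$titles = [];
foreach ($xpath->query('//channel/item/title/text()') as $textNode) {
    $titles[] = $textNode->nodeValue;
}

print_r($titles);
```

The channel's own title is not included, because the path only matches title elements directly under item.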

and href value for //channel/item/description/table/tr/td[1]/a[1] (with a text node value = "[link]")

Your expression here selects any tr, the first td under that, then the first a. But the first a does not have a value of "[link]" in your source. If you want that, though, you can use:

//channel/item/description/table/tr/td[1]/a[1]/@href 

but it looks like you rather want:

//channel/item/description/table/tr/td/a[. = "[link]"][1]/@href 

which selects, within each td, the first a element whose string value is exactly "[link]", and returns its href attribute.
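A hedged sketch of running that predicate expression from PHP. It assumes the table is real XML markup inside description, as shown in the question; the trimmed-down item below is a stand-in for the real feed.

```php
<?php
// Trimmed-down stand-in for the feed in the question.
$xml = <<<'XML'
<rss version="2.0">
  <channel>
    <item>
      <title>The most used Jazz lick in history.</title>
      <description>
        <table><tr>
          <td><a href="http://www.example.com/">thumbnail</a></td>
          <td>
            <a href="http://www.example.com/">jcepiano</a>
            <a href="http://www.youtube.com/">[link]</a>
            <a href="http://www.example.com/">[508 comments]</a>
          </td>
        </tr></table>
      </description>
    </item>
  </channel>
</rss>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Select the href attribute of the <a> whose string value is exactly "[link]".
$query = '//channel/item/description/table/tr/td/a[. = "[link]"][1]/@href';

$hrefs = [];
foreach ($xpath->query($query) as $attr) {
    $hrefs[] = $attr->nodeValue; // value of the matched href attribute
}

print_r($hrefs);
```

If the feed stores the table as escaped text rather than as elements, this path matches nothing, because the description then has no element children.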

Above in 2nd case, I am looking for the value of 2nd a (with a text node value = "[link]"), inside 2nd td inside tr, table, description, item, channel.

Not sure if this was a separate question or meant to explain the previous one. Regardless, the answer is the same as for the previous one, unless you explicitly want to search for the 2nd a etc. (i.e., search by position), in which case you can use numeric predicates.
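To illustrate numeric predicates, here is a sketch under the same assumption that the table is real XML inside description. The href values are hypothetical, chosen so each link is distinguishable. Note that a[2] counts only a elements, so the br in between does not shift the position.

```php
<?php
// Stand-in feed; the href values are made up for illustration.
$xml = <<<'XML'
<rss version="2.0">
  <channel>
    <item>
      <description>
        <table><tr>
          <td><a href="http://www.example.com/thumb">thumbnail</a></td>
          <td>
            submitted by <a href="http://www.example.com/user">jcepiano</a> <br/>
            <a href="http://www.youtube.com/">[link]</a>
            <a href="http://www.example.com/comments">[508 comments]</a>
          </td>
        </tr></table>
      </description>
    </item>
  </channel>
</rss>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

$base = '/*/channel/item[1]/description/table/tr';

// First <a> in the first <td>: the thumbnail link.
$first = $xpath->evaluate("string($base/td[1]/a[1]/@href)");

// Second <a> in the second <td>: the "[link]" anchor.
$second = $xpath->evaluate("string($base/td[2]/a[2]/@href)");

echo $first, "\n", $second, "\n";
```

Selecting by position is shorter but breaks if the feed reorders its links; matching on the "[link]" text is more robust in that case.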


Note: you start most of your expressions with //expr, which essentially means: search the whole tree at any depth for the expression expr. This is potentially expensive. If you know where the node you need sits in the tree, it is better, and usually faster, to use a direct path. In your case, you can replace //channel with /*/channel (because channel is a direct child of the root element).
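A quick sketch showing that the direct path selects the same nodes as the descendant search. The small feed is hypothetical, and the example makes no timing claim; it only checks that the two expressions are interchangeable for this structure.

```php
<?php
// Hypothetical feed for illustration.
$xml = <<<'XML'
<rss version="2.0">
  <channel>
    <title>Videos</title>
    <item><title>First</title></item>
    <item><title>Second</title></item>
  </channel>
</rss>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Descendant search: scans the whole tree for channel elements.
$viaDescendant = $xpath->query('//channel/item/title');

// Direct path: root element, then its channel child, then item/title.
$viaDirect = $xpath->query('/*/channel/item/title');

// Compare the two result lists node by node.
$same = $viaDescendant->length === $viaDirect->length;
for ($i = 0; $same && $i < $viaDirect->length; $i++) {
    $same = $viaDescendant->item($i)->isSameNode($viaDirect->item($i));
}

var_dump($same);
```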


4 Comments

Thanks for the answer, I learnt a lot of things here. However, I am still struggling with few things. The items element repeats many times in the XML feed. And I need to echo each text node value based on above mentioned Xpath expressions. How can I do this?
@Yogie, sounds like you should use for-each in PHP (like you already showed), or you should use the concat XPath function. Typically, tasks like yours are done in XSLT, in which case it becomes simpler, esp. if you need to add HTML markup as well. If you can't figure out the PHP part, consider asking a new question about only that issue.
Thanks for taking time and effort to answer, sorry I am new to XML parsing, so please bear with me. Because I need to echo the node values for two elements (as mentioned in the question), rather than one, so I think it's better to use for loop instead of foreach, because foreach will echo only one node values, in our case, either title or href for "[link]".
@Yogie, you can access properties of the nodes that are in the node set, see the documentation for DOMNodeList, which also has a couple of examples. To get two elements as result from the XPath, either use nodename1 | nodename2, which selects both elements individually, or select the parent, so that you can access the children.
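The last suggestion (select the parent, then read its children) can be sketched as follows. This assumes the table is real XML inside description; the second item and the link values are invented for illustration. Passing the item as the context node to evaluate() makes the relative paths apply to that item only.

```php
<?php
// Hypothetical two-item feed; link values are made up for illustration.
$xml = <<<'XML'
<rss version="2.0">
  <channel>
    <item>
      <title>First video</title>
      <description><table><tr>
        <td><a href="http://www.example.com/1">thumb</a></td>
        <td><a href="http://www.youtube.com/a">[link]</a></td>
      </tr></table></description>
    </item>
    <item>
      <title>Second video</title>
      <description><table><tr>
        <td><a href="http://www.example.com/2">thumb</a></td>
        <td><a href="http://www.youtube.com/b">[link]</a></td>
      </tr></table></description>
    </item>
  </channel>
</rss>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

$rows = [];
foreach ($xpath->query('/*/channel/item') as $item) {
    // Relative paths are evaluated against the current item.
    $rows[] = [
        'title' => $xpath->evaluate('string(title)', $item),
        'link'  => $xpath->evaluate(
            'string(description/table/tr/td/a[. = "[link]"]/@href)',
            $item
        ),
    ];
}

foreach ($rows as $row) {
    echo $row['title'], ' => ', $row['link'], "\n";
}
```

One foreach over the items is enough to output both values per item; no separate for loop or second query over the whole document is needed.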
0

I finally got it working with the code below:

$url = "https://www.example.com/r/videos/.xml";

$feed_dom = new DOMDocument();
// preserveWhiteSpace only takes effect if it is set before loading.
$feed_dom->preserveWhiteSpace = false;
$feed_dom->load($url);

$items = $feed_dom->getElementsByTagName('item');
foreach ($items as $item) {
    $title = $item->getElementsByTagName('title')->item(0)->nodeValue;
    // The description's content as a string; this works when the feed
    // stores the HTML table as escaped text.
    $desc_table = $item->getElementsByTagName('description')->item(0)->nodeValue;
    echo $title . "<br>";

    // Parse that HTML fragment as its own document.
    $table_dom = new DOMDocument();
    $table_dom->loadHTML($desc_table);
    $xpath = new DOMXPath($table_dom);

    // Second <a> inside the second <td>: the "[link]" anchor.
    $yt_link_node = $xpath->query("//table/tr/td[2]/a[2]");
    foreach ($yt_link_node as $yt_link) {
        $yt = $yt_link->getAttribute('href');
        echo $yt . "<br>";
        echo "<br>";
    }
}

Thanks, Abel, your help was very useful in getting this done. :)
