I have this html table:

<tbody>
  <tr>..</tr>
  <tr>
    <td class="tbl_black_n_1">1</td>
    <td class="tbl_black_n_1" nowrap="" align="center">23/07/14 08:10</td>
    <td class="tbl_black_n_1">
      <img src="http://www.betonews.com/img/SportId389.gif" width="10" height="10" border="0" alt="">
    </td>
    <td class="tbl_black_n_1"></td>
    <td class="tbl_black_n_1" nowrap="" align="center">BAK WS</td>
    <td class="tbl_black_n_1" nowrap="" align="right">M. Eguchi</td>
    <td class="tbl_black_n_1" align="center">-</td>
    <td class="tbl_black_n_1" nowrap="">Radwanska U. </td>
    <td class="tbl_black_n_1" align="center" title=" ">1,02</td>
    <td class="tbl_black_n_1" align="center">
    <td class="tbl_black_n_1" align="center" title=" "> </td>
    <td class="tbl_black_n_1" align="center">
    <td class="tbl_black_n_1" align="center" title=" ">55,00</td>
    <td class="tbl_black_n_1" align="center">
    <td class="tbl_black_n_1" align="right">86%</td>
    <td class="tbl_black_n_1" align="right">-</td>
    <td class="tbl_black_n_1" align="right">14%</td>
    <td class="tbl_black_n_1" align="center" title=" ">524.647</td>
    <td class="tbl_black_n_1" nowrap="">
      <a href="popup.asp?tp=2100&amp;lang=en&amp;idm=553759" target="_blank"><img src="http://www.betonews.com//img/i_betfair.gif" width="12" height="10" border="0" alt=""></a>
      <a href="popup.asp?tp=2110&amp;lang=en&amp;idm=553759" target="_blank"><img src="http://www.betonews.com//img/i_history.gif" width="12" height="10" border="0" alt=""></a>
    </td>
  </tr>
  <tr>..</tr>
  <tr>..</tr>
  <tr>..</tr>
  ...
</tbody>

There are more than one hundred <tr> elements structured in the same way, each containing many <td> elements. How can I loop over them with XPath and store all the data in a database? I don't want the first <tr>: the query has to begin with the second <tr> (the one I have shown).

This is my PHP code, but I can't work out how to continue. Help!

<?php
$url = 'http://www.betonews.com/table.asp?tp=2001&lang=en&dd=23&dm=7&dy=2014&df=1&dw=3';

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
curl_close($ch);

$document = new DOMDocument();
$document->loadHTML($response);
$xpath = new DOMXPath($document);

$expression = '/html/body/table[2]/tbody/tr/td[2]/table/tbody/tr/td[2]/table/tbody/tr[2]/td/table/tbody/tr/td/table/tbody/tr[3]/td/table/tbody/tr';
$rows = $xpath->query($expression);

$results = array();
foreach ($rows as $row) {
    $result = array();
    ???
}

This is what I want to be the final result:

[0] => Array
(
    [date] => 23/07/14 08:10
    [image] => http://www.betonews.com/img/SportId389.gif
    [team1] => M. Eguchi
    [team2] => Radwanska U.
    [1] => 1,02
    [x] => 0
    [2] => 55,00
    [1%] => 86%
    [x%] => 0
    [2%] => 14%
    [total] => 524.647
)

1 Answer

I would use a different XPath to select the table. First, absolute paths are always a problem with tables like this, because tbody elements are often added by the browser without actually being present in the document, so they are not visible to the PHP code. Second, if anything in the source HTML changes in terms of styling, your code breaks. Here I select the first table with a cellpadding of 3. This is not optimal, but there wasn't any obvious unique identifier.
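To illustrate the tbody point, here is a minimal sketch. The markup is made up for the demonstration (it is not the real page) and is written without a tbody element, which is how many pages are served. The path containing tbody matches nothing, while the same path without it matches every row.

```php
<?php
// Hypothetical markup for illustration only: a table with no <tbody> in the source.
$html = '<html><body><table cellpadding="3">'
      . '<tr><td>header</td></tr>'
      . '<tr><td>row 1</td></tr>'
      . '<tr><td>row 2</td></tr>'
      . '</table></body></html>';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

// A path copied from the browser's inspector usually contains tbody ...
$withTbody = $xpath->query('/html/body/table/tbody/tr');
// ... but the parser used by DOMDocument does not add one, so only this path matches.
$withoutTbody = $xpath->query('/html/body/table/tr');

$tbodyCount = $withTbody->length;
$plainCount = $withoutTbody->length;
echo "rows found with tbody: $tbodyCount, without tbody: $plainCount\n";
```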

Apart from that, you can simply iterate over the DOMNodeList result and then pick the correct child nodes. Note that the item indices increase in steps of two, because the whitespace-only text nodes between the elements also count as child nodes in the DOM.

$xpath = new DOMXPath($document);
$expression = '(//table[@cellpadding="3"])[1]/tr[position() > 1]';
$rows = $xpath->query($expression);

$results = array();
foreach ($rows as $row) {
    $result = array();
    $td = $row->childNodes;
    $result["date"]  = $td->item(2)->nodeValue;
    $result["image"] = $td->item(4)->firstChild->attributes->getNamedItem("src")->nodeValue;
    $result["team1"] = $td->item(10)->nodeValue;
    $result["team2"] = $td->item(12)->nodeValue;
    $result["1"]     = $td->item(14)->nodeValue;
    $result["x"]     = $td->item(16)->nodeValue;
    $result["2"]     = $td->item(18)->nodeValue;
    $result["1%"]    = $td->item(20)->nodeValue;
    $result["x%"]    = $td->item(22)->nodeValue;
    $result["2%"]    = $td->item(24)->nodeValue;
    $result["total"] = $td->item(26)->nodeValue;
    $results[] = $result;
}

For the image, you have to do some more processing, because you do not want the actual text, but the src attribute of the <img> element instead.
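If counting whitespace nodes feels fragile, one possible alternative is to query the td elements relative to each row, so that only elements are indexed. The sketch below is an illustration rather than a drop-in solution: the markup is a shortened copy of the sample row from the question, and the cell positions are taken from that sample, so they may differ from the live page. The image src is read with an XPath string() expression, which returns an empty string when no img is present.

```php
<?php
// Shortened copy of the sample row from the question (assumption: the live page may differ).
$html = '<table cellpadding="3">
  <tr><td>header</td></tr>
  <tr>
    <td>1</td>
    <td>23/07/14 08:10</td>
    <td> <img src="http://www.betonews.com/img/SportId389.gif" alt=""> </td>
    <td></td>
    <td>BAK WS</td>
    <td>M. Eguchi</td>
    <td>-</td>
    <td>Radwanska U. </td>
  </tr>
</table>';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

$results = array();
$rows = $xpath->query('(//table[@cellpadding="3"])[1]/tr[position() > 1]');
foreach ($rows as $row) {
    // Relative query: returns only the <td> elements of this row, no text nodes.
    $td = $xpath->query('td', $row);
    $results[] = array(
        'date'  => trim($td->item(1)->nodeValue),
        // string() of the attribute node; '' if the cell has no <img>.
        'image' => $xpath->evaluate('string(td[3]//img/@src)', $row),
        'team1' => trim($td->item(5)->nodeValue),
        'team2' => trim($td->item(7)->nodeValue),
    );
}
print_r($results);
```

The remaining columns would follow the same pattern once their positions on the real page are confirmed.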

7 Comments

Another simple question: how can I retrieve the href attribute of the last <tr> group? It consists of two <a> tags.
I don't know which element you are referring to. Are you sure you mean a <tr/> element, not <td/>? And if there are two <a/> elements, from which one do you want the href attribute?
Oh yes! I mean the last <td> element. I want the href attribute of the first <a>. So, in my example: "popup.asp?tp=2100&amp;lang=en&amp;idm=553759".
I understand, I am simply not here all the time for all requests. Please be at least a little bit patient. Also, you should be able to figure this one out by yourself, as it follows exactly the same logic applied above. However, this should work: $td->item(36)->childNodes->item(1)->attributes->getNamedItem("href")->nodeValue;
Thanks. I don't understand why you use the "36th" item, though. Your query returns "NULL".