
I am trying to pull one specific table while scraping a web page. I used cURL to fetch the page into $html, which works fine.

I used Firebug to get the exact XPath to the table I need.

Code follows:

// The DOMDocument constructor takes a version and encoding, not markup
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$summary = $xpath->evaluate('/html/body/table[5]/tbody/tr/td[3]/table/tbody/tr[8]/td/table');
echo "Summary Length: " . $summary->length;

When executed, $summary->length is always zero, so the query is not matching that table node.

Any ideas?


2 Answers


Firefox is liable to insert "virtual" tbody elements into tables that don't have them; do those elements exist in the original file?
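One rough way to check this is to compare the raw source with the tree PHP builds from it. The snippet below is only a sketch: the sample markup is hypothetical and stands in for the page fetched with cURL. As far as I know, the libxml2 parser behind DOMDocument::loadHTML does not add tbody elements the way Firefox does, so both checks should agree with each other.

```php
<?php
// Hypothetical sample markup standing in for the page fetched with cURL.
$html = '<html><body><table><tr><td>cell</td></tr></table></body></html>';

// Does the raw source contain a literal <tbody> tag?
$inSource = stripos($html, '<tbody') !== false;

// Does the tree built by DOMDocument contain a tbody element?
libxml_use_internal_errors(true); // keep warnings from sloppy real-world HTML quiet
$dom = new DOMDocument();
$dom->loadHTML($html);
$inDom = $dom->getElementsByTagName('tbody')->length > 0;

var_dump($inSource, $inDom);
```

If the two values differ from what Firebug shows, the XPath copied from the browser describes the browser's tree rather than the one PHP is querying.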


4 Comments

No, they don't. But I do see them in firefox. I have used XPath Checker as well and can see the data I need. But using it in my PHP xpath->evaluate never returns data.
<tr> is not allowed inside <table> directly - there has to be a <tbody> / <thead> / <tfoot>. It's implied if not specified directly. HTML is weird like that... the start and end tags can both be optional!
If the tbody elements don't exist in the original file, then they shouldn't be in your PHP XPath query.
I apologize. The TBODY tags are there. I overlooked them when first looking at the source.

Just remove "/tbody". From the XPath you got from Firefox:

.//*[@id='data']/tbody/tr[1]/td[2]/span

create this:

.//*[@id='data']/tr[1]/td[2]/span
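A minimal sketch of that in PHP, assuming the source really has no tbody tags. The markup, the id 'data' and the variable names here are made up for illustration, and the plain str_replace is just one simple way to drop the extra step:

```php
<?php
// Hypothetical markup with no <tbody> in the source.
$html = '<html><body><table id="data"><tr><td>a</td><td><span>value</span></td></tr></table></body></html>';

libxml_use_internal_errors(true); // keep warnings from sloppy real-world HTML quiet
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Path as copied from Firefox, including the tbody step the browser added.
$fromFirefox = ".//*[@id='data']/tbody/tr[1]/td[2]/span";

// Drop every "/tbody" step with a plain string replace.
$withoutTbody = str_replace('/tbody', '', $fromFirefox);

// Count the matches for each version of the path.
$before = $xpath->query($fromFirefox)->length;
$after = $xpath->query($withoutTbody)->length;
echo "With tbody: $before, without tbody: $after\n";
```

If the page does contain literal tbody tags, the replace would break the path instead of fixing it, so it is worth checking the raw source first.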

Aloe
