Xpath Not Returning Values lxml Python

Question

I'm working on a project and I'm trying to get lxml to pull stock data from separate tables on separate web pages. When I run my program and print the values I'm trying to pull, I get empty lists:

    ('Cash_and_short_term_investments:', [])
    ('EPSNextYear:', [])

Here is a look at the way I am calling this:

    # The URL at this point is http://finviz.com/quote.ashx?t=RAIL (confirmed with a print statement)
    url = driver.current_url
    page2 = requests.get(url)
    tree2 = html.fromstring(page2.content)
    EPSNextYear = tree2.xpath('/html/body/table[3]/tr[1]/td/table/tr[7]/td/table/tr[2]/td[6]/b')
    # Original XPath: /html/body/table[3]/tbody/tr[1]/td/table/tbody/tr[7]/td/table/tbody/tr[2]/td[6]/b
    print ('EPSNextYear:', EPSNextYear)

and:

    # The URL at this point is https://www.google.com/finance?q=NASDAQ%3ARAIL&fstype=ii&ei=hGwhWNHVPOW7iwLMiIfIDA (confirmed with a print)
    url = driver.current_url
    page3 = requests.get(url)
    tree3 = html.fromstring(page3.content)
    Cash_and_Short_Term_Investments = tree3.xpath('//*[@id="fs-table"]/tr[3]/td[2]/text()')
    print('Cash_and_short_term_investments:', Cash_and_Short_Term_Investments)

I have removed the tbody from the XPath like some similar questions have suggested. Any help or suggestions would be greatly appreciated, thanks!

Markus · Accepted Answer · 2016-11-08 08:04:19Z

When asking questions like this, you need to provide a short but complete example which demonstrates the problem.

Looking at your second example, it is clear that the XPath expression you are using is incorrect: the tbody element is missing from it. (You might also like to select the correct table row by matching the actual string you are searching for, rather than by position.)
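Whether tbody belongs in the expression depends on the markup that lxml actually receives. A minimal sketch of that point follows; the two HTML snippets are made up for illustration and are not copied from the real pages. lxml keeps a tbody that is written in the source but does not add one the way a browser does, so the same XPath can match one document and return an empty list for the other.

```python
from lxml import html

# Hypothetical markup, not taken from the real pages: the same table
# written once with an explicit <tbody> and once without.
WITH_TBODY = """
<html><body>
<table id="fs-table">
  <tbody>
    <tr><td>Cash and Short Term Investments</td><td>39.78</td></tr>
  </tbody>
</table>
</body></html>
"""

WITHOUT_TBODY = """
<html><body>
<table id="fs-table">
  <tr><td>Cash and Short Term Investments</td><td>39.78</td></tr>
</table>
</body></html>
"""


def count_matches(markup, expression):
    """Parse the markup with lxml and count the nodes the XPath selects."""
    tree = html.fromstring(markup)
    return len(tree.xpath(expression))


DIRECT = '//*[@id="fs-table"]/tr'           # assumes no tbody in the source
VIA_TBODY = '//*[@id="fs-table"]/tbody/tr'  # assumes tbody is in the source
ANY_DEPTH = '//*[@id="fs-table"]//tr'       # works in both cases

for name, markup in (("with tbody", WITH_TBODY), ("without tbody", WITHOUT_TBODY)):
    print(name,
          count_matches(markup, DIRECT),
          count_matches(markup, VIA_TBODY),
          count_matches(markup, ANY_DEPTH))
```

Using `//tr` below the table sidesteps the question entirely, at the cost of also matching rows of any nested tables.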

Given the following code:

    # Python 2 (urllib.urlopen and the print statement)
    from lxml import etree
    import urllib

    url = "http://www.google.com/finance?q=NASDAQ%3ARAIL&fstype=ii&ei=hGwhWNHVPOW7iwLMiIfIDA"
    parser = etree.HTMLParser()
    tree = etree.parse(urllib.urlopen(url), parser)

    # Select rows by the text of their first cell instead of by position
    result = tree.xpath('//*[@id="fs-table"]/tbody/tr[normalize-space(td) = "Cash and Short Term Investments"]')
    for x in result:
        print etree.tostring(x)

When running this like so:

> python test.py

You get the following output:

    <tr>
    <td class="lft lm">Cash and Short Term Investments
    </td>
    <td class="r">39.78</td>
    <td class="r">78.45</td>
    <td class="r">91.21</td>
    <td class="r">110.02</td>
    <td class="r rm">125.01</td>
    </tr>
    <tr>
    <td class="lft lm">Cash and Short Term Investments
    </td>
    <td class="r">110.02</td>
    <td class="r">161.49</td>
    <td class="r">184.49</td>
    <td class="r rm">140.49</td>
    </tr>

I'm sure you will be able to figure out what is wrong with your first example once you have turned it into a self-contained reproducer of the problem.
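As a rough idea of what such a reproducer could look like, here is a sketch that swaps the long positional path for a label-based lookup. The markup is a hypothetical stand-in, not the real page: the label text "EPS next Y", the layout (a label cell followed by a value cell), the values, and the helper name `value_for_label` are all assumptions made for illustration.

```python
from lxml import html

# Hypothetical stand-in for the page: each label cell is followed by a
# value cell. The labels and the values are assumptions, not real data.
SAMPLE = """
<html><body>
<table>
  <tr>
    <td>EPS this Y</td><td><b>1.00</b></td>
    <td>EPS next Y</td><td><b>2.00</b></td>
  </tr>
</table>
</body></html>
"""


def value_for_label(tree, label):
    """Return the text of the cell right after the cell holding `label`.

    Returns None when no cell has that label.
    """
    parts = tree.xpath(
        '//td[normalize-space(.) = $label]/following-sibling::td[1]//text()',
        label=label,  # lxml passes keyword arguments in as XPath variables
    )
    return ''.join(parts).strip() or None


tree = html.fromstring(SAMPLE)
print(value_for_label(tree, 'EPS next Y'))
```

Because the sample is embedded in the script, anyone can run it without network access, and a lookup by label keeps working if rows or columns are added around the cell of interest.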

This is a good solution to get the strings. I then used regular expressions to isolate the numbers.
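The comment does not show the code it used, so the following is only one way the regular-expression step might look. It takes the first row from the output above as sample input; the pattern and the helper name `numbers_in_row` are assumptions.

```python
import re

from lxml import etree

# First row of the answer's output, used here as sample input.
ROW = """<tr>
<td class="lft lm">Cash and Short Term Investments
</td>
<td class="r">39.78</td>
<td class="r">78.45</td>
<td class="r">91.21</td>
<td class="r">110.02</td>
<td class="r rm">125.01</td>
</tr>"""

# Optional sign, digits with optional thousands separators, optional decimals.
NUMBER = re.compile(r'-?\d[\d,]*(?:\.\d+)?')


def numbers_in_row(row_markup):
    """Return the numeric cells of one table row as floats."""
    row = etree.fromstring(row_markup)
    values = []
    # Skip the first cell, which holds the row label.
    for text in row.xpath('td[position() > 1]/text()'):
        match = NUMBER.search(text)
        if match:
            values.append(float(match.group().replace(',', '')))
    return values


print(numbers_in_row(ROW))
```

Applying the pattern to the text of each cell, rather than to the serialized markup, keeps digits in attribute values from being picked up by mistake.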
