
I'm working on a project in which I'm trying to use lxml to pull stock data from tables on separate web pages. When I run my program and print the values I'm trying to pull, I get empty lists:

```
('Cash_and_short_term_investments:', [])
('EPSNextYear:', [])
```

Here is a look at the way I am calling this:

```python
# The url at this point is http://finviz.com/quote.ashx?t=RAIL (confirmed with a print statement)
url = driver.current_url
page2 = requests.get(url)
tree2 = html.fromstring(page2.content)
EPSNextYear = tree2.xpath('/html/body/table[3]/tr[1]/td/table/tr[7]/td/table/tr[2]/td[6]/b')
# Original XPath: /html/body/table[3]/tbody/tr[1]/td/table/tbody/tr[7]/td/table/tbody/tr[2]/td[6]/b
print ('EPSNextYear:', EPSNextYear)
```

and:

```python
# The url at this point is https://www.google.com/finance?q=NASDAQ%3ARAIL&fstype=ii&ei=hGwhWNHVPOW7iwLMiIfIDA
# (I've confirmed this with a print)
url = driver.current_url
page3 = requests.get(url)
tree3 = html.fromstring(page3.content)
Cash_and_Short_Term_Investments = tree3.xpath('//*[@id="fs-table"]/tr[3]/td[2]/text()')
print('Cash_and_short_term_investments:', Cash_and_Short_Term_Investments)
```

I have removed the tbody from the XPath, as answers to some similar questions have suggested. Any help or suggestions would be greatly appreciated, thanks!

1 Answer

When asking questions like this, you need to provide a short but complete example which demonstrates the problem.

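Looking at your second example, the XPath expression you are using is incorrect: you are missing the tbody element. On this page the HTML returned by the server does contain tbody, so removing it from the path means nothing matches. (Browsers insert an implied tbody into tables whose source omits it, but lxml's parser does not, so whether that step belongs in your XPath depends on the page's raw source, not on what the browser's developer tools show.) You might also like to select the correct table row by looking for the actual string you are searching for, rather than by its position.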
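To see the tbody point in isolation, here is a minimal offline sketch. The two HTML snippets are made up for illustration (they are not the real Google Finance markup); the only difference between them is whether the source contains an explicit `<tbody>`. Because lxml parses the markup as written, a positional XPath matches only when its `tbody` step agrees with the source.

```python
from lxml import html

# Hypothetical snippet written with an explicit <tbody> (not the real page markup).
WITH_TBODY = """
<html><body>
<table id="fs-table"><tbody>
  <tr><td>Revenue</td><td>10.0</td></tr>
  <tr><td>Cash and Short Term Investments </td><td>39.78</td></tr>
</tbody></table>
</body></html>
"""

# The same table written without <tbody>, as many real pages are.
WITHOUT_TBODY = WITH_TBODY.replace("<tbody>", "").replace("</tbody>", "")


def cell_texts(source: str, xpath: str) -> list:
    """Parse raw HTML with lxml (which adds no implied <tbody>) and run an XPath."""
    tree = html.fromstring(source)
    return tree.xpath(xpath)


# The same positional path, with and without the tbody step.
with_step = '//*[@id="fs-table"]/tbody/tr[2]/td[2]/text()'
without_step = '//*[@id="fs-table"]/tr[2]/td[2]/text()'

print(cell_texts(WITH_TBODY, with_step))        # ['39.78']
print(cell_texts(WITH_TBODY, without_step))     # []
print(cell_texts(WITHOUT_TBODY, with_step))     # []
print(cell_texts(WITHOUT_TBODY, without_step))  # ['39.78']
```

So an XPath copied from the browser's inspector (which always shows tbody) only works unchanged when the server's HTML really contains tbody; otherwise the tbody steps must be dropped.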

Given the following code:

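```python
from urllib.request import urlopen

from lxml import etree

url = "http://www.google.com/finance?q=NASDAQ%3ARAIL&fstype=ii&ei=hGwhWNHVPOW7iwLMiIfIDA"

# Parse the raw page with lxml's forgiving HTML parser
parser = etree.HTMLParser()
tree = etree.parse(urlopen(url), parser)

# Keep tbody in the path (it is present in this page's source) and pick the row
# by its label text instead of by a positional index such as tr[3]
result = tree.xpath('//*[@id="fs-table"]/tbody/tr[normalize-space(td) = "Cash and Short Term Investments"]')
for x in result:
    print(etree.tostring(x, encoding="unicode"))
```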

When running this like so:

```
$ python test.py
```

you get the following output:

```
<tr> <td class="lft lm">Cash and Short Term Investments </td> <td class="r">39.78</td> <td class="r">78.45</td> <td class="r">91.21</td> <td class="r">110.02</td> <td class="r rm">125.01</td> </tr>
<tr> <td class="lft lm">Cash and Short Term Investments </td> <td class="r">110.02</td> <td class="r">161.49</td> <td class="r">184.49</td> <td class="r rm">140.49</td> </tr>
```
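The output also shows why the row is matched with `normalize-space(td)` rather than a plain `td = "..."` test: the label cell ends with a trailing space ("Cash and Short Term Investments "). The minimal sketch below reproduces that with a made-up one-row fragment modelled on the output, then reads the value cells of the matched row by position. Position is used instead of a class test because the last cell in the real output has class "r rm" rather than "r"; the rest of the markup is an assumption.

```python
from lxml import html

# Hypothetical fragment modelled on the output above; note the trailing space in the label.
SOURCE = """
<table id="fs-table"><tbody>
<tr><td class="lft lm">Cash and Short Term Investments </td>
<td class="r">39.78</td><td class="r">78.45</td></tr>
</tbody></table>
"""

table = html.fromstring(SOURCE)  # a single top-level element, so this is the <table>
label = "Cash and Short Term Investments"

# Exact comparison fails: no <td> string value equals the label without its trailing space.
exact = table.xpath(f'tbody/tr[td = "{label}"]')

# normalize-space() strips leading/trailing whitespace (and collapses inner runs)
# of the first <td>, so the row is found.
rows = table.xpath(f'tbody/tr[normalize-space(td) = "{label}"]')

# Every cell after the label cell holds a value.
values = rows[0].xpath('td[position() > 1]/text()') if rows else []

print(len(exact), len(rows))  # 0 1
print(values)                 # ['39.78', '78.45']
```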

I'm sure you will be able to figure out what is wrong with your first example once you have turned it into a self-contained reproducer of the problem.
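One way to start on that reproducer is to find the cell by its text and ask lxml for the path it actually parsed, using `getpath()` on the element's tree; any `tbody` steps show up in that path only if they exist in the raw source. In the sketch below, `SOURCE` is a made-up stand-in for the finviz page (its markup and the value 1.23 are invented for illustration), and `find_text_paths` is a hypothetical helper name. Against the real site you would pass `page2.content` instead.

```python
from lxml import html

# Hypothetical stand-in for the downloaded page; in practice use page2.content.
SOURCE = b"""
<html><body>
<table class="snapshot"><tr><td>EPS next Y</td><td><b>1.23</b></td></tr></table>
</body></html>
"""


def find_text_paths(source: bytes, needle: str) -> list:
    """Return the lxml path of every element whose first text node contains `needle`."""
    root = html.fromstring(source)
    tree = root.getroottree()
    # $needle is an XPath variable, passed as a keyword argument to xpath()
    matches = root.xpath('//*[contains(text(), $needle)]', needle=needle)
    return [tree.getpath(el) for el in matches]


print(find_text_paths(SOURCE, "EPS next Y"))  # ['/html/body/table/tr/td[1]']
```

Comparing the path printed for the real page with the XPath copied from the browser shows immediately whether the tbody steps (and the positional indices) agree with what lxml sees.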


1 Comment

This is a good solution for getting the strings; I then used regular expressions to isolate the numbers.
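The comment does not show the regular expression that was used, so the following is only one possible version. It assumes the values look like those in the answer's output (an optional minus sign, digits with optional thousands separators, an optional decimal part; the sign and separator handling are assumptions) and pulls them out of the serialized row string as floats.

```python
import re

# A row string as printed in the answer's output (shortened to two values).
ROW = ('<tr> <td class="lft lm">Cash and Short Term Investments </td> '
       '<td class="r">39.78</td> <td class="r rm">78.45</td> </tr>')

# Assumed number format: optional '-', digits with optional ',' separators,
# optional decimal part. Only text directly between '>' and '<' is considered,
# so digits inside tag names or attribute values are ignored.
NUMBER = re.compile(r">\s*(-?\d[\d,]*(?:\.\d+)?)\s*<")


def extract_numbers(row_html: str) -> list:
    """Return every numeric cell value in a serialized table row as floats."""
    return [float(m.replace(",", "")) for m in NUMBER.findall(row_html)]


print(extract_numbers(ROW))  # [39.78, 78.45]
```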
