Getting value from tag with BeautifulSoup

Question

I'm trying to scrape movie information from the info box on Wikipedia using BeautifulSoup. I'm having trouble scraping movie budgets, as below.

For example, I want to scrape the '$25 million' budget value from the info box. How can I get the budget value, given that neither the th nor the td tags are unique? (See the example HTML.)

Say I have tag = soup.find('th') with the value <th scope="row" style="white-space:nowrap;padding-right:0.65em;">Budget</th> - How can I get the value of '$25 million' from tag?

I thought I could do something like tag.td or tag.text, but neither of these is working for me.

Do I have to loop over all tags and check if their text is equal to 'Budget', and if so get the following cell?

Example HTML Code:

<tr>
  <th scope="row" style="white-space:nowrap;padding-right:0.65em;">Budget</th>
  <td style="line-height:1.3em;">$25 million<sup id="cite_ref-2" class="reference"><a href="#cite_note-2">[2]</a></sup></td>
</tr>
<tr>
  <th scope="row" style="white-space:nowrap;padding-right:0.65em;">Box office</th>
  <td style="line-height:1.3em;">$65.7 million<sup id="cite_ref-BOM_3-0" class="reference"><a href="#cite_note-BOM-3">[3]</a></sup></td>
</tr>

akuiper · Accepted Answer · 2017-03-08 03:19:34Z

You can first find the th node whose text is Budget, then find its next sibling td and get the text from that node:

soup.find("th", text="Budget").find_next_sibling("td").get_text() # u'$25 million[2]'
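For anyone who wants to run this against the HTML from the question, here is a minimal sketch. It assumes the bs4 package with the built-in html.parser; the trimmed HTML string and the variable names are only for illustration. It uses string= rather than text=, which is the spelling newer Beautiful Soup releases prefer, and it also shows one way to drop the trailing citation marker.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the example HTML from the question (style attributes omitted).
html = """
<table>
<tr><th scope="row">Budget</th>
<td>$25 million<sup id="cite_ref-2" class="reference"><a href="#cite_note-2">[2]</a></sup></td></tr>
<tr><th scope="row">Box office</th>
<td>$65.7 million<sup id="cite_ref-BOM_3-0" class="reference"><a href="#cite_note-BOM-3">[3]</a></sup></td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# Locate the header cell by its exact text, then step to the td next to it.
header = soup.find("th", string="Budget")
cell = header.find_next_sibling("td")

raw = cell.get_text()  # still includes the citation marker

# Optional clean-up: remove the <sup> citation before reading the text.
# Note that decompose() modifies the parsed tree in place.
for sup in cell.find_all("sup"):
    sup.decompose()
budget = cell.get_text(strip=True)

print(raw)
print(budget)
```

If the page has no Budget row, soup.find returns None, so a real scraper would want to check header before calling find_next_sibling on it.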

Bijoy · Accepted Answer · 2017-03-08 03:10:40Z

To get the amount in every <td> tag, use:

tags = soup.findAll('td')

and then

for tag in tags:
    print(tag.get_text())  # the cell text, e.g. '$25 million[2]'

4 Comments

user7019687 Over a year ago

Will this not just print the values of every <td> tag?

Bijoy Over a year ago

Yeah, it will print the value of every <td>. You can then do whatever you want with it.

user7019687 Over a year ago

But I'm specifically looking for the value after the tag whose text is 'Budget', not every <td> value.

Bijoy Over a year ago

Yeah, for that you can loop over the <tr> rows from findAll and take the <td> value only when the text of the <th> equals Budget.
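The row-by-row comparison described in the last comment could look roughly like the sketch below. It is not part of the original answer; the helper name infobox_value and the trimmed HTML are assumptions made for illustration.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the example HTML from the question.
html = """
<table>
<tr><th scope="row">Budget</th>
<td>$25 million<sup class="reference"><a href="#cite_note-2">[2]</a></sup></td></tr>
<tr><th scope="row">Box office</th>
<td>$65.7 million<sup class="reference"><a href="#cite_note-BOM-3">[3]</a></sup></td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")


def infobox_value(soup, label):
    """Return the td text of the first row whose th text equals label, else None."""
    for row in soup.find_all("tr"):
        th = row.find("th")
        td = row.find("td")
        # Skip rows that lack either cell, then compare the header text.
        if th is not None and td is not None and th.get_text(strip=True) == label:
            return td.get_text(strip=True)
    return None


print(infobox_value(soup, "Budget"))
print(infobox_value(soup, "Box office"))
```

The returned text still carries the citation marker (for example [2]), so it may need trimming before further use.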

Wenlong Liu · Accepted Answer · 2017-03-08 03:18:33Z

What you need is the find_all() method in BeautifulSoup.

For example:

 tdTags = soup.find_all('td',{'class':'reference'})

This finds every 'td' tag whose class attribute is 'reference'.

You can select whichever td tags you want, as long as they have an attribute value that is unique to them.

Then you can do a for loop to find the content, as @Bijoy said.
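As a rough check of how attribute filtering behaves on the HTML from the question, here is a small sketch (the variable names are illustrative). In that sample the td cells carry a style attribute but no class, and the reference class sits on the sup tags, so the filter has to target an attribute the cells really have.

```python
from bs4 import BeautifulSoup

# Example rows from the question, keeping the attributes relevant here.
html = """
<table>
<tr><th scope="row">Budget</th>
<td style="line-height:1.3em;">$25 million<sup class="reference"><a href="#cite_note-2">[2]</a></sup></td></tr>
<tr><th scope="row">Box office</th>
<td style="line-height:1.3em;">$65.7 million<sup class="reference"><a href="#cite_note-BOM-3">[3]</a></sup></td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# The td cells in this sample have no class, so this filter matches nothing.
by_class = soup.find_all("td", {"class": "reference"})

# Filtering on the style value matches both value cells.
by_style = soup.find_all("td", {"style": "line-height:1.3em;"})
values = [cell.get_text(strip=True) for cell in by_style]

print(len(by_class))
print(values)
```

Because both cells share the same style, an attribute filter alone cannot single out the budget here; the header text still has to be checked.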

niraj · Accepted Answer · 2017-03-08 03:44:56Z

Another possible way might be:

split_text = soup.get_text().split('\n')
# The entry right after 'Budget' is the cost
split_text[split_text.index('Budget') + 1]
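The splitting idea relies on each cell's text landing on its own line, which depends on how the page source is laid out. A variant that is less sensitive to that is sketched below. It is an adaptation, not the code from the answer: it passes separator and strip to get_text so that every text node becomes its own entry, and the single-line HTML string is a trimmed copy of the question's example.

```python
from bs4 import BeautifulSoup

# The example rows on a single line, as they appear in the question.
html = (
    '<tr> <th scope="row">Budget</th> '
    '<td>$25 million<sup class="reference"><a href="#cite_note-2">[2]</a></sup></td> </tr> '
    '<tr> <th scope="row">Box office</th> '
    '<td>$65.7 million<sup class="reference"><a href="#cite_note-BOM-3">[3]</a></sup></td> </tr>'
)
soup = BeautifulSoup(html, "html.parser")

# Join every non-empty text node with a newline, then split into a list.
lines = soup.get_text(separator="\n", strip=True).split("\n")

# The entry right after 'Budget' is the cost; the citation marker
# is a separate text node, so it ends up in the entry after that.
budget = lines[lines.index("Budget") + 1]
print(budget)
```

lines.index raises ValueError when there is no 'Budget' entry, and on a full page the word could also appear outside the info box, so this approach is best treated as a quick shortcut.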
