I'm trying to scrape movie information from the info box on Wikipedia using BeautifulSoup. I'm having trouble scraping movie budgets, as below.
For example, I want to scrape the '$25 million' budget value from the info box. How can I get the budget value, given that the neither the th nor td tags are unique? (See example HTML).
Say I have tag = soup.find('th') with the value <th scope="row" style="white-space:nowrap;padding-right:0.65em;">Budget</th> - How can I get the value of '$25 million' from tag?
I thought I could do something like tag.td or tag.text but neither of these are working for me.
Do I have to loop over all tags and check if their text is equal to 'Budget', and if so get the following cell?
Example HTML Code:
<tr> <th scope="row" style="white-space:nowrap;padding-right:0.65em;">Budget</th> <td style="line-height:1.3em;">$25 million<sup id="cite_ref-2" class="reference"><a href="#cite_note-2">[2]</a></sup></td> </tr> <tr> <th scope="row" style="white-space:nowrap;padding-right:0.65em;">Box office</th> <td style="line-height:1.3em;">$65.7 million<sup id="cite_ref-BOM_3-0" class="reference"><a href="#cite_note-BOM-3">[3]</a></sup></td> </tr>