
I'd like to scrape this website with Python: http://www.ssa.gov/oact/babynames/#ht=1

At the bottom, under the table of names, there are three tabs. I'm looking to POST to the form under the tab "Popular Names by Birth Year."

Here's my code:

    from bs4 import BeautifulSoup
    import requests

    url = "http://www.ssa.gov/oact/babynames/"
    payload = {
        'year': 2010,
        'top': 50
    }
    r = requests.post(url, data=payload)  # returns status 200
    soup = BeautifulSoup(r.text)
    print soup.prettify()

This only returns the original page, not the generated page I'm looking for.

What could be the reason it's not returning the generated page?

THANKS!

1 Answer

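You need to change the URL of your POST request to http://www.ssa.gov/cgi-bin/popularnames.cgi. The URL you are using is the page that displays the form, not the script that processes it, so posting there just returns the same page.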

Demo:

    >>> from bs4 import BeautifulSoup
    >>> import requests
    >>> url = "http://www.ssa.gov/cgi-bin/popularnames.cgi"
    >>> payload = {
    ...     'year': 2010,
    ...     'top': 50
    ... }
    >>> r = requests.post(url, data=payload)
    >>> soup = BeautifulSoup(r.text)
    >>> table = soup.find('table', summary='Popularity for top 50')
    >>> for row in table.find_all('tr')[1:4]:
    ...     print [td.text for td in row.find_all('td')]
    ...
    [u'1', u'Jacob', u'Isabella']
    [u'2', u'Ethan', u'Sophia']
    [u'3', u'Michael', u'Emma']
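If you want to find a POST target like this yourself, one option is to read the `action` attribute of the page's `<form>` tags and resolve it against the page URL. Here is a rough sketch of that idea in Python 3, using only the standard library so it runs without extra installs. The names `FormActionParser` and `form_targets` are made up for this example, and `sample_html` is hypothetical markup written for illustration, not copied from the SSA page, so check the real page source before relying on it.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FormActionParser(HTMLParser):
    """Collect the method and action of every <form> tag in a page."""

    def __init__(self):
        super().__init__()
        self.forms = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "form":
            attributes = dict(attrs)
            self.forms.append({
                "method": (attributes.get("method") or "get").lower(),
                "action": attributes.get("action") or "",
            })


def form_targets(html, page_url):
    """Return (method, absolute URL) for each form found in html."""
    parser = FormActionParser()
    parser.feed(html)
    # A relative or empty action is resolved against the page's own URL.
    return [(f["method"], urljoin(page_url, f["action"])) for f in parser.forms]


# Hypothetical markup, assumed for illustration only.
sample_html = """
<form method="post" action="/cgi-bin/popularnames.cgi">
  <input type="text" name="year">
  <input type="text" name="top">
</form>
"""

targets = form_targets(sample_html, "http://www.ssa.gov/oact/babynames/")
print(targets)
```

The same lookup could be done with BeautifulSoup by calling `soup.find_all('form')` and reading each tag's `action`. The input `name` attributes in the form are also what the keys in `payload` need to match.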