I'm trying to parse the following web page link. Code below:

    import urllib2
    import sys
    from BeautifulSoup import BeautifulSoup

    url = 'http://www.etsy.com/teams/list'
    source = urllib2.urlopen(url)
    soup = BeautifulSoup(source)
    print soup.prettify()
    print len(soup('h3'))      # number of occurrences of h3
    h3s = soup.findAll('h3')   # same lookup as above, done explicitly
    print len(h3s)

The problem is that it prints 1, while the web page contains at least 10 h3 elements. I couldn't figure out where the problem lies. I am using Python 2.7 and BeautifulSoup 3.0.7.

  • For the record, BeautifulSoup 3.2.0 gives me 12 h3s with your code (the last two are in some locale-setting nagging overlay). Commented Aug 31, 2011 at 21:24

1 Answer

I'd recommend using lxml instead:

    >>> import lxml.html
    >>> doc = lxml.html.parse('http://www.etsy.com/teams/list')
    >>> len(doc.xpath('//h3'))
    10

2 Comments

Thank you. I will try using lxml. Do you have any idea why BeautifulSoup doesn't give the proper result in this case?
No, afaik that should work. All I could suggest is trying a different version of BeautifulSoup, or preferably using lxml instead.
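
One rough way to check whether the parser is dropping elements is to count the opening tags in the raw page source and compare that number with `len(soup.findAll('h3'))`. The sketch below is only a diagnostic aid, not something from the answer above: `count_raw_tags` and `sample` are hypothetical names, and a regular expression is not an HTML parser, so it will also count tags that appear inside comments or scripts.

```python
import re


def count_raw_tags(html, tag):
    """Count opening tags named `tag` in raw HTML text, ignoring case.

    The lookahead requires whitespace, '>' or '/' after the tag name,
    so searching for 'h3' does not match a longer name such as 'h30'.
    """
    pattern = r'<%s(?=[\s>/])' % re.escape(tag)
    return len(re.findall(pattern, html, flags=re.IGNORECASE))


# Small inline sample so the check runs without network access.
sample = ('<html><body><h3>One</h3><H3 class="x">Two</H3>'
          '<h30>not an h3</h30></body></html>')
print(count_raw_tags(sample, 'h3'))
```

For the page in the question, the same function could be applied to the text returned by `urllib2.urlopen(url).read()`. If the raw count is around 10 while BeautifulSoup reports 1, that would suggest the parser is giving up on malformed markup somewhere after the first h3.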
