I have a Python script that uses BS4 to grab the HTML of a webpage. I then locate a specific header element (an h1) in the HTML and extract its text. I do this with the following:

    r = br.open("http://example.com")
    html = r.read()
    r.close()
    soup = BeautifulSoup(html)

    # Get the contents of the html tag (h1) that displays results
    searchResult = soup.find("h1").contents[0]

    # Get only the number, remove all text
    if not(searchResult == None):
        searchResultNum = int(re.match(r'\d+', searchResult).group())
    else:
        searchResultNum = 696969

The actual HTML code doesn't change. It always looks like this:

    <div id="resultsCount">
        <h1 class="f12">606 Results matched</h1>
    </div>

The problem is, my script runs fine for maybe 4 minutes (varies) and crashes with:

    Traceback (most recent call last):
      File "C:\Users\Me\Documents\Aptana Studio 3 Workspace\PythonScripts\PythonScripts\setupscript.py", line 109, in <module>
        searchResultNum = int(re.match(r'\d+', searchResult).group())
    AttributeError: 'NoneType' object has no attribute 'group'

I thought I was handling this error. I guess I just do not understand it. Can you help?
Thanks.
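
A note on what the traceback suggests, offered as a sketch rather than a confirmed diagnosis: `re.match` returns `None` when the pattern does not match at the start of the string, and the `None` check in the script is applied to `searchResult`, not to the match object. So if the page ever returns an h1 whose text does not begin with digits (an assumption; the post does not show such a response), `.group()` is called on `None`. The helper name `parse_result_count` and its `default` parameter below are hypothetical, and the `strip()` call is an addition that the original script does not make.

```python
import re


def parse_result_count(text, default=None):
    """Return the leading integer in `text`, or `default` if there is none.

    Hypothetical helper: it checks the match object itself, not only the input.
    """
    if text is None:
        return default

    # str() also accepts BeautifulSoup's NavigableString; strip() tolerates
    # leading whitespace (an addition, not in the original script).
    match = re.match(r"\d+", str(text).strip())

    # re.match returns None when the string does not start with digits.
    if match is None:
        return default

    return int(match.group())
```

With a helper like this, the line from the traceback could become `searchResultNum = parse_result_count(searchResult, default=696969)`. Logging the raw `searchResult` whenever the default is used would show what the page actually returned when the match failed.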