Python error NoneType

Question

I have a Python script that fetches a webpage and parses its HTML with BS4 (BeautifulSoup). Then I locate a specific header element in the HTML and extract its text. I do this with the following:

```python
r = br.open("http://example.com")
html = r.read()
r.close()
soup = BeautifulSoup(html)

# Get the contents of the html tag (h1) that displays results
searchResult = soup.find("h1").contents[0]

# Get only the number, remove all text
if not(searchResult == None):
    searchResultNum = int(re.match(r'\d+', searchResult).group())
else:
    searchResultNum = 696969
```

The actual HTML code doesn't change. It always looks like this:

```html
<div id="resultsCount">
    <h1 class="f12">606 Results matched</h1>
</div>
```

The problem is that my script runs fine for about four minutes (the time varies) and then crashes with:

```
Traceback (most recent call last):
  File "C:\Users\Me\Documents\Aptana Studio 3 Workspace\PythonScripts\PythonScripts\setupscript.py", line 109, in <module>
    searchResultNum = int(re.match(r'\d+', searchResult).group())
AttributeError: 'NoneType' object has no attribute 'group'
```
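A minimal reproduction of this traceback, as a sketch: the h1 texts below are hypothetical, since what the page actually returned at the moment of the crash is unknown. It shows that `re.match` returns `None` whenever the text does not begin with a digit (including when it begins with whitespace), and that calling `.group()` on that `None` raises the same `AttributeError`.

```python
import re

# Hypothetical h1 texts; the real text at crash time is unknown.
samples = ["606 Results matched", "No results matched", "\n606 Results matched"]

for text in samples:
    m = re.match(r'\d+', text)  # re.match is anchored at position 0
    print(repr(text), "->", m.group() if m else None)

# Calling .group() directly on a failed match reproduces the traceback.
try:
    int(re.match(r'\d+', "No results matched").group())
    error = None
except AttributeError as exc:
    error = str(exc)
print(error)
```

Note that `searchResult` itself is not `None` in the failing cases; only the match object is.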

I thought I was handling this error. I guess I just do not understand it. Can you help?

Thanks.

cmd · Accepted Answer · 2013-06-14 21:38:16Z

If `searchResult` does not start with a number, `re.match(r'\d+', searchResult)` returns `None`, and `None` has no `group` attribute. Your check only guards `searchResult` itself, not the result of `re.match()`. Also, `if not(searchResult == None):` is poor style; use `if searchResult:` instead:

```python
searchResultNum = 696969
if searchResult:
    m = re.match(r'\d+', searchResult)
    if m:
        searchResultNum = int(m.group())
```
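The same fix wrapped in a small function, as a sketch that can be tested on its own. The name `extract_count` is hypothetical; the sentinel `696969` comes from the question. The logic is the answer's: check the text first, then check the match object before calling `.group()`.

```python
import re

DEFAULT_COUNT = 696969  # sentinel value taken from the question


def extract_count(search_result):
    """Return the leading integer in search_result, or DEFAULT_COUNT.

    Hypothetical helper mirroring the answer's fix: guard the text,
    then guard the match object before calling .group() on it.
    """
    if search_result:
        m = re.match(r'\d+', search_result)
        if m:
            return int(m.group())
    return DEFAULT_COUNT


print(extract_count("606 Results matched"))  # 606
print(extract_count("No results matched"))   # 696969
print(extract_count(None))                   # 696969
```

Because `if search_result:` is a truthiness test, an empty string also falls through to the default, which `== None` would not catch.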

...the consequence being that he should probably be using `re.search()` instead of `re.match()`...
Unless he only wants numbers at the beginning; his example text `606 Results matched` suggests that he does.
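A quick comparison of the two functions discussed in these comments, as a sketch on a hypothetical variant of the h1 text: `re.match` only tries position 0, while `re.search` scans for the first match anywhere in the string.

```python
import re

text = "Showing 606 Results matched"  # hypothetical variant of the h1 text

anchored = re.match(r'\d+', text)   # None: no digits at index 0
anywhere = re.search(r'\d+', text)  # first run of digits anywhere

print(anchored)          # None
print(anywhere.group())  # 606
```

`re.search` still returns `None` when the text contains no digits at all, so the `if m:` check is needed either way.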
