
I made a script for scraping some data from a website, but it only runs for a few pages and then stops with this message: "'NoneType' object has no attribute 'a'". Another error that appears sometimes is this:

 File "scrappy3.py", line 31, in <module>
   f.writerow(doc_details)
 File "C:\python\lib\encodings\cp1252.py", line 19, in encode
   return codecs.charmap_encode(input,self.errors,encoding_table)[0]
 UnicodeEncodeError: 'charmap' codec can't encode character '\u015f' in position 251: character maps to <undefined>

Can you please give me advice on how to resolve these errors? This is my script:

 import requests
 import csv
 from bs4 import BeautifulSoup
 import re
 import time

 start_time = time.time()
 page = 1
 f = csv.writer(open("./doctors.csv", "w", newline=''))
 while page <= 5153:
     url = "http://www.sfatulmedicului.ro/medici/n_s0_c0_h_s0_e0_h0_pagina" + str(page)
     data = requests.get(url)
     print('scraping page ' + str(page))
     soup = BeautifulSoup(data.text, "html.parser")
     for liste in soup.find_all('li', {'class': 'clearfix'}):
         doc_details = []
         url_doc = liste.find('a').get('href')
         for a in liste.find_all('a'):
             if a.has_attr('name'):
                 doc_details.append(a['name'])
         data2 = requests.get(url_doc)
         soup = BeautifulSoup(data2.text, "html.parser")
         a_tel = soup.find('div', {'class': 'contact_doc add_comment'}).a
         tel_tag = a_tel['onclick']
         tel = tel_tag[tel_tag.find("$(this).html("):tel_tag.find(");")].lstrip("$(this).html(")
         doc_details.append(tel)
         f.writerow(doc_details)
     page += 1
 print("--- %s seconds ---" % (time.time() - start_time))
  • Which line are you getting this on? Maybe you can post the whole error message with the stack trace. Commented Dec 21, 2017 at 12:58
  • soup.find('div',{'class':'contact_doc add_comment'}) does not find anything, returns None, so the .a fails. Commented Dec 21, 2017 at 13:00
  • @deceze What is curious is that the program stops at a random page, and I checked that page to see whether the div is there, and it is. So I guess I need to implement a function that retries fetching and parsing that URL until it finds the div. Can you help me with my second error too? Commented Dec 21, 2017 at 15:30

2 Answers


Your error is here:

 a_tel = soup.find('div',{'class':'contact_doc add_comment'}).a 

soup.find is evidently not finding the div with the sought class. The return value is None, which by definition has no attributes.

You should check the result and decide whether to continue with further queries in the loop or bail out. For example:

 div_contact = soup.find('div', {'class': 'contact_doc add_comment'})
 if div_contact is None:
     continue
 a_tel = div_contact.a

You could also use a try .. except block to cover more cases (like the div not actually containing what you expect):

 div_contact = soup.find('div', {'class': 'contact_doc add_comment'})
 try:
     a_tel = div_contact.a
 except AttributeError:
     continue

which is arguably more Pythonic. Your choice in any case.

Continuous error checking is part of any program.
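Since the comments below suggest the div sometimes looks present when the page is checked by hand, a retry wrapper can be sketched as well. This is a minimal sketch, assuming the failures are transient; the helper names, retry count, and delay are illustrative, not from the original script:

```python
import time

import requests
from bs4 import BeautifulSoup


def find_contact_div(html):
    """Return the contact div from page HTML, or None if it is absent."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find('div', {'class': 'contact_doc add_comment'})


def fetch_contact_div(url, retries=3, delay=2.0):
    """Fetch url, retrying a few times in case the failure is transient.

    Returns None when the div never shows up, so the caller can skip
    the page instead of crashing on `.a`.
    """
    for attempt in range(retries):
        div = find_contact_div(requests.get(url).text)
        if div is not None:
            return div
        time.sleep(delay)  # back off before trying again
    return None
```

If `fetch_contact_div` returns None after all retries, the caller should still skip the page, so the guard from the snippets above remains necessary either way.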


4 Comments

What is curious is that the program stops at a random page, and I checked that page to see whether the div is there, and it is. So I guess I need to implement a function that retries fetching and parsing that URL until it finds the div. Can you help me with my second error too?
You will probably run into the same error. Even if it visually seems to you to be the same div with the same class, it evidently is not, which means you will have to account for extra cases.
With regard to encoding, you have to decide what to do with the error, but you can have a look at this other answer, which covers Unicode encoding when using write: stackoverflow.com/questions/22392377/…
Regarding the encoding, I solved it by adding encoding='utf-8' to the open() call passed to csv.writer.
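For reference, the fix described in that comment goes on open(), since csv.writer itself takes no encoding parameter. A minimal sketch (the file name and sample row are illustrative):

```python
import csv

# Pass encoding='utf-8' to open(); csv.writer has no encoding parameter.
with open("doctors.csv", "w", newline='', encoding='utf-8') as fh:
    writer = csv.writer(fh)
    # 'Chişinău' contains '\u015f', the character that broke cp1252.
    writer.writerow(["Dr. Example", "Chişinău"])
```

With utf-8, any character the site serves can be written, whereas the Windows default cp1252 codec fails on characters like '\u015f'.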
 resp_find = soup.find('div', {'class': 'contact_doc add_comment'})
 if resp_find is not None:
     a_tel = resp_find.a

You can check whether the result of soup.find() is None; if it is not, you can apply .a.

Or you can ensure that soup.find() never gives back None, which means investigating why the method returns None in the first place.
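One way to investigate is to log what the server actually returned whenever the div is missing. This is a sketch under the assumption that the failures come from non-200 responses or unexpected markup; the helper names are illustrative:

```python
import requests
from bs4 import BeautifulSoup


def extract_contact_div(html):
    """Parse page HTML and return the contact div, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find('div', {'class': 'contact_doc add_comment'})


def fetch_and_diagnose(url):
    resp = requests.get(url)
    if resp.status_code != 200:
        # The server may be rate-limiting or serving an error page.
        print("HTTP %s for %s" % (resp.status_code, url))
        return None
    div = extract_contact_div(resp.text)
    if div is None:
        # Log a snippet so the unexpected markup can be inspected later.
        print("Contact div missing on %s: %r" % (url, resp.text[:200]))
    return div
```

Seeing the status code or the first bytes of the response usually reveals whether the page is an error page, a captcha, or simply different markup.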

1 Comment

What is curious is that the program stops at a random page, and I checked that page to see whether the div is there, and it is. So I guess I need to implement a function that retries fetching and parsing that URL until it finds the div. Can you help me with my second error too?
