
I am scraping a web site, but sometimes my laptop loses the connection and I get (obviously) a requests.exceptions.ConnectionError. What is the right (or most elegant?) way to recover from this error? I mean: I don't want the program to stop, but to retry the connection, maybe a few seconds later. This is my code, but I have the feeling it is not correct:

def make_soup(session, url):
    try:
        n = randint(1, MAX_NAPTIME)
        sleep(n)
        response = session.get(url)
    except requests.exceptions.ConnectionError as req_ce:
        error_msg = req_ce.args[0].reason.strerror
        print "Error: %s with url %s" % (error_msg, url)
        session = logout(session)
        n = randint(MIN_SLEEPTIME, MAX_SLEEPTIME)
        sleep(n)
        session = login(session)
        response = session.get(url)
    soup = BeautifulSoup(response.text)
    return soup

Any ideas?

Note that I need a session to scrape these pages, so I think that logging in again (i.e. logging back in to the site after a logout) could cause trouble.

2 Answers


So why not something like

import requests
import time


def retry(cooloff=5, exc_type=None):
    if not exc_type:
        exc_type = [requests.exceptions.ConnectionError]

    def real_decorator(function):
        def wrapper(*args, **kwargs):
            while True:
                try:
                    return function(*args, **kwargs)
                except Exception as e:
                    if e.__class__ in exc_type:
                        print "failed (?)"
                        time.sleep(cooloff)
                    else:
                        raise e
        return wrapper
    return real_decorator

This is a decorator that keeps calling any function until it succeeds, e.g.

@retry(exc_type=[ZeroDivisionError])
def test():
    return 1/0

print test()

This will just print "failed (?)" every 5 seconds until the end of time (or until the laws of math change).
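Applied to the scraping itself, a minimal sketch could look like the one below. The session, login() helper and BeautifulSoup setup are assumed to be the same as in the question; they are not part of the decorator.

import requests
from bs4 import BeautifulSoup

@retry(cooloff=10, exc_type=[requests.exceptions.ConnectionError])
def make_soup(session, url):
    # Any ConnectionError raised here makes the decorator print the
    # message, sleep `cooloff` seconds and call make_soup again with
    # the same arguments, until session.get finally succeeds.
    response = session.get(url)
    return BeautifulSoup(response.text)

# session comes from the question's own login() code
soup = make_soup(session, url)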




Is it really necessary to log out and log back in to your session? I'd just retry the connection the same way:

def make_soup(session, url):
    success = False
    response = None
    for attempt in range(1, MAXTRIES + 1):  # attempts 1..MAXTRIES inclusive
        try:
            response = session.get(url)
            # If session.get succeeded, we break out of the
            # for loop after setting a success flag
            success = True
            break
        except requests.exceptions.ConnectionError as req_ce:
            error_msg = req_ce.args[0].reason.strerror
            print "Error: %s with url %s" % (error_msg, url)
            print "  Attempt %s of %s" % (attempt, MAXTRIES)
            sleep(randint(MIN_SLEEPTIME, MAX_SLEEPTIME))

    # Figure out if we were successful.
    # Note it may not be needed to have a flag, you can maybe just
    # check the value of response here.
    if not success:
        print "Couldn't get it after retrying many times"
        return None

    # Once we get here, we know we got a good response
    soup = BeautifulSoup(response.text)
    return soup
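Since the function now signals failure by returning None, the caller has to handle that case. A rough usage sketch (BASE_URL is just a placeholder, and login() and the constants are assumed to come from your existing scraper, as in the question):

import requests

session = login(requests.Session())
soup = make_soup(session, BASE_URL + "/some/page")
if soup is None:
    # make_soup gave up after MAXTRIES attempts; skip this page
    print "Skipping page, no response after %s tries" % MAXTRIES
else:
    print soup.title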

