
I have currently been working with my own "retry" function, where I would like to retry until the request works. There are some scenarios where, if I hit any 5xx status, I should retry with a long delay.

If I hit a specific status code, e.g. 200 or 404, it should not raise for the status; otherwise it should raise it.

So I have done something like this:

import time

import requests
from bs4 import BeautifulSoup
from requests import (
    RequestException,
    Timeout
)


def do_request():
    try:
        # There are some scenarios where I would use my own proxies by doing
        # requests.get("https://www.bbc.com/", timeout=0.1, proxies={'https': 'xxx.xxxx.xxx.xx'})
        while (response := requests.get("https://www.bbc.com/", timeout=0.1)).status_code >= 500:
            print("sleeping")
            time.sleep(20)

        if response.status_code not in (200, 404):
            response.raise_for_status()

        print("Successful requests!")
        soup = BeautifulSoup(response.text, 'html.parser')
        for link in soup.find_all("a", {"class": "media__link"}):
            yield link.get('href')

    except Timeout as err:
        print(f"Retry due to timed out: {err}")
    except RequestException as err:
        raise RequestException("Unexpected request error")


# ----------------------------------------------------#
if __name__ == '__main__':
    for found_links in do_request():
        print(found_links)

The problem for me now is that I have on purpose set the timeout to 0.1 to trigger the Timeout exception, and what I want to happen here is for it to retry the request again once it hits that exception.

Currently it just stops there, and I wonder what I should do to be able to retry the request again if it hits a timeout that I do not raise.

3 Answers


You can have the function call itself recursively in your case, although be careful about unexpected edge cases:

def do_request(retry: int = 3):
    try:
        # There are some scenarios where I would use my own proxies by doing
        # requests.get("https://www.bbc.com/", timeout=0.1, proxies={'https': 'xxx.xxxx.xxx.xx'})
        while (response := requests.get("https://www.bbc.com/", timeout=0.1)).status_code >= 500:
            print("sleeping")
            time.sleep(20)

        if response.status_code not in (200, 404):
            response.raise_for_status()

        print("Successful requests!")
        soup = BeautifulSoup(response.text, 'html.parser')
        for link in soup.find_all("a", {"class": "media__link"}):
            yield link.get('href')

    except Timeout as err:
        if retry:
            print(f"Retry due to timed out: {err}")
            yield from do_request(retry=retry - 1)
        else:
            raise
    except RequestException as err:
        raise RequestException("Unexpected request error")

This will attempt the request 3 times (or as many as you set in the parameter), until retry reaches 0 or another error is encountered.
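For example, a minimal way to consume the generator (assuming the imports from the question are in scope) could look like the sketch below; once retry reaches 0, the final raise propagates the Timeout to the caller, where it can be handled:

# Minimal usage sketch, assuming the imports from the question.
# If every attempt times out, the re-raised Timeout reaches the caller's
# for loop, where it can be caught.
if __name__ == '__main__':
    try:
        for found_link in do_request(retry=3):
            print(found_link)
    except Timeout:
        print("Gave up after exhausting all retries")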


4 Comments

Hello! Smart way! I just tested it and, sorry to say, it looks like it still stops at your if retry; it doesn't seem to retry again after the first round.
@ProtractorNewbie I've just noticed you're using yield to get values, so I've corrected for that and edited my answer; can you try again with the new version?
That did the job! :) Now, just out of curiosity, if I had more than just retry in the parameters for do_request, e.g. URL, I assume I would need to do something like yield from do_request(retry=retry - 1, url=URL) in that case?
Yes! It should work. Or you could use *args, **kwargs if you have many more parameters (but that's a subject for another question if you're not familiar with that); a rough sketch of that pattern is shown below.
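Not part of the original answer, but a rough illustration of the parameter-forwarding idea from the comment above might look like this (the url parameter and the request_kwargs name are hypothetical additions for illustration):

# Sketch of forwarding extra parameters through the recursive retry.
# url and **request_kwargs are illustrative; they are not in the original answer.
def do_request(url, retry: int = 3, **request_kwargs):
    try:
        while (response := requests.get(url, timeout=0.1, **request_kwargs)).status_code >= 500:
            print("sleeping")
            time.sleep(20)

        if response.status_code not in (200, 404):
            response.raise_for_status()

        soup = BeautifulSoup(response.text, 'html.parser')
        for link in soup.find_all("a", {"class": "media__link"}):
            yield link.get('href')
    except Timeout as err:
        if retry:
            print(f"Retry due to timed out: {err}")
            # Forward every parameter unchanged so the retried call is identical.
            yield from do_request(url, retry=retry - 1, **request_kwargs)
        else:
            raise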

I would put it in a while loop and break the loop when the action is achieved.

Sample:

def do_request():
    while True:
        try:
            # There are some scenarios where I would use my own proxies by doing
            # requests.get("https://www.bbc.com/", timeout=0.1, proxies={'https': 'xxx.xxxx.xxx.xx'})
            while (response := requests.get("https://www.bbc.com/", timeout=0.1)).status_code >= 500:
                print("sleeping")
                time.sleep(20)

            if response.status_code not in (200, 404):
                response.raise_for_status()

            print("Successful requests!")
            soup = BeautifulSoup(response.text, 'html.parser')
            for link in soup.find_all("a", {"class": "media__link"}):
                yield link.get('href')
            break
        except Timeout as err:
            print(f"Retry due to timed out: {err}")
        except RequestException as err:
            raise RequestException("Unexpected request error")

You can also add time.sleep(0.1) between each trial.
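For instance, a minimal sketch of that loop with a short pause between trials and a hypothetical cap on attempts (max_attempts is an illustrative addition, not part of the original answer) could look like:

# Sketch only: same while-loop approach, plus a pause between trials and an
# optional cap on how many timeouts to tolerate before giving up.
def do_request(max_attempts: int = 5):
    attempts = 0
    while True:
        try:
            response = requests.get("https://www.bbc.com/", timeout=0.1)
            if response.status_code >= 500:
                print("sleeping")
                time.sleep(20)
                continue
            if response.status_code not in (200, 404):
                response.raise_for_status()
            print("Successful requests!")
            soup = BeautifulSoup(response.text, 'html.parser')
            for link in soup.find_all("a", {"class": "media__link"}):
                yield link.get('href')
            break
        except Timeout as err:
            attempts += 1
            if attempts >= max_attempts:
                raise
            print(f"Retry due to timed out: {err}")
            time.sleep(0.1)  # brief pause between trials, as suggested above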

Comments


The package tenacity tackles all kinds of retrying problems gracefully.

For your problem, simply add a decorator like this:

from tenacity import retry, retry_if_exception_type


@retry(retry=retry_if_exception_type(Timeout))
def do_request():
    while (response := requests.get("https://www.bbc.com/", timeout=0.1)).status_code >= 500:
        print("sleeping")
        time.sleep(20)

    if response.status_code not in (200, 404):
        response.raise_for_status()

    print("Successful requests!")
    soup = BeautifulSoup(response.text, 'html.parser')
    for link in soup.find_all("a", {"class": "media__link"}):
        yield link.get('href')
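Not in the original answer, but tenacity can also bound the number of attempts and wait between them via stop and wait. One caveat worth noting, given the comments below: since do_request is a generator, calling the decorated function only creates the generator, and the Timeout is raised later while iterating, outside the wrapped call, so the decorator may never see it. A sketch under those assumptions, with the function rewritten to return a list instead of yielding:

from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_fixed


# Sketch only: returning a list (instead of yielding) means the request and the
# Timeout happen inside the decorated call, where tenacity can intercept them.
@retry(retry=retry_if_exception_type(Timeout),
       stop=stop_after_attempt(3),  # give up after 3 attempts
       wait=wait_fixed(2))          # wait 2 seconds between attempts
def do_request():
    while (response := requests.get("https://www.bbc.com/", timeout=0.1)).status_code >= 500:
        print("sleeping")
        time.sleep(20)

    if response.status_code not in (200, 404):
        response.raise_for_status()

    soup = BeautifulSoup(response.text, 'html.parser')
    return [link.get('href') for link in soup.find_all("a", {"class": "media__link"})]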

3 Comments

Hello! I just tested your code and it seems like nothing happens... It just exits when running this script. Have you checked on your end? Is it also possible to print out what kind of exception happened?
Just tested it again, throwing HTTPError instead of Timeout; it looks like it does not retry after all... Are you sure it is working? :)
I fixed an indent mistake. Does it work now?
