I have a list of ~250K URLs for an API that I need to retrieve.

I have made a class using grequests that works exactly how I want it to, except I think it is working too fast: after running through the entire list of URLs I get this error:
```
Problem: url: HTTPSConnectionPool(host='url', port=123): Max retries exceeded with url: url (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x38f466c18>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known',))
```

Code so far:
```python
import grequests

lst = ['url', 'url2', 'url3']

class Test:
    def __init__(self):
        self.urls = lst

    def exception(self, request, exception):
        print("Problem: {}: {}".format(request.url, exception))

    def async(self):
        return grequests.map((grequests.get(u) for u in self.urls),
                             exception_handler=self.exception, size=5)

    def collate_responses(self, results):
        return [x.text for x in results]

test = Test()

# here we collect the results returned by the async function
results = test.async()
```

How can I slow the code down a bit to prevent the 'Max retries' error? Or, even better, how can I chunk the list I have and pass the URLs in chunks?
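For context on the chunking idea, here is a minimal sketch of what I mean: split the list into fixed-size batches and pause between them. The batch size, sleep interval, and placeholder URLs are just assumptions for illustration; the commented-out line is where the existing `grequests.map` call from my class would go.

```python
import time

def chunked(seq, size):
    """Yield successive fixed-size slices of a list."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

urls = ['url{}'.format(i) for i in range(10)]  # placeholder URLs

for batch in chunked(urls, 4):
    # responses = grequests.map((grequests.get(u) for u in batch),
    #                           exception_handler=..., size=5)
    time.sleep(0.01)  # pause between batches so the host isn't hammered

# 10 URLs in chunks of 4 -> batch sizes 4, 4, 2
print([len(b) for b in chunked(urls, 4)])
```

Is something along these lines the right approach, or is there a better way?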
Using Python 3.6 on Mac.
Edit:

This question is not a duplicate; I have to pass many URLs to the same endpoint.