I have a simple screen-scraping routine that fetches an HTML page with requests and parses it with BeautifulSoup, routed through a proxy crawling service (Scrapinghub):
    def make_soup(self, current_url):
        soup = None
        r = requests.get(current_url,
                         proxies=self.proxies,
                         auth=self.proxy_auth,
                         verify='static/crawlera-ca.crt')
        if r.status_code == 200:
            soup = bs4.BeautifulSoup(r.text, "html.parser")
        if soup:
            return soup
        return False

When I run it on an http:// site it works properly.
When I run it on an https:// site it returns this:
    Traceback (most recent call last):
      File "/home/danny/Documents/virtualenvs/AskArbyEnv/lib/python3.5/site-packages/requests/packages/urllib3/util/ssl_.py", line 295, in ssl_wrap_socket
        context.load_verify_locations(ca_certs, ca_cert_dir)
    FileNotFoundError: [Errno 2] No such file or directory

Even weirder is that it works when I run it in a unit test that accesses the same https:// site.
The only thing that changes between the unit test and the running code is the search terms that I append to the URL that I pass to 'make_soup'. Each resulting URL is well-formed, and I can access both of them in the browser.
This makes me think the problem can't be missing SSL certificates. So why does it seem to be complaining that it can't find a certificate file?
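One thing I plan to check, though this is only a guess on my part: since verify is given the relative path 'static/crawlera-ca.crt', urllib3 will look for that file relative to whatever the current working directory happens to be, which might differ between the unit-test runner and the normal run. Here is a minimal diagnostic sketch, assuming the certificate really does sit in a static/ folder next to the module (the paths are illustrative, not confirmed):

    import os

    # Diagnostic sketch (hypothetical paths): does the relative cert path
    # that requests/urllib3 receives resolve from the current working directory?
    cert_path = 'static/crawlera-ca.crt'
    print("cwd:", os.getcwd())
    print("cert found relative to cwd:", os.path.isfile(cert_path))

    # Possible workaround: build an absolute path so the lookup no longer
    # depends on where the process was started from.
    module_dir = os.path.dirname(os.path.abspath(__file__))
    abs_cert_path = os.path.join(module_dir, 'static', 'crawlera-ca.crt')
    print("cert found via absolute path:", os.path.isfile(abs_cert_path))

If the relative path turns out to be the culprit, passing something like abs_cert_path to verify= should behave the same regardless of where the process is started. But I'd still like to understand why only the https:// case fails.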