
So I'm using the code below to scrape a CSV of about 10,000 business names and website domains for "mailto:" links, and I'm trying to save those to a CSV when mailto links are found. But occasionally I run into "temporary DNS lookup failure" and "connection timed out" errors.

I need help figuring out how to make it skip a site when the request throws one of these errors (or any error) and just continue down the list.

    import csv
    import requests
    from bs4 import BeautifulSoup
    import numpy as np

    results = []
    agency_names = ['Agency Name']
    agency_websites = ['Website']
    agency_emails = ['Email Address']

    with open('agencies_clean.csv') as csvfile:
        reader = csv.reader(csvfile)
        count = 0
        for row in reader:  # each row is a list
            if count != 0:  # skip the header row
                if row[1] != "":
                    print("working on " + row[1] + "...")
                    page = requests.get('http://' + row[1])
                    soup = BeautifulSoup(page.content, "html.parser")
                    mailtos = soup.select('a[href^=mailto]')
                    if mailtos:
                        agency_names.append(row[0])
                        agency_websites.append(row[1])
                        agency_emails.append(mailtos[0].text)
                        print('Completed[x] Company:' + row[0] + 'Email: ' + mailtos[0].text)
            count = count + 1

    # write name, website and email columns
    np.savetxt('scrapes/agencies_w_emails.csv',
               list(zip(agency_names, agency_websites, agency_emails)),
               delimiter=',', fmt='%s')

2 Answers


You're probably looking for something similar to:

    for row in reader:
        try:
            ...  # your verification code here
        except DNSException:
            continue

I'm not sure which exception name you should use; let the script fail once, read the exception class from the traceback the Python interpreter prints, and replace DNSException with it. Apart from that, the main idea is to use continue to skip to the next element of the iteration.
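For reference, a minimal sketch of the same idea with real exception classes, assuming the default requests/urllib3 stack: there a failed DNS lookup is raised as `requests.exceptions.ConnectionError`, and a connect timeout as `requests.exceptions.ConnectTimeout`, which subclasses both `ConnectionError` and `Timeout`. The function name `fetch_reachable`, the `get` parameter (there so the loop can be tried without the network) and the 10-second timeout are illustrative choices, not part of the original code.

```python
import requests

# Assumption: default requests/urllib3 behaviour, where a DNS failure surfaces as
# requests.exceptions.ConnectionError and a connect timeout as
# requests.exceptions.ConnectTimeout (a subclass of ConnectionError and Timeout).
SKIPPABLE = (requests.exceptions.ConnectionError, requests.exceptions.Timeout)


def fetch_reachable(domains, get=requests.get, timeout=10):
    """Return {domain: response} for every domain that answered.

    Domains that fail with a connection/DNS error or a timeout are skipped;
    any other exception still propagates so real bugs stay visible.
    """
    pages = {}
    for domain in domains:
        try:
            response = get('http://' + domain, timeout=timeout)
        except SKIPPABLE as exc:
            print(f"skipping {domain}: {type(exc).__name__}")
            continue  # go straight to the next domain
        pages[domain] = response
    return pages
```

Because only the listed classes are caught, an unrelated error (a `ValueError` from a bug, say) still stops the loop instead of being silently skipped.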

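A generic exception handler like this:

    try:
        ...  # something
    except:
        pass

is usually not good coding practice, because it also catches KeyboardInterrupt and SystemExit and silently hides genuine bugs such as a mistyped variable name.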
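To make the point about bare `except:` concrete, here is a small self-contained comparison (the function names are made up for the demo). A bare `except:` swallows `KeyboardInterrupt`, so Ctrl+C would not stop a 10,000-site loop, while `except Exception:` still handles ordinary errors but lets the interrupt through.

```python
def call_with_bare_except(func):
    """Anti-pattern: a bare except catches everything, even Ctrl+C."""
    try:
        func()
    except:  # noqa: E722 (deliberately bare for the comparison)
        return "swallowed"
    return "ok"


def call_with_except_exception(func):
    """Catches ordinary errors but lets KeyboardInterrupt/SystemExit through."""
    try:
        func()
    except Exception:
        return "handled"
    return "ok"


def interrupt():
    raise KeyboardInterrupt  # what pressing Ctrl+C raises


def fail():
    raise ValueError("bad input")
```

`call_with_bare_except(interrupt)` returns `"swallowed"`, whereas `call_with_except_exception(interrupt)` re-raises the `KeyboardInterrupt`. Catching the specific request exceptions is narrower still.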


This might work. The print in the except block is optional.

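    # inside the `for row in reader:` loop; the request must be in the try,
    # because that is where the DNS and timeout errors are raised
    try:
        page = requests.get('http://' + row[1])
        soup = BeautifulSoup(page.content, "html.parser")
        mailtos = soup.select('a[href^=mailto]')
        if mailtos:
            agency_names.append(row[0])
            agency_websites.append(row[1])
            agency_emails.append(mailtos[0].text)
            print('Completed[x] Company:' + row[0] + 'Email: ' + mailtos[0].text)
    except Exception:
        print('email error')
        continue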
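Putting the pieces together, one possible restructuring of the question's loop is sketched below. It is an illustration under assumptions, not a drop-in replacement: the names `scrape`, `first_mailto` and `save_csv` and the `get`/`find_email` parameters are hypothetical. It passes a `timeout` (requests has no default timeout, so an unresponsive server can stall the loop indefinitely), catches the base class `requests.exceptions.RequestException`, and writes the output with `csv.writer`, which quotes fields that contain commas. BeautifulSoup is imported inside `first_mailto` only so the rest can be tried without bs4 installed.

```python
import csv

import requests


def first_mailto(html):
    """Return the text of the first mailto: link in html, or None."""
    # Imported here (assumes bs4 is installed) so scrape() can be tried without it.
    from bs4 import BeautifulSoup
    link = BeautifulSoup(html, "html.parser").select_one('a[href^=mailto]')
    return link.text if link else None


def scrape(rows, get=requests.get, find_email=first_mailto, timeout=10):
    """Return [name, website, email] for every row whose site has a mailto link.

    rows: CSV rows (header already removed) with the name in column 0 and the
    domain in column 1, as in agencies_clean.csv.
    """
    found = []
    for row in rows:
        name, website = row[0], row[1]
        if not website:
            continue
        try:
            # Without a timeout, requests can wait indefinitely on a dead server.
            page = get('http://' + website, timeout=timeout)
        except requests.exceptions.RequestException as exc:
            # Base class of DNS/connection errors, timeouts, invalid URLs, etc.
            print(f"skipping {website}: {exc}")
            continue
        email = find_email(page.content)
        if email:
            found.append([name, website, email])
            print(f"Completed[x] Company: {name} Email: {email}")
    return found


def save_csv(path, rows):
    """Write results with csv.writer, which quotes fields containing commas."""
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['Agency Name', 'Website', 'Email Address'])
        writer.writerows(rows)
```

In use, you would open `agencies_clean.csv` with `csv.reader`, call `next(reader)` once to drop the header (replacing the `count` bookkeeping), pass the reader to `scrape`, and hand the result to `save_csv('scrapes/agencies_w_emails.csv', ...)`. Parsing errors from `find_email` are deliberately not caught, so a bug there still surfaces.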
