
So, I learned how web scraping works a few days ago and I was messing around with it today. I wanted to know how I could test whether a page exists or doesn't exist, so I looked it up and found "Python check if website exists". I'm using the requests module, and I got this code from the answers:

import requests

request = requests.get('http://www.example.com')
if request.status_code == 200:
    print('Web site exists')
else:
    print('Web site does not exist')

I tried it out, and since example.com exists, it printed "Web site exists". However, when I tried something I was sure wouldn't exist, like examplewwwwwww.com, it gave me a ConnectionError instead. Why is it doing this, and how can I keep it from printing an error (and instead say that the website does not exist)?

  • As that page indicates, it throws a ConnectionError: stackoverflow.com/questions/16778435/… Commented Feb 7, 2018 at 14:10
  • There's no server there to give you a status. Read the comments on the link you posted and instead use something like try... except ConnectionError. Commented Feb 7, 2018 at 14:14
  • Some sites block you thinking this is a scraping attempt, knowing you're not a real browser because of your user agent and other features. This explains why some URLs rejected with a 404 actually DO work in the browser. Commented Feb 6, 2021 at 17:10
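
As the last comment notes, some servers respond differently to scripted clients. A minimal sketch of sending a browser-like User-Agent header (the URL and the header string here are illustrative assumptions, not part of the original answers):

import requests

# Hypothetical example: some servers return 404/403 to non-browser
# user agents, so a browser-like User-Agent header can change the response.
response = requests.get(
    'http://www.example.com',
    headers={'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64)'},
)
print(response.status_code)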

4 Answers


You can use try/except like this:

import requests
from requests.exceptions import ConnectionError

try:
    request = requests.get('http://www.example.com')
except ConnectionError:
    print('Web site does not exist')
else:
    print('Web site exists')
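
Note that the else clause runs only when requests.get returns without raising, so exactly one of the two messages is printed per call.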



Just to show my way of doing it; maybe it will be of value to someone:

import requests

try:
    response = requests.get('https://github.com')
    if response.ok:
        ready = 1  # flag checked by the surrounding loop in the original snippet
except requests.exceptions.RequestException:
    print("Website not available...")
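
This snippet originally included a break, which suggests it was taken from a retry loop; a minimal sketch of that pattern, where the retry count and sleep interval are assumptions:

import time
import requests

ready = 0
for attempt in range(5):  # assumed retry limit
    try:
        response = requests.get('https://github.com')
        if response.ok:
            ready = 1
            break  # stop retrying once the site responds
    except requests.exceptions.RequestException:
        print("Website not available...")
    time.sleep(2)  # assumed back-off interval between attempts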



Well, you're getting the error because the URL you want to fetch is invalid. However, you can easily check for this with a try/except block like this one:

import requests
from requests.exceptions import MissingSchema

try:
    request = requests.get('examplewwwwwww.com')
except MissingSchema:
    print('The provided URL is invalid.')
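
One caveat: MissingSchema fires here only because the URL lacks a scheme. With http:// prepended, the same nonexistent domain raises ConnectionError instead, so handling both is safer. A short sketch of that:

import requests
from requests.exceptions import ConnectionError, MissingSchema

for url in ('examplewwwwwww.com', 'http://examplewwwwwww.com'):
    try:
        requests.get(url)
    except MissingSchema:
        print(url, '- the provided URL is invalid (no scheme).')
    except ConnectionError:
        print(url, '- web site does not exist.')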



You have to enclose the requests.get call in a try/except block and handle the various exceptions that might arise, one of which is ConnectionError.

You get the error because a response coming back with a status_code other than 200 and not being able to connect to the desired HTTP address at all are two different things.

Here are the exceptions that you might encounter when making requests with the requests library.
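
Putting that together, a minimal sketch that distinguishes a bad status code from a failed connection (the URL and timeout value are assumptions for illustration):

import requests
from requests.exceptions import ConnectionError, Timeout, RequestException

url = 'http://www.example.com'  # assumed URL for illustration
try:
    response = requests.get(url, timeout=5)  # assumed timeout
except ConnectionError:
    print('Web site does not exist')  # DNS failure, refused connection, etc.
except Timeout:
    print('The server did not respond in time')
except RequestException as e:
    print('Request failed:', e)  # catch-all for other requests errors
else:
    if response.status_code == 200:
        print('Web site exists')
    else:
        print('Site responded with status', response.status_code)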

