How to validate a url in Python? (Malformed or not)

Question

I have a URL from the user and I have to reply with the fetched HTML.

How can I check whether the URL is malformed or not?

For example :

    url = 'google'             # Malformed
    url = 'google.com'         # Malformed
    url = 'http://google.com'  # Valid
    url = 'http://google'      # Malformed

Just try to read it; if, for instance, httplib throws an exception, then you'll know it was invalid. Not all well-formed URLs are valid!
– carlpett, Commented Aug 23, 2011 at 12:07
url='http://google' is not malformed. Scheme + hostname is always valid.
– Viktor Joras, Commented Nov 4, 2018 at 6:53
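To make the last comment concrete, here is a minimal sketch (an illustration added here, not code from the thread) of what the standard library's `urlparse` reports for the four strings in the question. The helper name `has_scheme_and_host` is hypothetical. It only checks syntax, so it treats `http://google` as well formed, which matches the comment above rather than the question's expectation.

```python
from urllib.parse import urlparse


def has_scheme_and_host(url: str) -> bool:
    """Return True when urlparse finds both a scheme and a network location."""
    parts = urlparse(url)
    return bool(parts.scheme and parts.netloc)


for candidate in ("google", "google.com", "http://google.com", "http://google"):
    # Prints False, False, True, True for the four candidates, in that order.
    print(candidate, has_scheme_and_host(candidate))
```

Whether the host actually exists is a separate question that no purely syntactic check can answer.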

Asclepius · Accepted Answer · 2021-01-03 04:42:05Z

249

Use the validators package:

    >>> import validators
    >>> validators.url("http://google.com")
    True
    >>> validators.url("http://google")
    ValidationFailure(func=url, args={'value': 'http://google', 'require_tld': True})
    >>> if not validators.url("http://google"):
    ...     print("not valid")
    ...
    not valid
    >>>

Install it from PyPI with pip (pip install validators).

edited Jan 3, 2021 at 4:42

Asclepius


answered Aug 23, 2015 at 21:46

Jabba


12 Comments

Devavrata Over a year ago

It will throw error for file urls. Like "file:///users/file.txt"

Tom Over a year ago

Fails for localhost urls

    validators.url("http://localhost:8080")
    ValidationFailure(func=url, args={'public': False, 'value': 'http://localhost:8080'})

Lal Over a year ago

valid for http://www.google, http://google.www.. this is simply checking for http:// and a dot (.) between two words

ivan_pozdeev Over a year ago

The package's validating fn has many arbitrary limitations, so it's a terrible advice to suggest it as a general solution.

Mehdi Zare Over a year ago

This package is not maintained actively


dmmfll · Accepted Answer · 2024-03-03 02:08:35Z

161

A True or False version, based on @DMfll answer:

    try:
        # python2
        from urlparse import urlparse
    except ModuleNotFoundError:
        # python3
        from urllib.parse import urlparse

    a = 'http://www.cwi.nl:80/%7Eguido/Python.html'
    b = '/data/Python.html'
    c = 532
    d = u'dkakasdkjdjakdjadjfalskdjfalk'
    e = 'https://stackoverflow.com'

    def uri_validator(x):
        try:
            result = urlparse(x)
            return all([result.scheme, result.netloc])
        except AttributeError:
            return False

    print(uri_validator(a))
    print(uri_validator(b))
    print(uri_validator(c))
    print(uri_validator(d))
    print(uri_validator(e))

Gives:

    True
    False
    False
    False
    True

edited Mar 3, 2024 at 2:08

dmmfll


answered Jun 24, 2016 at 18:37

alemol


16 Comments

Marc Maxmeister Over a year ago

I didn't know you could test an if statement with a list of non-None elements. That's helpful. Also +1 for using a built-in module

zondo Over a year ago

This allows everything. It returns True for the string fake or even for a blank string. There will never be any errors because those attributes are always there, and the list will always have a boolean value of True because it contains those attributes. Even if all of the attributes are None, the list will still be non-empty. You need some validation of the attributes because everything passes the way you have it now.

dmmfll Over a year ago

Lists of false objects evaluate to True: print("I am true") if [False, None, 0, '', [], {}] else print("I am false.") prints "I am true." when I run it. [result.scheme, result.netloc, result.path] always evaluates to True. print("I am True") if [] else print("I am False.") prints "I am false." so empty lists are False. The contents of the array needs evaluation with something like the all function.

Jerinaw Over a year ago

Not sure why you would require a path like that. You should remove result.path from the test.

Alexander Fortin Over a year ago

This is good enough for me, thanks. I just added a simple validation for scheme: if not all([result.scheme in ["file", "http", "https"], result.netloc, result.path]):
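The points made in these comments (a non-empty list is always truthy, `all()` inspects the elements, and a scheme allow-list tightens the check) can be put together in one sketch. This is an illustration under stated assumptions, not the answer author's code: the `ALLOWED_SCHEMES` set and the explicit `isinstance` guard are additions made for this example.

```python
from urllib.parse import urlparse

# Hypothetical allow-list; extend it (for example with "ftp") to suit your case.
ALLOWED_SCHEMES = frozenset({"http", "https"})


def uri_validator(url, allowed_schemes=ALLOWED_SCHEMES):
    """Return True only for str URLs with an allowed scheme and a non-empty netloc."""
    if not isinstance(url, str):
        return False
    try:
        result = urlparse(url)
    except ValueError:  # raised for some malformed hosts, e.g. an unbalanced "["
        return False
    return result.scheme in allowed_schemes and bool(result.netloc)


# A non-empty list is truthy regardless of its contents ...
print(bool(["", ""]))  # True
# ... while all() looks at each element.
print(all(["", ""]))   # False
print(uri_validator("fake"), uri_validator(""), uri_validator("https://stackoverflow.com"))
```

Requiring a non-empty `path`, as one of the comments does, would reject `https://stackoverflow.com`, so the sketch leaves it out.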


JL Peyret · Accepted Answer · 2024-04-11 18:00:10Z

157

Actually, I think this is the best way.

    from django.core.validators import URLValidator
    from django.core.exceptions import ValidationError

    val = URLValidator()
    try:
        val('httpx://www.google.com')
    except (ValidationError,) as e:
        print(e)

edit: ah yeah, this question is a duplicate of this: How can I check if a URL exists with Django’s validators?

edited Apr 11, 2024 at 18:00

JL Peyret


answered Aug 23, 2011 at 12:10

Drekembe


9 Comments

Yugal Jindle Over a year ago

But this will only work in the django environment not otherwise.

user67416 Over a year ago

verify_exists is deprecated. -1

Dukeatcoding Over a year ago

Add: from django.conf import settings settings.configure(DEBUG=False) and remove the verify_exists to keep it working with django 1.5

swdev Over a year ago

@YugalJindle Correct, but stripping it from Django is almost trivial :D. So, I use this method

luckydonald Over a year ago

Note, with django >= 1.5 there is no verify_exists anymore. Also instead of the val variable you can call it like URLValidator()('http://www.google.com')


Community · Accepted Answer · 2020-05-26 15:11:19Z

142

django url validation regex (source):

    import re

    regex = re.compile(
        r'^(?:http|ftp)s?://'  # http:// or https://
        r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|'  # domain...
        r'localhost|'  # localhost...
        r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'  # ...or ip
        r'(?::\d+)?'  # optional port
        r'(?:/?|[/?]\S+)$', re.IGNORECASE)

    print(re.match(regex, "http://www.example.com") is not None)  # True
    print(re.match(regex, "example.com") is not None)  # False

edited May 26, 2020 at 15:11

CommunityBot


answered Aug 23, 2011 at 12:06

cetver


12 Comments

Ruggero Turra Over a year ago

a curiosity... did you add the ftp? Or have I an old django version?

glarrain Over a year ago

@yugal-jindle sitedomain is not a valid url. museum is because .museum is a top-level-domain (ICANN [1] defines them), and not a sitedomain. [1] icann.org

Adam Over a year ago

This one doesn't seem to work with username:[email protected] style URLs

cetver Over a year ago

@cowlinator github.com/django/django/blob/stable/1.3.x/django/core/…

cimnine Over a year ago

This will not work for IPv6 urls, which have the form http://[2001:0DB8::3]:8080/index.php?valid=true#result
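For the IPv6 and user-info cases raised in these comments, one alternative to extending the regex is to let the standard library split the URL and classify the host. The sketch below is an assumption about how that could look, not part of the Django regex; the function name `host_kind` and the example credentials are made up for illustration.

```python
import ipaddress
from urllib.parse import urlsplit


def host_kind(url: str) -> str:
    """Classify the host part of a URL as "ipv4", "ipv6", "name" or "none"."""
    host = urlsplit(url).hostname  # brackets, port and user info are stripped here
    if host is None:
        return "none"
    try:
        return f"ipv{ipaddress.ip_address(host).version}"
    except ValueError:
        return "name"


print(host_kind("http://[2001:0DB8::3]:8080/index.php?valid=true#result"))  # ipv6
print(host_kind("http://user:secret@example.com/"))                         # name
print(host_kind("http://127.0.0.1:8000/"))                                  # ipv4
```

A result of "name" only means the host is not an IP literal; it says nothing about whether the name itself is a valid domain.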


Jonathan Prieto-Cubides · Accepted Answer · 2018-10-05 09:32:16Z

39

Nowadays, I use the following, based on the Padam's answer:

    $ python --version
    Python 3.6.5

And this is how it looks:

    from urllib.parse import urlparse

    def is_url(url):
        try:
            result = urlparse(url)
            return all([result.scheme, result.netloc])
        except ValueError:
            return False

Just use is_url("http://www.asdf.com").

Hope it helps!

edited Oct 5, 2018 at 9:32

answered Sep 22, 2018 at 10:55

Jonathan Prieto-Cubides


4 Comments

Gaslight Deceive Subvert Over a year ago

It fails in case the domain name begins with a dash, which is not valid. tools.ietf.org/html/rfc952

ingyhere Over a year ago

This is only good to split up components in the special case that the URI is known to NOT be malformed. As I replied earlier to the other similar answer, this validates malformed URIs, like https://https://https://www.foo.bar.

Jesuisme Over a year ago

As of Python 3.7.6, I tested this logic with "https://-wee.com" and it worked

Josiah Coad Over a year ago

www.tiktok.com/@outlikethevapors is a valid url but this says that its not
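If the leading-dash case from the first comment matters to you, the host name can be checked label by label after parsing. This is a minimal sketch under assumptions of mine: the name `is_url_strict` is hypothetical, the label rule (1 to 63 letters, digits or hyphens, no hyphen at either end) is a common reading of the host name RFCs, and IP literals are deliberately not handled.

```python
import re
from urllib.parse import urlsplit

# One DNS label: 1-63 letters, digits or hyphens, not starting or ending with a hyphen.
LABEL = re.compile(r"(?!-)[a-z0-9-]{1,63}(?<!-)", re.IGNORECASE)


def is_url_strict(url: str) -> bool:
    """Require a scheme and a host whose dot-separated labels all look valid."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False
    host = parts.hostname
    if not parts.scheme or not host:
        return False
    # A single trailing dot (fully qualified form) is tolerated.
    return all(LABEL.fullmatch(label) for label in host.rstrip(".").split("."))


print(is_url_strict("http://www.asdf.com"))               # True
print(is_url_strict("https://-wee.com"))                  # False
print(is_url_strict("www.tiktok.com/@outlikethevapors"))  # False, no scheme
```

The last case still fails because the string has no scheme; prepend one before validating if you want to accept such input.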

dmmfll · Accepted Answer · 2024-03-03 02:34:33Z

I landed on this page trying to figure out a sane way to validate strings as "valid" urls. I share here my solution using python3. No extra libraries required.

See https://docs.python.org/2/library/urlparse.html if you are using python2.

See https://docs.python.org/3.0/library/urllib.parse.html if you are using python3 as I am.

    import urllib.parse
    from pprint import pprint

    invalid_url = 'dkakasdkjdjakdjadjfalskdjfalk'
    valid_url = 'https://stackoverflow.com'
    tokens = [urllib.parse.urlparse(url) for url in (invalid_url, valid_url)]

    for token in tokens:
        pprint(token)

    min_attributes = ('scheme', 'netloc')  # add attrs to your liking
    for token in tokens:
        if not all(getattr(token, attr) for attr in min_attributes):
            error = "'{url}' string has no scheme or netloc.".format(url=token.geturl())
            print(error)
        else:
            print("'{url}' is probably a valid url.".format(url=token.geturl()))

ParseResult(scheme='', netloc='', path='dkakasdkjdjakdjadjfalskdjfalk', params='', query='', fragment='')

ParseResult(scheme='https', netloc='stackoverflow.com', path='', params='', query='', fragment='')

'dkakasdkjdjakdjadjfalskdjfalk' string has no scheme or netloc.

'https://stackoverflow.com' is probably a valid url.

Here is a more concise function:

    from urllib.parse import urlparse

    min_attributes = ('scheme', 'netloc')

    def is_valid(url, qualifying=min_attributes):
        tokens = urlparse(url)
        return all(getattr(tokens, qualifying_attr) for qualifying_attr in qualifying)

Here is a usage example. I prefer to do any exception handling outside of a function.

    my_list = [
        "http://www.cwi.nl:80/%7Eguido/Python.html",
        "/data/Python.html",
        532,
        type("FooObject", (), {"decode": None})(),
        "dkakasdkjdjakdjadjfalskdjfalk",
        "https://stackoverflow.com",
    ]

    for item in my_list:
        try:
            print(f"{item} is valid: {is_valid(item)}")
        except (AttributeError, TypeError) as e:
            print(e)

OUTPUT:

http://www.cwi.nl:80/%7Eguido/Python.html is valid: True

/data/Python.html is valid: False

'int' object has no attribute 'decode'

'NoneType' object is not callable

dkakasdkjdjakdjadjfalskdjfalk is valid: False

https://stackoverflow.com is valid: True

andrew cooke · Accepted Answer · 2012-08-30 17:29:20Z

note - lepl is no longer supported, sorry (you're welcome to use it, and i think the code below works, but it's not going to get updates).

rfc 3696 http://www.faqs.org/rfcs/rfc3696.html defines how to do this (for http urls and email). i implemented its recommendations in python using lepl (a parser library). see http://acooke.org/lepl/rfc3696.html

to use:

    > easy_install lepl
    ...
    > python
    ...
    >>> from lepl.apps.rfc3696 import HttpUrl
    >>> validator = HttpUrl()
    >>> validator('google')
    False
    >>> validator('http://google')
    False
    >>> validator('http://google.com')
    True

you haven't forked the code and implemented them? it's open source.
lepl is now discontinued by the author acooke.org/lepl/discontinued.html EDIT: heh, just realized that you are the author

faruk13 · Accepted Answer · 2019-01-19 09:17:46Z

EDIT

As pointed out by @Kwame, the code below validates the URL even if the .com, .co, etc. is not present.

As @Blaise also pointed out, a URL like https://www.google is valid; you need to do a separate DNS check to see whether it resolves.

This is simple and works:

min_attr contains the basic set of attributes that must be present for a URL to be considered valid, i.e. the http part and the google.com part.

urlparse.scheme stores the scheme (http), and

urlparse.netloc stores the domain name (google.com).

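    try:
        # python3
        from urllib.parse import urlparse
    except ImportError:
        # python2
        from urlparse import urlparse

    def url_check(url):
        min_attr = ('scheme', 'netloc')
        try:
            result = urlparse(url)
            # every attribute named in min_attr must be non-empty
            return all(getattr(result, attr) for attr in min_attr)
        except (AttributeError, TypeError, ValueError):
            # non-string input or a host part that urlparse rejects
            return False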

all() returns True only if every item passed to it is truthy. So if result.scheme and result.netloc are both present, i.e. non-empty, the URL is considered valid and the function returns True.

Oh , Nice catch .. I guess I have to take my code back. What do you prefer , are there any other options except regex.
https://www.google is a valid URL. It may not actually resolve, but if you care about that you need to do a DNS check.
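The separate DNS check mentioned in the last comment could look roughly like this. It is a sketch, not a recommendation: the function name `resolves` and its `resolver` parameter are hypothetical, and the parameter exists only so the lookup can be replaced by a stub when no network is available.

```python
import socket
from urllib.parse import urlsplit


def resolves(url: str, resolver=socket.getaddrinfo) -> bool:
    """Return True if the URL has a host part and the resolver can look it up."""
    host = urlsplit(url).hostname
    if not host:
        return False
    try:
        resolver(host, None)
    except OSError:  # socket.gaierror is a subclass of OSError
        return False
    return True
```

Calling `resolves("https://www.google")` performs a real lookup, so the answer depends on your network and resolver configuration. A successful lookup shows that the name exists, not that a web server is listening there.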

0x7633 · Accepted Answer · 2024-01-13 11:11:30Z

Here's a regex solution, since the top-voted regex doesn't handle some cases such as multi-part top-level domains. Some test cases are below.

    import re

    regex = re.compile(
        r"(\w+://)?"                # protocol                       (optional)
        r"(\w+\.)?"                 # host                           (optional)
        r"(([\w-]+)\.(\w+))"        # domain
        r"(\.\w+)*"                 # top-level domain               (optional, can have > 1)
        r"([\w\-\._\~/]*)*(?<!\.)"  # path, params, anchors, etc.    (optional)
    )

    cases = [
        "http://www.google.com",
        "https://www.google.com",
        "http://google.com",
        "https://google.com",
        "www.google.com",
        "google.com",
        "http://www.google.com/~as_db3.2123/134-1a",
        "https://www.google.com/~as_db3.2123/134-1a",
        "http://google.com/~as_db3.2123/134-1a",
        "https://google.com/~as_db3.2123/134-1a",
        "www.google.com/~as_db3.2123/134-1a",
        "google.com/~as_db3.2123/134-1a",
        # .co.uk top level
        "http://www.google.co.uk",
        "https://www.google.co.uk",
        "http://google.co.uk",
        "https://google.co.uk",
        "www.google.co.uk",
        "google.co.uk",
        "http://www.google.co.uk/~as_db3.2123/134-1a",
        "https://www.google.co.uk/~as_db3.2123/134-1a",
        "http://google.co.uk/~as_db3.2123/134-1a",
        "https://google.co.uk/~as_db3.2123/134-1a",
        "www.google.co.uk/~as_db3.2123/134-1a",
        "google.co.uk/~as_db3.2123/134-1a",
        "https://...",
        "https://..",
        "https://.",
        "https://.google.com",
        "https://..google.com",
        "https://...google.com",
        "https://.google..com",
        "https://.google...com",
        "https://...google..com",
        "https://...google...com",
        ".google.com",
        ".google.co.",
        "https://google.co."
    ]

    for c in cases:
        if regex.match(c):
            print(c, regex.match(c).span()[1] - regex.match(c).span()[0] == len(c))
        else:
            print(c, False)

Edit: Added hyphen to domain as suggested by nickh.

error in last line fixed: print(c, x.span()[1] - x.span()[0] == len(c) if (x := regex.match(c)) else False)
Thanks Miguel, but I would like to warn others who do not use Python 3.8+ since ":=" is not valid for former versions.
It doesn't match domains with hyphens, e.g api-example.com Consider using (\w+://)?(\w+\.)?(([\w-]+)\.(\w+))(\.\w+)*([\w\-\._\~/]*)*(?<!\.)
it also doesn't match a single word, for example "fred". it gives the error AttributeError: 'NoneType' object has no attribute 'span'
@colin0117, this shouldn't match a single word. I recommend checking for that edge case in your code.
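The span arithmetic discussed in these comments can be avoided with `re.fullmatch`, which returns `None` unless the whole string matches; that also covers the single-word case without an `AttributeError` and needs no walrus operator. The sketch below reuses the pattern from the answer; the names `URL_RE` and `matches_whole_string` are made up for this example.

```python
import re

# Same pattern as in the answer above, compiled once.
URL_RE = re.compile(
    r"(\w+://)?"                # protocol (optional)
    r"(\w+\.)?"                 # host (optional)
    r"(([\w-]+)\.(\w+))"        # domain
    r"(\.\w+)*"                 # top-level domain (optional, can have > 1)
    r"([\w\-\._\~/]*)*(?<!\.)"  # path, params, anchors, etc. (optional)
)


def matches_whole_string(candidate: str) -> bool:
    """True only when the pattern consumes the entire candidate string."""
    return URL_RE.fullmatch(candidate) is not None


for c in ("http://www.google.com", "fred", "https://...", "https://google.co."):
    # Prints True for the first candidate and False for the other three.
    print(c, matches_whole_string(c))
```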

winklerrr · Accepted Answer · 2019-04-24 13:02:33Z

Validate URL with `urllib` and Django-like regex

The Django URL validation regex was actually pretty good but I needed to tweak it a little bit for my use case. Feel free to adapt it to yours!

Python 3.7

    import re
    import urllib.parse

    # Check https://regex101.com/r/A326u1/5 for reference
    DOMAIN_FORMAT = re.compile(
        r"(?:^(\w{1,255}):(.{1,255})@|^)"  # http basic authentication [optional]
        r"(?:(?:(?=\S{0,253}(?:$|:))"  # check full domain length to be less than or equal to 253 (starting after http basic auth, stopping before port)
        r"((?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+"  # check for at least one subdomain (maximum length per subdomain: 63 characters), dashes in between allowed
        r"(?:[a-z0-9]{1,63})))"  # check for top level domain, no dashes allowed
        r"|localhost)"  # accept also "localhost" only
        r"(:\d{1,5})?",  # port [optional]
        re.IGNORECASE
    )
    SCHEME_FORMAT = re.compile(
        r"^(http|hxxp|ftp|fxp)s?$",  # scheme: http(s) or ftp(s)
        re.IGNORECASE
    )

    def validate_url(url: str):
        url = url.strip()

        if not url:
            raise Exception("No URL specified")

        if len(url) > 2048:
            raise Exception("URL exceeds its maximum length of 2048 characters (given length={})".format(len(url)))

        result = urllib.parse.urlparse(url)
        scheme = result.scheme
        domain = result.netloc

        if not scheme:
            raise Exception("No URL scheme specified")

        if not re.fullmatch(SCHEME_FORMAT, scheme):
            raise Exception("URL scheme must either be http(s) or ftp(s) (given scheme={})".format(scheme))

        if not domain:
            raise Exception("No URL domain specified")

        if not re.fullmatch(DOMAIN_FORMAT, domain):
            raise Exception("URL domain malformed (domain={})".format(domain))

        return url

Explanation

The code only validates the scheme and netloc part of a given URL. (To do this properly, I split the URL with urllib.parse.urlparse() in the two according parts which are then matched with the corresponding regex terms.)

The netloc part stops before the first occurrence of a slash /, so port numbers are still part of the netloc, e.g.:

    https://www.google.com:80/search?q=python
    ^^^^^   ^^^^^^^^^^^^^^^^^
      |             |
      |             +-- netloc (aka "domain" in my code)
      +-- scheme
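The same split can be inspected directly with the standard library. This short sketch only prints what `urlparse` returns for the URL above; nothing in it is specific to the validator in this answer.

```python
from urllib.parse import urlparse

result = urlparse("https://www.google.com:80/search?q=python")

print(result.scheme)    # https
print(result.netloc)    # www.google.com:80
print(result.hostname)  # www.google.com
print(result.port)      # 80
print(result.path)      # /search
print(result.query)     # q=python
```

Note that `netloc` keeps the port, while `hostname` and `port` give the two pieces separately.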

IPv4 addresses are also validated

IPv6 Support

If you want the URL validator to also work with IPv6 addresses, do the following:

Add is_valid_ipv6(ip) from Markus Jarderot's answer, which has a really good IPv6 validator regex
Add and not is_valid_ipv6(domain) to the last if
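If you would rather not maintain an IPv6 regex, the standard library's `ipaddress` module can play the role of `is_valid_ipv6`. To be clear, this is a swapped-in technique and not the regex from Markus Jarderot's answer. The bracket and port handling is an assumption about what the netloc looks like for such URLs, e.g. `[2001:db8::1]:8080`.

```python
import ipaddress


def is_valid_ipv6(netloc: str) -> bool:
    """True for an IPv6 literal given bare, as "[addr]" or as "[addr]:port"."""
    host = netloc
    if netloc.startswith("["):
        host, bracket, rest = netloc[1:].partition("]")
        # After the closing bracket only an optional ":<digits>" port may follow.
        if not bracket or (rest and not (rest.startswith(":") and rest[1:].isdigit())):
            return False
    try:
        ipaddress.IPv6Address(host)
    except ValueError:  # AddressValueError is a subclass of ValueError
        return False
    return True


print(is_valid_ipv6("[2001:db8::1]:8080"))  # True
print(is_valid_ipv6("www.google.com"))      # False
```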

Examples

Here are some examples of the regex for the netloc (aka domain) part in action:

IPv4 and alphanumeric: https://regex101.com/r/A326u1/5
IPv6: https://regex101.com/r/lKIIgq/1 (with the regex from Markus Jarderot's answer)

dxtr_brz · Accepted Answer · 2023-01-23 14:23:04Z

Pydantic could be used to do that. I'm not very familiar with it, so I can't say much about its limitations. It is an option, though, and no one had suggested it.

I have seen many people ask about ftp and file URLs in previous answers, so I recommend getting to know the documentation: Pydantic has many types for validation, such as FileUrl, AnyUrl, and even database URL types.

A simplistic usage example:

    from requests import get, HTTPError, ConnectionError
    from pydantic import BaseModel, AnyHttpUrl, ValidationError

    class MyConfModel(BaseModel):
        URI: AnyHttpUrl

    try:
        myAddress = MyConfModel(URI="http://myurl.com/")
        req = get(myAddress.URI, verify=False)
        print(myAddress.URI)
    except ValidationError:
        print('Invalid destination')

Pydantic also raises exceptions (pydantic.ValidationError) that can be used to handle errors.

I have tested it with these patterns:

http://localhost (pass)
http://localhost:8080 (pass)
http://example.com (pass)
http://user:[email protected] (pass)
http://_example.com (pass)
http://&example.com (fails)
http://-example.com (fails)

Сергей Дорофий · Accepted Answer · 2020-05-10 17:52:55Z

All of the above solutions recognize a string like "http://www.google.com/path,www.yahoo.com/path" as valid. This solution always works as it should

    import re

    # URL-link validation
    ip_middle_octet = u"(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5]))"
    ip_last_octet = u"(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))"

    URL_PATTERN = re.compile(
        u"^"
        # protocol identifier
        u"(?:(?:https?|ftp|rtsp|rtp|mmp)://)"
        # user:pass authentication
        u"(?:\S+(?::\S*)?@)?"
        u"(?:"
        u"(?P<private_ip>"
        # IP address exclusion
        # private & local networks
        u"(?:localhost)|"
        u"(?:(?:10|127)" + ip_middle_octet + u"{2}" + ip_last_octet + u")|"
        u"(?:(?:169\.254|192\.168)" + ip_middle_octet + ip_last_octet + u")|"
        u"(?:172\.(?:1[6-9]|2\d|3[0-1])" + ip_middle_octet + ip_last_octet + u"))"
        u"|"
        # IP address dotted notation octets
        # excludes loopback network 0.0.0.0
        # excludes reserved space >= 224.0.0.0
        # excludes network & broadcast addresses
        # (first & last IP address of each class)
        u"(?P<public_ip>"
        u"(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])"
        u"" + ip_middle_octet + u"{2}"
        u"" + ip_last_octet + u")"
        u"|"
        # host name
        u"(?:(?:[a-z\u00a1-\uffff0-9_-]-?)*[a-z\u00a1-\uffff0-9_-]+)"
        # domain name
        u"(?:\.(?:[a-z\u00a1-\uffff0-9_-]-?)*[a-z\u00a1-\uffff0-9_-]+)*"
        # TLD identifier
        u"(?:\.(?:[a-z\u00a1-\uffff]{2,}))"
        u")"
        # port number
        u"(?::\d{2,5})?"
        # resource path
        u"(?:/\S*)?"
        # query string
        u"(?:\?\S*)?"
        u"$",
        re.UNICODE | re.IGNORECASE
    )

    def url_validate(url):
        """ URL string validation """
        return re.compile(URL_PATTERN).match(url)

google.com/path,www.yahoo.com/path is valid. See RFC 3986: a path is made of segments which are built from pchars which may be sub-delims one of which is ",".
Yes, the symbol "," is included in the list of acceptable sub-delims, but the line from my example, even in a terrible dream, cannot be a valid url =)
@СергейДорофий why not? If it is valid according to the grammar for an URI it is valid URI by definition, not sure I follow why you say it can't be valid if it contains valid characters.
this is a perfect solution. other solutions doesn't validate all. urlparse doesnt recognise even if semicolon is missing after http eg "https//" but this one cates all. good solution
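Two claims from these comments are easy to check with the standard library: how a comma in the path is treated, and what happens when the colon after the scheme is missing. The sketch below is only an illustration of `urlsplit` behaviour and makes no statement about the regex above.

```python
from urllib.parse import urlsplit

# A comma is an ordinary path character, so it simply ends up in the path.
with_comma = urlsplit("http://www.google.com/path,www.yahoo.com/path")
print(with_comma.netloc)  # www.google.com
print(with_comma.path)    # /path,www.yahoo.com/path

# Without the colon there is no scheme and no netloc; everything is path.
no_colon = urlsplit("https//example.com")
print(repr(no_colon.scheme), repr(no_colon.netloc), no_colon.path)
```

So a check that requires both `scheme` and `netloc` to be non-empty rejects `https//example.com`, even though parsing itself raises no error.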

Anatoly Alekseev · Accepted Answer · 2021-01-14 00:27:03Z

Not directly relevant, but often it's required to identify whether some token CAN be a url or not, not necessarily 100% correctly formed (ie, https part omitted and so on). I've read this post and did not find the solution, so I am posting my own here for the sake of completeness.

    def get_domain_suffixes():
        import requests
        res = requests.get('https://publicsuffix.org/list/public_suffix_list.dat')
        lst = set()
        for line in res.text.split('\n'):
            if not line.startswith('//'):
                domains = line.split('.')
                cand = domains[-1]
                if cand:
                    lst.add('.' + cand)
        return tuple(sorted(lst))

    domain_suffixes = get_domain_suffixes()

    def reminds_url(txt: str):
        """
        >>> reminds_url('yandex.ru.com/somepath')
        True
        """
        ltext = txt.lower().split('/')[0]
        return ltext.startswith(('http', 'www', 'ftp')) or ltext.endswith(domain_suffixes)

I needed a stricter validator than what most answers implemented: correctly formed AND with a valid TLD. Your answer gave me the necessary second part, which I combined with a regex. Thank you.

Anthony · Accepted Answer · 2023-03-31 21:23:50Z

Use this example to conduct your own meaning of an "URL", and apply it everywhere in your code:

    # DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE
    # Version 2, December 2004
    #
    # Copyright (C) 2004 Sam Hocevar <[email protected]>
    #
    # Everyone is permitted to copy and distribute verbatim or modified
    # copies of this license document, and changing it is allowed as long
    # as the name is changed.
    #
    # DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE
    # TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
    #
    # 0. You just DO WHAT THE FUCK YOU WANT TO.
    #
    # Copyright © 2023 Anthony [email protected]
    #
    # This work is free. You can redistribute it and/or modify it under the
    # terms of the Do What The Fuck You Want To Public License, Version 2,
    # as published by Sam Hocevar. See the LICENSE file for more details.

    import operator as op
    from urllib.parse import (
        ParseResult,
        urlparse,
    )

    import attrs
    import pytest
    from phantom import Phantom
    from phantom.fn import compose2


    def is_url_address(value: str) -> bool:
        return any(urlparse(value))


    class URL(str, Phantom, predicate=is_url_address):
        pass


    # presume that an empty URL is a nonsense
    def test_empty_url():
        with pytest.raises(TypeError, match="Could not parse .* from ''"):
            URL.parse("")


    # is it enough now?
    def test_url():
        assert URL.parse("http://")


    scheme_and_netloc = op.attrgetter("scheme", "netloc")


    def has_scheme_and_netloc(value: ParseResult) -> bool:
        return all(scheme_and_netloc(value))


    # need a bit of FP magic 🧙 here
    class ReachableURL(URL, predicate=compose2(has_scheme_and_netloc, urlparse)):
        pass


    def test_empty_reachable_url():
        with pytest.raises(TypeError, match="Could not parse .* from ''"):
            ReachableURL.parse("")


    # but "empty" for an URL is not just "empty string"
    def test_reachable_url_probably_wrong_host():
        assert ReachableURL.parse("http://example")


    def test_reachable_url():
        assert ReachableURL.parse("http://example.com")


    def test_reachable_url_without_scheme():
        with pytest.raises(TypeError, match="Could not parse .* from 'example.com'"):
            ReachableURL.parse("example.com")


    # constructor works too
    def test_constructor():
        assert ReachableURL("http://example.com")


    # but it *is* `str`
    def test_url_is_str():
        assert isinstance(ReachableURL("http://example.com"), str)


    # now we can write plain old classes utilizing our `URL` and `ReachableURL`
    # I'm lazy...
    @attrs.define
    class Person:
        homepage: ReachableURL


    def test_person():
        person = Person(homepage=ReachableURL("https://example.com/index.html"))
        assert person.homepage


    def greet(person: Person) -> None:
        print(f"Hello! I will definitely visit you at {person.homepage}.")


    if __name__ == "__main__":
        greet(Person(homepage=ReachableURL.parse("tg://resolve?username")))

It will not be surprising if an URL RFC turns out to be Turing-complete!

Roman M. · Accepted Answer · 2024-07-29 15:17:03Z

I've expanded cetver's answer with this answer's regex to include ipv6 addresses:

    import re

    def validate_url(url: str) -> bool:
        regex = re.compile(
            r'^(?:http|ftp)s?://'  # http:// or https://
            r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|'  # domain...
            r'localhost|'  # localhost...
            r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'  # ...or ip
            r'(?::\d+)?'  # optional port
            r'([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|'  # 1:2:3:4:5:6:7:8
            r'([0-9a-fA-F]{1,4}:){1,7}:|'  # 1:: 1:2:3:4:5:6:7::
            r'([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|'  # 1::8 1:2:3:4:5:6::8 1:2:3:4:5:6::8
            r'([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|'  # 1::7:8 1:2:3:4:5::7:8 1:2:3:4:5::8
            r'([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|'  # 1::6:7:8 1:2:3:4::6:7:8 1:2:3:4::8
            r'([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|'  # 1::5:6:7:8 1:2:3::5:6:7:8 1:2:3::8
            r'([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|'  # 1::4:5:6:7:8 1:2::4:5:6:7:8 1:2::8
            r'[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|'  # 1::3:4:5:6:7:8 1::3:4:5:6:7:8 1::8
            r':((:[0-9a-fA-F]{1,4}){1,7}|:)|'  # ::2:3:4:5:6:7:8 ::2:3:4:5:6:7:8 ::8 ::
            r'fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|'  # fe80::7:8%eth0 fe80::7:8%1 (link-local IPv6 addresses with zone index)
            r'::(ffff(:0{1,4}){0,1}:){0,1}'
            r'((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}'
            r'(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|'  # ::255.255.255.255 ::ffff:255.255.255.255 ::ffff:0:255.255.255.255 (IPv4-mapped IPv6 addresses and IPv4-translated addresses)
            r'([0-9a-fA-F]{1,4}:){1,4}:'
            r'((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}'
            r'(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])'  # 2001:db8:3:4::192.0.2.33 64:ff9b::192.0.2.33 (IPv4-Embedded IPv6 Address)
            r'(?:/?|[/?]\S+)$', re.IGNORECASE)
        return (re.match(regex, url) is not None)

jmoerdyk · Accepted Answer · 2023-01-23 20:08:19Z

-1

    from urllib.parse import urlparse

    def is_valid_url(url):
        try:
            result = urlparse(url)
            return all([result.scheme, result.netloc])
        except ValueError:
            return False

    url = 'http://google.com'
    if is_valid_url(url):
        print('Valid URL')
    else:
        print('Malformed URL')

edited Jan 23, 2023 at 20:08

jmoerdyk


answered Jan 23, 2023 at 14:53

Hà Nguyễn


2 Comments

Yunnosch Over a year ago

While this code may solve the question, including an explanation of how and why this solves the problem would really help to improve the quality of your post, and probably result in more up-votes. Remember that you are answering the question for readers in the future, not just the person asking now. Please edit your answer to add explanations and give an indication of what limitations and assumptions apply.

shamnad sherief Over a year ago

Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center.

Manuel Marcus · Accepted Answer · 2023-08-24 07:27:32Z

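This code uses socket, which is a built-in library, so there is nothing to install. It tries to connect to the host named in the input URL.

    import socket
    from urllib.parse import urlparse

    def isValid(url):
        # create_connection() expects a bare host name, so strip the scheme and path first
        host = urlparse(url if "://" in url else "//" + url).hostname
        if not host:
            return False
        # connect to the host -- tells us if the host is actually reachable
        try:
            socket.create_connection((host, 80)).close()
            return True
        except socket.gaierror:
            return False
        except OSError:
            return False

A socket.gaierror occurs if the host name cannot be resolved, and an OSError occurs when you are not connected.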

It returns True for both "https://www.google.com" and "google.com".

If it is a problem, you can simply use this code:

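    import socket
    from urllib.parse import urlparse

    def isValid(url):
        if url.startswith("https://www.") or url.startswith("http://www."):
            try:
                # pass only the host name; the full URL is not a resolvable host
                socket.create_connection((urlparse(url).hostname, 80)).close()
                return True
            except socket.gaierror:
                return False
            except OSError:
                return False
        else:
            return False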

pmiguelpinto90 · Accepted Answer · 2021-12-15 11:38:17Z

Function based on Dominic Tarro answer:

    import re

    def is_url(x):
        return bool(re.match(
            r"(https?|ftp)://"  # protocol
            r"(\w+(\-\w+)*\.)?"  # host (optional)
            r"((\w+(\-\w+)*)\.(\w+))"  # domain
            r"(\.\w+)*"  # top-level domain (optional, can have > 1)
            r"([\w\-\._\~/]*)*(?<!\.)"  # path, params, anchors, etc. (optional)
            , x))
