27

How can I set up my regex to test whether a URL is contained in a block of text in JavaScript? I can't quite figure out the pattern to use to accomplish this.

 var urlpattern = new RegExp("(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?");
 var txtfield = $('#msg').val(); /* this is a textarea */
 if (urlpattern.test(txtfield)) {
     // do something about it
 }

EDIT:

The pattern I have now works in regex testers for what I need it to do, but Chrome throws an error:

 "Invalid regular expression: /(http|ftp|https)://[w-_]+(.[w-_]+)+([w-.,@?^=%&:/~+#]*[w-@?^=%&/~+#])?/: Range out of order in character class" 

for the following code:

var urlexp = new RegExp( '(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?' ); 
3
  • Why do you exclude FTPS? Commented Dec 31, 2014 at 11:30
  • I really only needed http/https, so in my case I could have left out ftp as well. Commented Jan 27, 2015 at 0:56
  • This is essentially a duplicate of How to replace plain URLs with links?, which explains why regular expressions are a bad idea for this kind of task. Commented Oct 11, 2016 at 3:57

8 Answers 8

75
+50

Though escaping the dash characters (which can have a special meaning as character range specifiers when inside a character class) should work, one other method for taking away their special meaning is putting them at the beginning or the end of the class definition.

In addition, \+ and \@ in a character class are indeed interpreted as + and @ respectively by the JavaScript engine; however, the escapes are not necessary and may confuse someone trying to interpret the regex visually.
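To make the point about dash placement concrete, here is a minimal sketch. The names `rangeError` and `hostChars` are illustrative and do not come from the question. It assumes that `[w-_]` is what the engine receives once the string literal has dropped the backslashes, which matches the error message quoted in the question.

```javascript
// "[w-_]" is read as a range from 'w' (0x77) down to '_' (0x5F),
// which is backwards, so the constructor throws.
let rangeError = null;
try {
  new RegExp("[w-_]");
} catch (e) {
  rangeError = e; // SyntaxError mentioning "Range out of order in character class"
}

// A dash placed last in the class is a literal dash; no escape is needed.
const hostChars = /^[\w-]+$/;

console.log(rangeError instanceof SyntaxError); // true
console.log(hostChars.test("my-host_01"));      // true
console.log(hostChars.test("my host"));         // false (space is not in the class)
```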

I would recommend the following regex for your purposes:

(http|ftp|https)://[\w-]+(\.[\w-]+)+([\w.,@?^=%&:/~+#-]*[\w@?^=%&/~+#-])? 

This can be specified in JavaScript either by passing it to the RegExp constructor (as you did in your example), with each backslash doubled because the pattern is written as a string literal:

var urlPattern = new RegExp("(http|ftp|https)://[\\w-]+(\\.[\\w-]+)+([\\w.,@?^=%&:/~+#-]*[\\w@?^=%&/~+#-])?");

or by directly specifying a regex literal, using the // quoting method:

var urlPattern = /(http|ftp|https):\/\/[\w-]+(\.[\w-]+)+([\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-])?/ 

The RegExp constructor is necessary if you accept a regex as a string (from user input or an AJAX call, for instance), and might be more readable (as it is in this case). I am fairly certain that the // quoting method is more efficient, and is at certain times more readable. Both work.
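As a rough illustration of the two notations, the sketch below builds the same pattern three ways. The pattern is shortened to the protocol and host part for brevity, and `sample` is a made-up string; both are assumptions for the example. `String.raw` is a standard template tag that leaves backslashes untouched.

```javascript
// Regex literal: backslashes reach the regex engine untouched.
const fromLiteral = /(http|ftp|https):\/\/[\w-]+(\.[\w-]+)+/;

// String passed to the constructor: each regex backslash must be doubled,
// otherwise the string literal turns "\w" into "w" and "\." into ".".
const fromString = new RegExp("(http|ftp|https)://[\\w-]+(\\.[\\w-]+)+");

// String.raw keeps backslashes as written, so single backslashes work here.
const fromRaw = new RegExp(String.raw`(http|ftp|https)://[\w-]+(\.[\w-]+)+`);

const sample = "docs at https://example.com/guide today";
console.log(fromLiteral.test(sample)); // true
console.log(fromString.test(sample));  // true
console.log(fromRaw.test(sample));     // true
console.log("\w" === "w");             // true: a plain string drops the backslash
```

The last line is the reason an un-doubled `"[\w\-_]"` reaches the engine as `[w-_]`.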

I tested your original and this modification using Chrome both on JSFiddle and on RegexLib.com, using the client-side regex engine (browser) and specifically selecting JavaScript. While the first one fails with the error you stated, my suggested modification succeeds. If I remove the h from the http in the source, it fails to match, as it should!

Edit

As noted by @noa in the comments, the expression above will not match local network (non-internet) servers or any other servers accessed with a single word (e.g. http://localhost/... or https://sharepoint-test-server/...). If matching this type of url is desired (which it may or may not be), the following might be more appropriate:

(http|ftp|https)://[\w-]+(\.[\w-]+)*([\w.,@?^=%&:/~+#-]*[\w@?^=%&/~+#-])?
#------changed----here-------------^

<End Edit>
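A quick way to see the difference between the two quantifiers is sketched below. Both patterns are the ones from this answer written as regex literals; the names `dottedHost`, `anyHost`, and the string `local` are made up for the example.

```javascript
// Only the quantifier after the dotted-label group differs: "+" vs "*".
const dottedHost = /(http|ftp|https):\/\/[\w-]+(\.[\w-]+)+([\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-])?/;
const anyHost = /(http|ftp|https):\/\/[\w-]+(\.[\w-]+)*([\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-])?/;

const local = "see http://localhost/foo/bar.txt for details";

console.log(dottedHost.test(local));  // false: the host has no dot
console.log(anyHost.test(local));     // true
console.log(local.match(anyHost)[0]); // "http://localhost/foo/bar.txt"
```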

Finally, an excellent resource that taught me 90% of what I know about regex is Regular-Expressions.info - I highly recommend it if you want to learn regex (both what it can do and what it can't)!


11 Comments

regular-expressions-info is broken. Put "dot" instead of a dash in href.
one more thing: the correct syntax would be ... = new RegExp(...) instead of ... = new Regexp(...). Thanks anyway for the great answer!
This breaks on URLs with no dots in the host. For example, http://localhost/foo/bar.txt. To fix it, change (\.[\w-]+)+ to (\.[\w-]+)*.
The question is general, and this answer has gotten a lot of recognition. Someone (not the OP) used this code, and it caused a real bug in some code I was debugging… so breaks isn't entirely relative. It's worth making the answer as canonical as possible.
I highly recommend this as a supplemental resource: mathiasbynens.be/demo/url-regex
6

Goal: Extract & Parse all URIs found in an input string

2025 Note: patterns updated for a write-up article on my site. (Added in case anyone wants to learn more about my techniques for building & testing a 100+ char regex.)

Updated on 2024/08, 2021/06, and 2020/11!

Note: This isn't meant to be RFC compliant; NOT meant for validation!

Parsing must isolate protocol, domain, path, query and hash.

2024-12-20 simpler, may include trailing punctuation (114 chars, 👨‍🍳)

([-.a-z0-9]+:\/{1,3})([^-/\.[\](|)\s?][^`\/\s\]?]+)([-_a-z0-9!@$%^&*()=+;/~\.]*)[?]?([^#\s`?]*)[#]?([^#\s'"`\.]*) 

2024-12-29 very accurate, uses look-aheads and look-behinds (157 chars, requires regex lookahead support)

([-\.\w]+:\/{2,3})(?!.*[.]{2})(?![-.*\.])((?!.*@\.)[-_\w@^=%&:;~+\.]+(?<![-\.]))(\/[-_\w@^=%&$:;/~+\.]+(?<!\.))?[?]?([-_\w=&@$!|~+]+)*[#]?([-_\w=&@$!|~+]+)* 

Example JS code with output - every URL is turned into a 5-part array of its 'parts' (protocol, host, path, query, and hash)

var re = /([-\.\w]+:\/{2,3})(?!.*[.]{2})(?![-.*\.])((?!.*@\.)[-_\w@^=%&:;~+\.]+(?<![-\.]))(\/[-_\w@^=%&$:;/~+\.]+(?<!\.))?[?]?([-_\w=&@$!|~+]+)*[#]?([-_\w=&@$!|~+]+)*/gi;
var str = 'Bob: Hey there, have you checked https://www.facebook.com ?\n(ignore) https://github.com/justsml?tab=activity#top (ignore this too)';
var m;
while ((m = re.exec(str)) !== null) {
    // Guard against an infinite loop on zero-length matches.
    if (m.index === re.lastIndex) {
        re.lastIndex++;
    }
    console.log(m);
}

Will give you the following:

["https://www.facebook.com", "https://", "www.facebook.com", "", "", "" ]
["https://github.com/justsml?tab=activity#top", "https://", "github.com", "/justsml", "tab=activity", "top" ]

12 Comments

this is a super clever way to do it +1
Your regex is not differentiating between a block of text and URL. Check here
Updated my answer - includes @noob 's suggested string prepended to my example code (so it pulls all url-like strings very reliably - even if there is a colon-prefixed string. uses explicit matching on slashes to delineate the protocol). Also works with smb:///winbox/dfs/ or ipp://printer regex101.com/r/jO8bC4/5
Thanks @Rodrigo - I updated the RegEx to pass 99% of cases. (updated link)
FYI I've cleaned up the regex patterns & fixed several minor issues. @Rodrigo 🤘 Check the new regex101 links: regex101.com/r/jO8bC4/69
2

You have to escape the backslash when you are using new RegExp.

Also you can put the dash - at the end of character class to avoid escaping it.

&amp; inside a character class means & or a or m or p or ;. You only need to include & and ;, because a, m and p are already matched by \w.

So, your regex becomes:

var urlexp = new RegExp( '(http|ftp|https)://[\\w-]+(\\.[\\w-]+)+([\\w-.,@?^=%&:/~+#-]*[\\w@?^=%&;/~+#-])?' ); 
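The point about the entity can be checked directly. This is a small sketch; `entityClass` is an illustrative name, and the anchors `^` and `$` are added only so that each test covers the whole input.

```javascript
// A character class is a set of single characters, so "[&amp;]" is the set
// {&, a, m, p, ;} rather than the five-character entity.
const entityClass = /^[&amp;]$/;

console.log(entityClass.test("&"));     // true
console.log(entityClass.test("m"));     // true
console.log(entityClass.test(";"));     // true
console.log(entityClass.test("&amp;")); // false: the class matches exactly one character
```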

2 Comments

how to extend it to match more than one url?
did not work with localhost
1

try (http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&amp;:/~\+#]*[\w\-\@?^=%&amp;/~\+#])?

1 Comment

When using this I get the error "Range out of order in character class".
1

I've cleaned up your regex:

var urlexp = new RegExp('(http|ftp|https)://[a-z0-9\\-_]+(\\.[a-z0-9\\-_]+)+([a-z0-9\\-\\.,@\\?^=%&;:/~\\+#]*[a-z0-9\\-@\\?^=%&;/~\\+#])?', 'i');

Tested and works just fine ;)

3 Comments

how to extend it to match more than one url?
Add "global" modifier (g): new RegExp(.., 'gi')
Thanks for the answer. https://asdas-.com and https://-honda.com, and http://-apple-.com, \\\||||@@@@https://www.google.com and https://www...google...com and http://www.c:ool.com.au all pass according to this regex, as though they were valid URLs. It needs a little refinement.
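For the question in the comments about matching more than one URL, a minimal sketch with the global flag might look like this. It uses this answer's pattern with the backslashes doubled for the string literal (an adjustment made for the example), and `text` is an invented sample.

```javascript
// "g" makes String.prototype.match return every match instead of the first;
// "i" keeps the matching case-insensitive, as in the answer.
const urlexp = new RegExp(
  '(http|ftp|https)://[a-z0-9\\-_]+(\\.[a-z0-9\\-_]+)+' +
  '([a-z0-9\\-\\.,@\\?^=%&;:/~\\+#]*[a-z0-9\\-@\\?^=%&;/~\\+#])?',
  'gi'
);

const text = 'Mirrors: http://example.com/a and HTTPS://example.org/b?x=1.';

// match() returns null when nothing is found, hence the fallback to [].
const urls = text.match(urlexp) || [];
console.log(urls); // ["http://example.com/a", "HTTPS://example.org/b?x=1"]
```

The sentence-ending period is left out of the second match because the final character class of the pattern does not include a dot.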
1

Try this general regex, which handles many URL formats:

/(([A-Za-z]{3,9}):\/\/)?([-;:&=\+\$,\w]+@{1})?(([-A-Za-z0-9]+\.)+[A-Za-z]{2,3})(:\d+)?((\/[-\+~%\/\.\w]+)?\/?([&?][-\+=&;%@\.\w]+)?(#[\w]+)?)?/g

Comments

0

The trouble is that the "-" in the character class (the brackets) is being parsed as a range: [a-z] means "any character between a and z." As Vini-T suggested, you need to escape the "-" characters in the character classes, using a backslash.

Comments

0

Try this; it worked for me:

/^((ftp|http[s]?):\/\/)?(www\.)([a-z0-9]+)\.[a-z]{2,5}(\.[a-z]{2})?$/ 

It is simple and understandable.

1 Comment

It did not work in many cases. I could not list them all here because of Stack Overflow's restrictions on sharing URLs :/
