GREP rule to catch all types of web link / URL

Question

I'm looking to build a robust, reliable GREP rule to catch all web links and URLs that appear in text, covering all possible characters and gotchas like HTTPS, URLs in brackets like (http://whatever.com), or URLs followed by punctuation like http://whatever.com?! It's for an InDesign paragraph style GREP rule.

I've put the best I've come up with so far below as an answer. Is it missing anything? Is there anything more robust or more straightforward?

user56reinstatemonica8 · Accepted Answer · 2016-01-21 18:05:32Z

This seems to work pretty well:

https?\://.*?(?=(\)|\.|\,|\?|\!|"|')*($|\s))

Start with either http:// or https://
- https?\://
...then match the shortest uninterrupted string of any characters
- .*?
...that is followed by, but doesn't include
- the (?= ) "positive lookahead"
...zero or more of any common punctuation - ) . , ? ! and any type of single or double quotation mark, curly or straight, left or right
- (\)|\.|\,|\?|\!|"|')*
...and then either the end of the paragraph or any type of whitespace
- ($|\s)
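
To see the breakdown above in action outside InDesign, here is a minimal sketch that runs the same pattern through Python's re module. Using Python is an assumption made purely for illustration: InDesign's GREP engine is a different regex flavour and may behave differently in edge cases. The helper name find_urls and the sample sentence are hypothetical. Note that the pattern as written here lists only the straight quotation marks.

```python
import re

# The pattern from the answer, unchanged. Python's re module stands in for
# InDesign's GREP engine here (an assumption; the two flavours may differ).
# re.MULTILINE makes $ match at the end of every line, roughly like the
# end of a paragraph.
URL_PATTERN = re.compile(
    r"""https?\://.*?(?=(\)|\.|\,|\?|\!|"|')*($|\s))""",
    re.MULTILINE,
)


def find_urls(text):
    """Return every substring matched by URL_PATTERN, in order (hypothetical helper)."""
    return [m.group(0) for m in URL_PATTERN.finditer(text)]


sample = (
    "See (http://whatever.com), then https://whatever.com/page?x=1! "
    "Or http://whatever.com?!"
)

for url in find_urls(sample):
    print(url)
```

The lazy .*? stops at the first position where only trailing punctuation separates it from whitespace or the end of the line, so the closing bracket, comma, and ?! are left out of each match, while the ? inside the query string is kept because it is not followed by whitespace.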

Some testing:

If I use this in Notepad++ with the source code for this webpage, I see that you need to figure out how to stop on less-than/greater-than brackets, and that this will grab "http://" without an actual URL. It seems to capture most instances with no false positives (just slightly greedy when there are brackets), though I haven't looked for false negatives.
– Yorik, Commented Jan 21, 2016 at 19:10
stackoverflow.com/questions/27745/getting-parts-of-a-url-regex
– Yorik, Commented Jan 21, 2016 at 19:14
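
The two gotchas from the first comment can be reproduced with a short sketch. Again this uses Python's re module as a stand-in for the GREP engine, which is an assumption, and the helper first_match and the sample strings are hypothetical.

```python
import re

# Same pattern as in the answer; Python's re is an assumed stand-in for
# the InDesign / Notepad++ regex engines.
URL_PATTERN = re.compile(
    r"""https?\://.*?(?=(\)|\.|\,|\?|\!|"|')*($|\s))""",
    re.MULTILINE,
)


def first_match(text):
    """Return the first match of URL_PATTERN in text, or None (hypothetical helper)."""
    m = URL_PATTERN.search(text)
    return m.group(0) if m else None


# A bare scheme followed by a space is matched, because .*? may match nothing.
bare = first_match("Type http:// and then the address")

# An angle bracket is not in the punctuation list, so it is swallowed.
angled = first_match("Mail me at <http://whatever.com> today")

# In HTML source the match runs on until the next whitespace.
in_html = first_match('<a href="http://whatever.com">link</a> text')

print(repr(bare))
print(repr(angled))
print(repr(in_html))
```

One possible tightening, untested in InDesign, would be to require at least one character after the scheme (.+? instead of .*?) and to add < and > to the punctuation alternatives.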
