Regular Expression to find all links in webpage

Question

I am trying to find all of the links in the source code of a web page. Could anyone tell me the regular expression I would need to find them?

Duplicate of (among others): Regular expression for parsing links from a webpage?

Google finds more: html links regex site:stackoverflow.com

Isaac Dealey · Accepted Answer · 2009-01-19 10:04:59Z

I'm not certain how these would translate to C# (I haven't done any development in C# myself yet), but here's how I might do it in JavaScript or ColdFusion. It might give you an idea about how you want to do it in C#.

In JavaScript I think this would work:

const rex = /href="([^"]+)"/g; const a = Array.from(source.matchAll(rex), m => m[1]);

after which a is an array containing the links. The g flag makes the expression match every href attribute rather than just the first, and m[1] is the first capture group of each match, that is, the URL without the surrounding href="". Note that this only picks up href values written in double quotes.
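
As a minimal sketch of the capture-group idea in JavaScript, the helper below collects every quoted href value from an HTML string. The name extractLinks and the sample markup are assumptions made for illustration and are not part of the original answer. It accepts single or double quotes, ignores case, and will also pick up href attributes on tags other than a (for example link), so treat it as a starting point rather than a full HTML parser.

```javascript
// Hypothetical helper (not from the original answer): return every quoted
// href value found in an HTML string, in document order.
function extractLinks(source) {
  // Group 1 captures the opening quote, group 2 the URL itself;
  // \1 requires the closing quote to be the same character as the opening one.
  const rex = /href\s*=\s*(["'])(.*?)\1/gi;
  // matchAll needs the g flag and yields one match object per href.
  return Array.from(source.matchAll(rex), (m) => m[2]);
}

// Example usage with made-up markup.
const sample = '<a href="https://example.com/a">A</a>\n<a HREF=\'/b\'>B</a>';
console.log(extractLinks(sample));
```

Unquoted attribute values (href=foo) are deliberately not handled here; supporting them would need a third alternative in the expression.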

By comparison in ColdFusion you would have to do something slightly different:

a = REMatch('href="[^"]+"', source); for (i = 1; i <= ArrayLen(a); i++) { a[i] = mid(a[i], 7, len(a[i]) - 7); }

Again, I haven't tested it, but REMatch returns an array of every substring that matches the expression, and the for loop then strips the href="" from around each URL.
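
The same match-then-strip approach can be sketched in JavaScript for comparison. The function name extractLinksByStripping is hypothetical, and the sketch assumes double-quoted href values only, as in the expression above.

```javascript
// Hypothetical helper mirroring the match-then-strip approach:
// first collect whole href="..." attributes, then trim the wrapper.
function extractLinksByStripping(source) {
  // With the g flag, match() returns the full matched strings,
  // or null when nothing matches, hence the fallback to [].
  const matches = source.match(/href="[^"]+"/g) || [];
  // 'href="' is 6 characters long and the closing quote is the last
  // character, so slice(6, -1) leaves only the URL.
  return matches.map((m) => m.slice(6, -1));
}

// Example usage with made-up markup.
console.log(extractLinksByStripping('<a href="/one">1</a><a href="/two">2</a>'));
```

The offsets are the part that is easy to get wrong: the prefix is six characters, so in a 1-indexed language such as ColdFusion the URL starts at position 7.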
