
I am trying to find all of the links in the source code of a web page. Could anyone tell me what regular expression I would need to find them?


Duplicate of (among others): Regular expression for parsing links from a webpage?

Google finds more: html links regex site:stackoverflow.com

1 Answer

I'm not certain how these would translate to C# (I haven't done any development in C# myself yet), but here's how I might do it in JavaScript or ColdFusion. It might give you an idea about how you want to do it in C#.

In JavaScript I think this would work:

rex = /href="([^"]+)"/g; a = Array.from(source.matchAll(rex), m => m[1]);

after which a would be an array containing the links, though I haven't tested it. The idea here is that the g flag makes the expression match every href attribute instead of only the first, matchAll() returns one match per link, and the callback keeps only the captured URL (group 1) from each match.
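
A slightly more tolerant sketch of the same idea, in case the page mixes quoting styles. It assumes a runtime with String.prototype.matchAll (ES2020); the function name extractLinks and the sample markup are made up for illustration. It accepts single or double quotes and optional whitespace around the = sign, and it is still only a regex scan, not an HTML parser.

```javascript
// Hypothetical helper: collect the value of every href attribute in an HTML string.
function extractLinks(source) {
  // (["']) captures the opening quote and \1 requires the same closing quote.
  // The g flag finds every occurrence; the i flag tolerates HREF or Href.
  const rex = /href\s*=\s*(["'])(.*?)\1/gi;
  // matchAll yields one match per attribute; group 2 holds the URL.
  return Array.from(source.matchAll(rex), (m) => m[2]);
}

// Made-up sample markup, for illustration only.
const sample =
  '<a href="https://example.com/a">A</a>\n' +
  "<a HREF='/b?x=1'>B</a> <p>no link here</p>";

console.log(extractLinks(sample));
```

Collecting the capture group directly avoids any post-processing of the surrounding text, which is why no replace() or split() step is needed.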

By comparison in ColdFusion you would have to do something slightly different:

a = REMatch('href="[^"]+"', source);
for (i = 1; i <= ArrayLen(a); i++) {
    // skip the 6 characters of href=" and drop the closing quote
    a[i] = mid(a[i], 7, len(a[i]) - 7);
}

Again, I haven't tested it, but rematch returns an array of instances of the expression and then the for-next loop removes the href="" around the actual URL.
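
For comparison, the same match-then-trim idea can be sketched in JavaScript. This is only an illustration under the assumption that every attribute is written exactly as href="..." with double quotes; the function name extractLinksByTrimming is hypothetical.

```javascript
// Hypothetical helper mirroring the match-then-trim approach:
// first collect whole href="..." attributes, then strip the wrapper.
function extractLinksByTrimming(source) {
  // With the g flag, match() returns every full match, or null when there are none.
  const attrs = source.match(/href="[^"]+"/g) || [];
  // 'href="' is 6 characters long; slice(6, -1) drops it and the closing quote.
  return attrs.map((attr) => attr.slice(6, -1));
}

console.log(extractLinksByTrimming('<a href="/one">1</a> <a href="/two">2</a>'));
```

The `|| []` fallback matters because match() returns null rather than an empty array when nothing is found.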


1 Comment

The question is tagged with C#.
