Grabbing content from a website in C#

Question

New to C# here, but I've used Java for years. I tried googling this and got a couple of answers that were not quite what I need. I'd like to grab the (X)HTML from a website and then use DOM (actually, CSS selectors are preferable, but whatever works) to grab a particular element. How exactly is this done in C#?

Could you add some example code for us to work with?

– user153923, Commented Jun 29, 2011 at 14:14
It's too bad comments can't be downvoted.

– Doug S, Commented Nov 17, 2012 at 13:02

Maxim · Accepted Answer · 2011-06-29 14:16:53Z


To get the HTML you can use the System.Net.WebClient class.

To parse the HTML you can use the HTML Agility Pack library.
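Here is a minimal sketch of the fetching half. It assumes the page is UTF-8 and that a plain GET with no extra headers or cookies is enough. `PageFetcher` and `FetchHtml` are hypothetical names chosen for illustration. On .NET 6 and later, `WebClient` compiles with an obsolescence warning (SYSLIB0014) that recommends `HttpClient`, but it still works.

```csharp
using System.Net;
using System.Text;

static class PageFetcher
{
    // Downloads the resource at `url` and returns its markup as a string.
    public static string FetchHtml(string url)
    {
        using (var client = new WebClient())
        {
            // Assumption: the page is UTF-8 encoded.
            client.Encoding = Encoding.UTF8;
            return client.DownloadString(url);
        }
    }
}
```

The returned string can then be passed to the HTML Agility Pack for parsing.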


jaywayco · Accepted Answer · 2011-06-29 14:19:01Z

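    using System.IO;
    using System.Net;
    using System.Text;

    // prepare the web page we will be asking for
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://www.stackoverflow.com");

    // buffer for each read, and the builder that accumulates the page
    char[] buf = new char[8192];
    StringBuilder sb = new StringBuilder();

    // execute the request; the using blocks close the response and stream afterwards
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    using (Stream resStream = response.GetResponseStream())
    // StreamReader decodes bytes to text (UTF-8 assumed here) and correctly handles
    // characters split across two reads; decoding each chunk with Encoding.ASCII did not
    using (StreamReader reader = new StreamReader(resStream, Encoding.UTF8))
    {
        int count;
        do
        {
            // fill the buffer with data
            count = reader.Read(buf, 0, buf.Length);
            // continue building the string (appending 0 chars at the end is harmless)
            sb.Append(buf, 0, count);
        } while (count > 0); // any more data to read?
    }

    string html = sb.ToString();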

Then use XQuery expressions or regular expressions to grab the element you need.
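If the page happens to be well-formed XHTML, the built-in System.Xml classes can query it with XPath. Note that they support XPath 1.0, not XQuery. The sketch below works under that assumption. `XhtmlQuery` and `SelectText` are illustrative names, and `x` is an arbitrary prefix mapped to the XHTML namespace. The prefix is needed because XPath 1.0 has no default namespace for element names.

```csharp
using System.Xml;

static class XhtmlQuery
{
    // Loads well-formed XHTML and returns the text of the first node
    // matching `xpath`, or null if nothing matches.
    public static string SelectText(string xhtml, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xhtml); // throws XmlException if the markup is not well-formed XML

        // Map the prefix "x" to the XHTML namespace so queries can say //x:div
        var ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("x", "http://www.w3.org/1999/xhtml");

        XmlNode node = doc.SelectSingleNode(xpath, ns);
        return node?.InnerText;
    }
}
```

For example, `SelectText(page, "//x:div[@id='main']/x:p")` returns the text of the first paragraph inside the `main` div. If the page declares no namespace, drop the `x:` prefixes from the query. Most real-world HTML is not well-formed XML. In that case `LoadXml` throws, and an HTML-specific parser is needed.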

carla · Accepted Answer · 2017-12-15 12:47:39Z

You could use System.Net.WebClient or System.Net.HttpWebRequest to fetch the page, but these classes do not parse the page into elements.

Use HtmlAgilityPack (http://html-agility-pack.net/)

    using HtmlAgilityPack;

    // url is the address of the page you want to load
    HtmlWeb htmlWeb = new HtmlWeb();
    htmlWeb.UseCookies = true;
    HtmlDocument htmlDocument = htmlWeb.Load(url);

    // after getting the document node
    // you can do something like this
    foreach (HtmlNode item in htmlDocument.DocumentNode.Descendants("input"))
    {
        // item matches your requirement
        // take the item.
    }

Daren Thomas · Accepted Answer · 2011-06-29 14:15:11Z

I hear you want to use the HtmlAgilityPack for working with HTML files. It gives you LINQ access, which is A Good Thing (tm). You can download the file with System.Net.WebClient.

Giorgi · Accepted Answer · 2011-06-29 14:16:00Z


You can use the Html Agility Pack to load the HTML and find the element you need.


Community · Accepted Answer · 2017-05-23 12:00:59Z

To get you started, you can fairly easily use HttpWebRequest to get the contents of a URL. From there, you will have to do something to parse out the HTML. That is where it starts to get tricky. You can't use a normal XML parser, because many (most?) web site HTML pages aren't 100% valid XML. Web browsers have specially implemented parsers to work around the invalid portions. In Ruby, I would use something like Nokogiri to parse the HTML, so you might want to look for a .NET port of it, or another parser specifically designed to read HTML.
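To see why a normal XML parser fails, here is a small sketch. It feeds fragments to `XmlDocument.LoadXml`, and the `StrictParserDemo` class and `IsWellFormedXml` helper are hypothetical names. Browsers accept markup such as an unclosed `<br>`, the `&nbsp;` entity, or an unquoted attribute value without complaint, but each of these makes the XML parser throw an `XmlException`.

```csharp
using System.Xml;

static class StrictParserDemo
{
    // Returns true if the markup parses as XML, false if the XML parser rejects it.
    public static bool IsWellFormedXml(string markup)
    {
        try
        {
            new XmlDocument().LoadXml(markup);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```

For example, `IsWellFormedXml("<p>line<br/>next</p>")` is true, while `IsWellFormedXml("<p>line<br>next</p>")` is false.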

Edit:

Since the topic is likely to come up: WebClient vs. HttpWebRequest/HttpWebResponse

Also, thanks to the others who answered for noting HtmlAgilityPack. I didn't know it existed.

Tija · Accepted Answer · 2011-06-29 14:17:21Z

Look into using the HTML Agility Pack, which is one of the more common libraries for parsing HTML.

http://htmlagilitypack.codeplex.com/
