2

I'm trying to get HTML code from a specific webpage, but when I do it using

    HttpWebRequest request;
    HttpWebResponse response;
    StreamReader streamReader;

    request = (HttpWebRequest)WebRequest.Create(pageURL);
    response = (HttpWebResponse)request.GetResponse();
    streamReader = new StreamReader(response.GetResponseStream(), Encoding.GetEncoding("windows-1251"));
    htmlCode = streamReader.ReadToEnd();
    streamReader.Close();

or using WebClient, I get redirected to a login page and I get its code. Is there any other way to get HTML code?

I read some information here: How to get HTML from a current request, in a postback, but I didn't understand what I should do, or how and where to specify the URL.

P.S.: I'm logged in in the browser. Notepad++ shows exactly what I need via "right click - view source code".

Thanks.

Sounds like the page expects you to be in a login session before you can access it. You'll have to mimic the login first to get the session (most likely cookies; use a CookieContainer for this) and then access the page. Commented Oct 23, 2012 at 13:36

3 Answers

2

If you get redirected to a login page, then presumably you must be logged in before you can get the content.

So you need to make a request, with suitable credentials, to the login page. Get whatever tokens are sent (usually in the form of cookies) to maintain the login. Then request the page you want (sending the cookies with the request).
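The two steps above could be sketched roughly as follows, reusing the question's HttpWebRequest approach. The URLs, the form field names (`username`, `password`), and the helper `BuildFormBody` are all assumptions made for illustration; a real site may use different fields, hidden anti-forgery tokens, or a different login mechanism altogether.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public static class LoginThenFetch
{
    // Hypothetical URLs: replace with the real login and target pages.
    const string LoginUrl = "https://example.com/login";
    const string PageUrl = "https://example.com/protected";

    // Builds a URL-encoded form body. The field names are assumptions;
    // inspect the site's login form to find the real ones.
    public static string BuildFormBody(string user, string password)
    {
        return "username=" + Uri.EscapeDataString(user) +
               "&password=" + Uri.EscapeDataString(password);
    }

    public static void Main()
    {
        // One container shared by both requests keeps the session cookies.
        var cookies = new CookieContainer();

        // Step 1: POST the credentials; cookies sent back land in the container.
        var login = (HttpWebRequest)WebRequest.Create(LoginUrl);
        login.Method = "POST";
        login.ContentType = "application/x-www-form-urlencoded";
        login.CookieContainer = cookies;
        byte[] body = Encoding.UTF8.GetBytes(BuildFormBody("me", "secret"));
        login.ContentLength = body.Length;
        using (Stream stream = login.GetRequestStream())
        {
            stream.Write(body, 0, body.Length);
        }
        using (login.GetResponse()) { } // response body is not needed here

        // Step 2: request the protected page, sending the same cookies.
        var page = (HttpWebRequest)WebRequest.Create(PageUrl);
        page.CookieContainer = cookies;
        using (var response = (HttpWebResponse)page.GetResponse())
        // Note: on .NET Core / .NET 5+, "windows-1251" requires registering
        // CodePagesEncodingProvider first.
        using (var reader = new StreamReader(response.GetResponseStream(),
                                             Encoding.GetEncoding("windows-1251")))
        {
            string htmlCode = reader.ReadToEnd();
            Console.WriteLine(htmlCode.Length);
        }
    }
}
```

If the second request still returns the login page, the login POST probably did not match what the site expects; comparing it with the browser's own login request is usually the quickest way to find the difference.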

Alternatively (and this is the preferred approach), most major sites that expect automated systems to interact with them provide an API (often using OAuth for authentication). Consult their documentation to see how their API works.

1

If the page you want to get to is behind a login screen, you're going to need to perform the login in code, and attach a CookieContainer to your request to hold the login cookie that the website will try to set.

Alternatively, if you have a user who can help the program along, you could try listing the cookies for the site once they've logged in through their browser. Copy that cookie across and add it to the CookieContainer.
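As a rough sketch of the copy-the-cookie approach: the helper names `FromBrowserCookie` and `Fetch` are made up for this example, and the cookie name and value are placeholders for whatever the browser's developer tools show for the site.

```csharp
using System;
using System.IO;
using System.Net;

public static class BrowserCookieFetch
{
    // Wraps a cookie copied from the browser in a container scoped to the site.
    // (Hypothetical helper; name and value come from the browser's cookie list.)
    public static CookieContainer FromBrowserCookie(Uri site, string name, string value)
    {
        var container = new CookieContainer();
        container.Add(site, new Cookie(name, value));
        return container;
    }

    // Requests the page with the given cookies attached and returns its HTML.
    public static string Fetch(Uri page, CookieContainer cookies)
    {
        var request = (HttpWebRequest)WebRequest.Create(page);
        request.CookieContainer = cookies;
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

A call might look like `BrowserCookieFetch.Fetch(pageUri, BrowserCookieFetch.FromBrowserCookie(siteUri, "sessionid", "abc123"))`. Session cookies expire, so this only works for as long as the browser session it was copied from remains valid.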

Cheers Simon

0

If you want to scrape an HTML page that requires authentication, I suggest using WatiN to fill in the required fields and navigate to the pages you want to download. It may seem like overkill at first glance, but it will save a lot of trouble later.
