I'm trying to scrape some schedules off a website. The information is displayed in a GridView with paging.

The url is: http://www.landmarkworldwide.com/when-and-where/register/search-results.aspx?prgid=0&pgID=270&crid=0&ctid=&sdt=0

My issue is scraping pages other than #1 in the grid view. The best post I found so far was this one, but it doesn't work and that topic is not complete. I tried to use Fiddler and Chrome to capture the post data and reuse it, but I can't get it to work. Can you see what's missing?

Here's the code I am using. It's in VB, but you can answer in C# and I'll translate :-) (sorry)

    Protected Sub Page_Load(sender As Object, e As System.EventArgs) Handles Me.Load
        Dim lcUrl As String = "http://www.landmarkworldwide.com/when-and-where/register/search-results.aspx?prgid=0&pgID=270&crid=0&ctid=&sdt=0"

        ' first, request the login form to get the viewstate value
        Dim webRequest__1 As HttpWebRequest = TryCast(WebRequest.Create(lcUrl), HttpWebRequest)
        Dim responseReader As New StreamReader(webRequest__1.GetResponse().GetResponseStream())
        Dim responseData As String = responseReader.ReadToEnd()
        responseReader.Close()

        ' extract the viewstate value and build out POST data
        Dim viewState As String = ExtractViewState(responseData)
        Dim loHttp As HttpWebRequest = DirectCast(WebRequest.Create(lcUrl), HttpWebRequest)

        ' *** Send any POST data
        Dim lcPostData As String = [String].Format("__VIEWSTATE={0}&__EVENTTARGET={1}&__EVENTARGUMENT={2}", _
            viewState, _
            HttpUtility.UrlEncode("contentwrapper_0$maincontent_0$maincontentfullwidth_0$ucSearchResults$gvPrograms"), _
            HttpUtility.UrlEncode("Page$3"))
        loHttp.Method = "POST"
        Dim lbPostBuffer As Byte() = System.Text.Encoding.GetEncoding(1252).GetBytes(lcPostData)
        loHttp.ContentLength = lbPostBuffer.Length
        Dim loPostData As Stream = loHttp.GetRequestStream()
        loPostData.Write(lbPostBuffer, 0, lbPostBuffer.Length)
        loPostData.Close()

        Dim loWebResponse As HttpWebResponse = DirectCast(loHttp.GetResponse(), HttpWebResponse)
        Dim enc As Encoding = System.Text.Encoding.GetEncoding(1252)
        Dim loResponseStream As New StreamReader(loWebResponse.GetResponseStream(), enc)
        Dim lcHtml As String = loResponseStream.ReadToEnd()
        loWebResponse.Close()
        loResponseStream.Close()
        Response.Write(lcHtml)
    End Sub

    Private Function ExtractViewState(s As String) As String
        Dim viewStateNameDelimiter As String = "__VIEWSTATE"
        Dim valueDelimiter As String = "value="""
        Dim viewStateNamePosition As Integer = s.IndexOf(viewStateNameDelimiter)
        Dim viewStateValuePosition As Integer = s.IndexOf(valueDelimiter, viewStateNamePosition)
        Dim viewStateStartPosition As Integer = viewStateValuePosition + valueDelimiter.Length
        Dim viewStateEndPosition As Integer = s.IndexOf("""", viewStateStartPosition)
        Return HttpUtility.UrlEncodeUnicode(s.Substring(viewStateStartPosition, viewStateEndPosition - viewStateStartPosition))
    End Function
  • Did you find a solution? Commented Sep 29, 2020 at 10:04

1 Answer

To make it work, you need to post all of the page's input fields, not only the viewstate. For example, __EVENTVALIDATION is another critical field that your code does not handle. So:

First, scrape page #1: load it and use the Html Agility Pack to parse it into a usable structure.

Then extract from that structure the input data that you need to post. Here is a code snippet showing how to do that, taken from the answer "HTML Agility Pack get all input fields":

    foreach (HtmlNode input in doc.DocumentNode.SelectNodes("//input"))
    {
        // use this to create the post string
        // input.Attributes["value"];
    }
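To make the loop above more concrete, here is a minimal sketch of how the post string could be built from the page's input fields. The class and method names (`PostBackForm`, `BuildPostData`) are made up for illustration, and the sketch assumes that only `<input>` elements matter for this page; a form with `<select>` or `<textarea>` fields would need the same treatment for those elements.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net;
using HtmlAgilityPack;

// Hypothetical helper, not part of the Html Agility Pack.
static class PostBackForm
{
    // Collects the <input> fields of a WebForms page and returns a
    // URL-encoded POST body that simulates a __doPostBack call.
    public static string BuildPostData(string html, string eventTarget, string eventArgument)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var fields = new Dictionary<string, string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var inputs = doc.DocumentNode.SelectNodes("//input[@name]");
        if (inputs != null)
        {
            foreach (HtmlNode input in inputs)
            {
                string type = input.GetAttributeValue("type", "text").ToLowerInvariant();

                // A browser only posts a button when it is clicked, so skip buttons.
                if (type == "submit" || type == "button" || type == "image" || type == "reset")
                    continue;

                // Unchecked checkboxes and radio buttons are not posted either.
                if ((type == "checkbox" || type == "radio") && input.Attributes["checked"] == null)
                    continue;

                string name = input.GetAttributeValue("name", "");
                string value = HtmlEntity.DeEntitize(input.GetAttributeValue("value", ""));
                fields[name] = value;
            }
        }

        // Overwrite the postback fields with the paging command.
        fields["__EVENTTARGET"] = eventTarget;
        fields["__EVENTARGUMENT"] = eventArgument;

        return string.Join("&", fields.Select(f =>
            WebUtility.UrlEncode(f.Key) + "=" + WebUtility.UrlEncode(f.Value)));
    }
}
```

With the values from the question, the call would look like `PostBackForm.BuildPostData(responseData, "contentwrapper_0$maincontent_0$maincontentfullwidth_0$ucSearchResults$gvPrograms", "Page$3")`. Because every hidden input is copied, __VIEWSTATE and __EVENTVALIDATION are included without being named explicitly.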

Once you have all the data needed for a valid post, you can move on to the next step and send the request. Here is an example: "How to pass POST parameters to ASP.Net web request?"
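As a rough sketch of that posting step, the request could be sent as shown here. `PostBackClient` and `Post` are hypothetical names, and the sketch assumes that `postData` is already URL-encoded. The shared `CookieContainer` is also an assumption: it only matters if the site ties the viewstate to a session cookie, and it should then be the same instance that was used for the initial GET of page #1.

```csharp
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper for illustration only.
static class PostBackClient
{
    // Sends an already URL-encoded form body and returns the response HTML.
    public static string Post(string url, string postData, CookieContainer cookies)
    {
        byte[] body = Encoding.UTF8.GetBytes(postData);

        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "POST";
        // Without this content type ASP.NET does not parse the body as form fields.
        request.ContentType = "application/x-www-form-urlencoded";
        request.ContentLength = body.Length;
        request.CookieContainer = cookies;

        using (Stream requestStream = request.GetRequestStream())
        {
            requestStream.Write(body, 0, body.Length);
        }

        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Note that the code in the question never sets `ContentType` on the POST request, so that is worth checking alongside the missing fields.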

You can also read: How to use HTML Agility pack

1 Comment

The problem is specific to pagination scraping.
