update

In my effort to figure this out I did some more thinking and digging. I hope sharing more will help start a dialog with someone. This is what I discovered:

In order to get the lay of the land, I visited http://ca.megabus.com/BusStops.aspx and viewed all the GET requests in the network view. I then clicked the drop-down menu and chose a random origin and destination to generate a POST request. I did not click search though. From there, I opened the BusStops.aspx entry generated by the POST in the left sidebar.

Inside it, I focused on the event target in the header, which is:

__EVENTTARGET:confirm1$ddlTravellingTo 

and the view state, which is a really long string of what look like randomly generated letters and numbers. I assume this is because those fields are hidden inputs. I also noticed this value in the header:

X-MicrosoftAjax:Delta=true 

which I also saw on GitHub. lawnjam has a gist of a scraper for the Megabus UK site written in Python:

https://github.com/lawnjam/megabus-scraper/blob/master/megabus-routes.py

megaSoup (BeautifulSoup) seems to be a Python version of Nokogiri (I think), and I also believe Nokogiri was built off of it. At any rate, there is that, and urllib2 seems to be a library of commands for working with scraped data, I think (http://docs.python.org/3/library/urllib.html). I am 90ish percent sure Mechanize gives me all of that, especially since that gist is 3 years old.

OK, back to the matter at hand. From what I can decipher in that code, it looks like lawnjam pulls all the data fields manually and sets them to new local variables. Take the headers and values for example:

    headers = {
        'User-Agent': 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language': 'en-gb,en;q=0.8,en-us;q=0.5,gd;q=0.3',
        'Accept-Encoding': 'gzip,deflate',
        'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.7'}

    # set other form values
    values = {
        'Welcome1_ScriptManager1_HiddenField': '',
        'Welcome1$ScriptManager1': 'SearchAndBuy1$upSearchAndBuy|SearchAndBuy1$ddlLeavingFrom',
        '__EVENTTARGET': 'SearchAndBuy1$ddlLeavingFrom',
        '__EVENTARGUMENT': '',
        'Welcome1$hdnBasketItemCount': '0',
        'Language1$ddlLanguage': 'en',
        'SearchAndBuy1$txtPassengers': '1',
        'SearchAndBuy1$txtConcessions': '0',
        'SearchAndBuy1$txtNUSExtra': '0',
        'SearchAndBuy1$txtOutboundDate': '',
        'SearchAndBuy1$txtReturnDate': '',
        'SearchAndBuy1$txtPromotionalCode': '',
        '__ASYNCPOST': 'true'
    }

    headers['X-MicrosoftAjax'] = 'Delta=true'

From there though, it gets hazy. To elaborate, in this next section of code, it looks like he is taking the values and assigning them to local variables again, but I am not sure how to approach making a loop like that in Ruby, or if I can even do that. The urllib2 part is throwing me off:

    for a in startLocations:
        values['SearchAndBuy1$ddlLeavingFrom'] = a
        values['__EVENTVALIDATION'] = eventvalidation
        values['__VIEWSTATE'] = viewstate
        data = urllib.urlencode(values)
        req = urllib2.Request('http://uk.megabus.com/default.aspx', data, headers)
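A rough Ruby counterpart of that loop, using only the standard library, might look like the sketch below. The helper name `build_postback_body`, the sample location ids, and the placeholder state strings are assumptions made up for illustration; the field names are copied from the Python script. It only builds the URL-encoded POST body (the job `urllib.urlencode(values)` does in Python) and does not send anything.

```ruby
require 'uri'

# Hypothetical helper: merges the per-request fields into the base form
# values and URL-encodes the result into a POST body string.
def build_postback_body(base_values, start_id, viewstate, eventvalidation)
  values = base_values.merge(
    'SearchAndBuy1$ddlLeavingFrom' => start_id.to_s,
    '__VIEWSTATE'                  => viewstate,
    '__EVENTVALIDATION'            => eventvalidation
  )
  URI.encode_www_form(values)
end

# A trimmed copy of the form fields from the Python script.
base_values = {
  '__EVENTTARGET'   => 'SearchAndBuy1$ddlLeavingFrom',
  '__EVENTARGUMENT' => '',
  '__ASYNCPOST'     => 'true'
}

# Placeholder ids and state strings; real ones would come from the page.
start_locations = { 1 => 'Sample City A', 2 => 'Sample City B' }
viewstate = 'PLACEHOLDER_VIEWSTATE'
eventvalidation = 'PLACEHOLDER_EVENTVALIDATION'

# One encoded body per start location, like the Python for loop.
bodies = start_locations.keys.map do |id|
  build_postback_body(base_values, id, viewstate, eventvalidation)
end

puts bodies.first
```

In the Python script the two state strings are refreshed from each response before the next request is built, so a real loop would presumably reassign `viewstate` and `eventvalidation` on every pass rather than reuse the same placeholders.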

Next, I think he is referring to

UserStatus$ScriptManager1:confirm1$UpdatePanel1|confirm1$ddlTravellingTo 

from the form data section of the POST request's headers in the inspector network tab when he coded this part:

    # store the received (pipe-separated) data in a list
    L = urllib2.urlopen(req).read().split('|')

Now this is where I fall further down the rabbit hole. I can figure out that this next loop is just iterating through each location one at a time, but I do not know what position is or where it is defined. The Python style might be throwing me off here:

    for position, item in enumerate(L):
        if item == 'SearchAndBuy1_upSearchAndBuy':
            html = L[position + 1]
        if item == '__VIEWSTATE':
            viewstate = L[position + 1]  # save __VIEWSTATE for the next iteration
        if item == '__EVENTVALIDATION':
            eventvalidation = L[position + 1]  # save __EVENTVALIDATION for the next iteration
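For what it is worth, `position` is not defined anywhere else in the script: `enumerate(L)` yields an index together with each element, and the `for` statement itself names them `position` and `item`. Ruby's `each_with_index` does the same job. The sketch below runs against a made-up, heavily shortened response string (an assumption, not real Megabus output), and splitting on `|` is the same shortcut the Python script takes.

```ruby
# Hypothetical, shortened stand-in for the pipe-separated delta response.
response = '17|updatePanel|SearchAndBuy1_upSearchAndBuy|<select></select>|' \
           '8|hiddenField|__VIEWSTATE|dDwtMTIz|' \
           '8|hiddenField|__EVENTVALIDATION|AbCdEf12|'

parts = response.split('|')

html = viewstate = eventvalidation = nil

# each_with_index yields the element first and its index second
# (Python's enumerate yields the index first).
parts.each_with_index do |item, position|
  case item
  when 'SearchAndBuy1_upSearchAndBuy' then html = parts[position + 1]
  when '__VIEWSTATE'                  then viewstate = parts[position + 1]
  when '__EVENTVALIDATION'            then eventvalidation = parts[position + 1]
  end
end

puts viewstate
```

Each marker is followed by its payload in the list, which is why the script reads the element at `position + 1`.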

This next part seems to be where the list of stops gets populated, but BeautifulSoup is throwing me off. Is it analogous to:

    agent = Mechanize.new
    options = agent.find(name=…..

?

    megaSoup = BeautifulSoup(html)
    options = megaSoup.find(name='select', attrs={'name': 'SearchAndBuy1$ddlTravellingTo'}).findAll('option')
    endLocations = {}
    for o in options:
        if int(o['value']) > 0:
            print '"' + startLocations[a] + '","' + o.find(text=True) + '"'
            #endLocations[int(o['value'])] = o.find(text=True)
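As far as I can tell, the Python above finds the destination select element, walks its option tags, and keeps those whose value is greater than zero. In Ruby the usual counterpart to BeautifulSoup would be Nokogiri, which Mechanize uses for parsing. To stay within the standard library, the sketch below uses a regular expression instead, which is a deliberate simplification and not how a real scraper should parse HTML. The sample markup and town names are invented for illustration.

```ruby
# Hypothetical fragment standing in for the HTML pulled out of the response.
html = <<~'HTML'
  <select name="SearchAndBuy1$ddlTravellingTo">
    <option value="0">Select</option>
    <option value="11">Sample Town A</option>
    <option value="12">Sample Town B</option>
  </select>
HTML

# Capture each option's numeric value and its visible text.
pairs = html.scan(/<option[^>]*value="(-?\d+)"[^>]*>([^<]*)<\/option>/)

# Keep only real destinations (value > 0), keyed by id,
# mirroring the int(o['value']) > 0 check in the Python.
end_locations = pairs
  .map { |value, text| [value.to_i, text.strip] }
  .select { |value, _text| value > 0 }
  .to_h

end_locations.each { |id, name| puts "#{id},\"#{name}\"" }
```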

I appreciate any feedback and thanks in advance!

edited title
Can't get mechanize to scrape multiple items, getting "undefined method `text' for nil:NilClass" Scrape cannot find nested content using the Mechanize gem
