
How do I scrape a list of items nested in a scrolldown menu?

To help contextualize, here is the chunk of the view source that I am trying to scrape from:

<!-- mp_trans_schedule_disable_start -->
<select name="confirm1$ddlLeavingFromMap" onchange="javascript:setTimeout('__doPostBack(\'confirm1$ddlLeavingFromMap\',\'\')', 0)" id="confirm1_ddlLeavingFromMap" class="input">
  <option selected="selected" value="-1">Select</option>
  <option value="429">Beamsville, ON</option>
  <option value="438">Belleville, ON</option>
  <option value="277">Brockville, ON</option>
  <option value="273">Buffalo Airport, NY</option>
  <option value="95">Buffalo, NY</option>
  <option value="436">Burlington, ON</option>
  <option value="424">Cambridge, ON</option>
  <option value="440">Cobourg, ON</option>
  <option value="278">Cornwall, ON</option>
  <option value="434">Fort Erie, ON</option>
  <option value="428">Grimsby, ON</option>
  <option value="426">Hamilton GO Centre, ON</option>
  <option value="425">Hamilton McMaster University, ON</option>
  <option value="276">Kingston, ON</option>
  <option value="279">Kirkland, PQ</option>
  <option value="423">Kitchener, ON</option>
  <option value="435">Mississauga, ON</option>
  <option value="280">Montreal, PQ</option>
  <option value="437">Napanee, ON</option>
  <option value="124">Niagara Falls, ON</option>
  <option value="449">Niagara Fallsview Casino, ON</option>
  <option value="431">Oakville, ON</option>
  <option value="433">Port Colborne, ON</option>
  <option value="274">Scarborough, ON</option>
  <option value="427">St Catharines, ON</option>
  <option value="448">St. Catharines Brock University, ON</option>
  <option value="315">TC Kingston</option>
  <option value="310">Toronto Airport, ON</option>
  <option value="145">Toronto, ON</option>
  <option value="439">Trenton, ON</option>
  <option value="422">Waterloo, ON</option>
  <option value="432">Welland, ON</option>
  <option value="275">Whitby, ON</option>
</select>
<!-- mp_trans_schedule_disable_end -->

I tried targeting the select element that holds the choices, as well as the option tags themselves, with puts agent.page.parser.css("select").text and puts agent.page.parser.css("option").text, but both turned up nil.

I also tried:

puts agent.page.parser.css("confirm1$ddlLeavingFromMap").text

and

form.field_with(:name => 'confirm1$ddlLeavingFromMap').options[1].click

Both of these also turned up nil.

Finally, I tried this:

require 'htmlentities'
require "mechanize"

a = Mechanize.new { |agent| agent.user_agent_alias = 'Mac Safari' }
@resultHash = {}

a.get("http://ca.megabus.com/BusStops.aspx") do |page|
  parsedPage = page.parser
  @resultHash[:some_data_name] = parsedPage.at_xpath("//h3[@class='right_col']").text.split(/\s+/).join(" ")
end

However, when I check whether it works by running rake -T -A, I get undefined method text for nil:NilClass, and I do not know why.

I appreciate any feedback and thanks in advance!

    Great, detailed question. If you are stuck for an answer in a couple of days, ping me with @halfer and I'll add a bounty to it. Commented Dec 15, 2013 at 8:31

1 Answer


1. You should choose a language first.

2. You should use the correct CSS selector (consider using the SelectorGadget plugin from selectorgadget.com).

require 'htmlentities'
require "mechanize"

a = Mechanize.new { |agent| agent.user_agent_alias = 'Mac Safari' }
@resultHash = {}

a.get("http://ca.megabus.com/BusStops.aspx") do |page|
  # Choose the language first: submit the first form with its first button.
  next_page = a.submit(page.forms[0], page.forms[0].buttons.first)
  parsedPage = next_page.parser

  # Use the correct CSS selector for the drop-down.
  @resultHash[:some_data_name] = parsedPage.at_css('#JourneyPlanner_ddlLeavingFrom').text
  p @resultHash[:some_data_name]
end

3 Comments

Doesn't next_page choose the first language (which is English)? Also, #JourneyPlanner_ddlLeavingFrom seemed to me to be the correct css selector. Is it not?
next_page is the result page after submitting the form with the English language. You could do it in a more elegant way. I provided you with a working piece of code, so yes, #JourneyPlanner_ddlLeavingFrom is the correct CSS selector.
Thank you for the assist! The language part was throwing me off
