I'm trying to use mechanize to grab fares for New York's Metro-North Railroad from this site:
http://as0.mta.info/mnr/fares/choosestation.cfm

The problem is that when you select an origin station in the first list, the site uses JavaScript to populate the list of possible destinations. I have written equivalent code in Python, but I can't seem to get it all working. Here's what I have so far:

    import mechanize
    import cookielib
    from bs4 import BeautifulSoup

    br = mechanize.Browser()
    br.set_handle_robots(False)
    br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
    br.open("http://as0.mta.info/mnr/fares/choosestation.cfm")

    br.select_form(name="form1")
    br.form.set_all_readonly(False)

    origin_control = br.form.find_control("orig_stat", type="select")
    origin_control_list = origin_control.items
    origin_control.value = [origin_control.items[0].name]

    destination_control_list = reFillList(0, origin_control_list)

    destination_control = br.form.find_control("dest_stat", type="select")
    destination_control.items = destination_control_list
    destination_control.value = [destination_control.items[0].name]

    response = br.submit()
    response_text = response.read()
    print response_text

I haven't included the code for the reFillList() method because it's long, but assume it correctly creates a list of mechanize.option objects. Python doesn't complain about anything, but on submit I get the HTML for this alert:

"Fare information for travel between two lines is not available on-line. Please contact our Customer Information Center at 511 and ask to speak to a representative for further information."

Am I missing something here? Thanks for all the help!

1 Answer

If you know the station IDs, it is easier to POST the request yourself:

    import mechanize
    import urllib

    post_url = 'http://as0.mta.info/mnr/fares/get_fares.cfm'

    orig = 295  # BEACON FALLS
    dest = 292  # ANSONIA

    params = urllib.urlencode({'dest_stat': dest, 'orig_stat': orig})
    rq = mechanize.Request(post_url, params)

    fares_page = mechanize.urlopen(rq)

    print fares_page.read()
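For anyone on Python 3 without mechanize, roughly the same POST can be built with the standard library alone. This is a minimal sketch rather than part of the original answer: `build_fare_request` is a hypothetical helper name, and it assumes the endpoint still accepts the `orig_stat` and `dest_stat` form fields used above. The request is only built here, not sent.

```python
from urllib.parse import urlencode, parse_qs
from urllib.request import Request

POST_URL = 'http://as0.mta.info/mnr/fares/get_fares.cfm'


def build_fare_request(orig, dest, post_url=POST_URL):
    """Build (but do not send) a POST request for one origin/destination pair.

    Hypothetical helper: the field names come from the form on the site,
    everything else is an illustrative assumption.
    """
    body = urlencode({'orig_stat': orig, 'dest_stat': dest}).encode('ascii')
    # Supplying a data payload makes urllib issue a POST instead of a GET.
    return Request(post_url, data=body)


rq = build_fare_request(295, 292)  # BEACON FALLS to ANSONIA
print(rq.get_method())
print(rq.data)

# To actually fetch the fares page you would then call:
#   urllib.request.urlopen(rq).read()
```

Keeping the request construction separate from the network call makes it easy to check the encoded form body before hitting the site.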

If you have the code to find the list of destination IDs for a given origin ID (i.e. a variant of reFillList()), you can then run this request for each combination:

    import mechanize
    import urllib, urllib2
    from bs4 import BeautifulSoup

    url = 'http://as0.mta.info/mnr/fares/choosestation.cfm'
    post_url = 'http://as0.mta.info/mnr/fares/get_fares.cfm'

    def get_fares(orig, dest):
        params = urllib.urlencode({'dest_stat': dest, 'orig_stat': orig})
        rq = mechanize.Request(post_url, params)
        fares_page = mechanize.urlopen(rq)
        print(fares_page.read())

    pool = BeautifulSoup(urllib2.urlopen(url).read())

    # let's keep our stations organised
    stations = {}
    # dict by station id
    for option in pool.find('select', {'name': 'orig_stat'}).findChildren():
        stations[option['value']] = {'name': option.string}

    # iterate over all routes
    for origin in stations:
        destinations = get_list_of_dests(origin)  # use your code for this
        stations[origin]['dests'] = destinations
        for destination in destinations:
            print('Processing from %s to %s' % (origin, destination))
            get_fares(origin, destination)
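If BeautifulSoup is not available, the station-ID lookup can be approximated with the standard library's `html.parser`. The sketch below is an illustration under assumptions: `StationOptionParser` is a hypothetical class, and `SAMPLE_HTML` is made-up markup that only imitates the `orig_stat` select (the two station IDs are the ones quoted earlier in this answer). The real page may be structured differently.

```python
from html.parser import HTMLParser


class StationOptionParser(HTMLParser):
    """Collect {station_id: station_name} from one named <select> element."""

    def __init__(self, select_name):
        super().__init__()
        self.select_name = select_name
        self.stations = {}
        self._in_select = False
        self._current_id = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'select' and attrs.get('name') == self.select_name:
            self._in_select = True
        elif tag == 'option' and self._in_select:
            # Skip placeholder options that carry an empty value.
            value = attrs.get('value')
            self._current_id = value or None
            if value:
                self.stations[value] = ''

    def handle_data(self, data):
        # Text inside the current <option> is the station name.
        if self._current_id is not None:
            self.stations[self._current_id] += data.strip()

    def handle_endtag(self, tag):
        if tag == 'option':
            self._current_id = None
        elif tag == 'select':
            self._in_select = False
            self._current_id = None


# Assumed markup, for illustration only.
SAMPLE_HTML = '''
<form name="form1">
  <select name="orig_stat">
    <option value="">Select a station</option>
    <option value="292">ANSONIA</option>
    <option value="295">BEACON FALLS</option>
  </select>
  <select name="dest_stat"></select>
</form>
'''

parser = StationOptionParser('orig_stat')
parser.feed(SAMPLE_HTML)
print(parser.stations)
```

The resulting dict can stand in for `stations` in the loop above, with your own destination lookup supplying the list for each origin.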