I'm extracting the HTML of a page using Mechanize, but the HTML it returns doesn't match what I see in the browser: Mechanize seems to replace the content of certain elements with '(n/a)'.
Example (structure shown in Firebug)

    <tr>
      <td>
        <img class="bullet" src="images/bulletorange.gif" alt="">
        <span class="detailCaption">Video Format Mode:</span>
        <span class="settingValue" id="vidSdSdiAnlgFormatSelectionMode.1.1">Auto</span>
      </td>
    </tr>

Example (structure output by Mechanize)

    <tr>
      <td>
        <img class='bullet' src='images/bulletorange.gif' alt='' />
        <span class='detailCaption'>Video Format Mode:</span>
        <span class='settingValue' id="vidSdSdiAnlgFormatSelectionMode.1.1">(n/a)</span>
      </td>
    </tr>

The problem is that "Auto" is being replaced by "(n/a)", and I'm not sure why.
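To check programmatically which values come back as "(n/a)", the returned HTML can be parsed and the text of each `settingValue` span collected by its `id`. Below is a minimal Python 3 sketch using only the standard library's `html.parser`; the names `SettingValueParser` and `setting_values` are hypothetical, and it assumes a `settingValue` span never contains a nested `<span>`.

```python
from html.parser import HTMLParser


class SettingValueParser(HTMLParser):
    """Collects the text of every <span class="settingValue">, keyed by its id.

    Assumption: settingValue spans contain no nested <span> elements.
    """

    def __init__(self):
        super().__init__()
        self.values = {}
        self._current_id = None  # id of the settingValue span being read, if any
        self._inside = False     # True while inside a settingValue span

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # The class attribute may hold several space-separated class names
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "settingValue" in classes:
            self._current_id = attrs.get("id")
            self._inside = True
            self.values[self._current_id] = ""

    def handle_endtag(self, tag):
        if tag == "span":
            self._inside = False
            self._current_id = None

    def handle_data(self, data):
        # Only keep text that appears inside a settingValue span
        if self._inside:
            self.values[self._current_id] += data


def setting_values(html):
    """Return {span id: stripped text} for all settingValue spans in an HTML string."""
    parser = SettingValueParser()
    parser.feed(html)
    parser.close()
    return {key: text.strip() for key, text in parser.values.items()}
```

Fed the Mechanize output above, this yields `{"vidSdSdiAnlgFormatSelectionMode.1.1": "(n/a)"}`; fed the Firebug version, it yields `"Auto"` for the same id. Under Python 3 Mechanize's `.read()` returns bytes, so the page would need decoding first (UTF-8 here is an assumption): `[k for k, v in setting_values(html.decode("utf-8")).items() if v == "(n/a)"]` lists the affected fields.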
Why is Mechanize doing this, and how can I fix it?
Here is my code:

    import ssl

    import mechanize


    def login_and_return_html(self, url_login, url_after_login, form_username,
                              form_password, username, password):
        """
        Description:
            Returns the HTML of a page on a website that requires login.
        Input Arguments:
            url_login (str) - The URL where you enter the login username and password
            url_after_login (str) - The URL you want to visit after logging in
            form_username (str) - The name of the username input field in the login form
            form_password (str) - The name of the password input field in the login form
            username (str) - The actual username
            password (str) - The actual password
        Return or Output:
            Returns the HTML of the 'url_after_login' page
        Modules and Classes:
            mechanize
            ssl
        """
        try:
            # Disable SSL certificate validation
            _create_unverified_https_context = ssl._create_unverified_context
        except AttributeError:
            # Legacy Python that doesn't verify HTTPS certificates by default
            pass
        else:
            # Handle target environment that doesn't support HTTPS verification
            ssl._create_default_https_context = _create_unverified_https_context

        # Browser and browser options
        br = mechanize.Browser()
        br.set_handle_equiv(True)
        br.set_handle_redirect(True)
        br.set_handle_referer(True)
        br.set_handle_robots(False)

        # Cookie jar
        cj = mechanize.CookieJar()
        br.set_cookiejar(cj)

        # Follows refresh 0 but doesn't hang on refresh > 0
        br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)

        # Login
        br.open(url_login)
        br.select_form(nr=0)
        try:
            br.form[form_username] = username  # Fill in the username text field
            br.form[form_password] = password  # Fill in the password text field
            br.submit()
        except Exception:
            # The username field is a dropdown menu: select the matching option
            control = br.form.find_control(form_username)
            for item in control.items:
                if item.name == username:
                    item.selected = True
            br.form[form_password] = password  # Fill in the password text field
            br.submit()

        html = br.open(url_after_login).read()
        return html