I am trying to scrape the New Hampshire Secretary of State's website for data on registered voters. So far I have been able to get the text of the page with BeautifulSoup using the following code:

    import pandas as pd
    from selenium import webdriver
    from selenium.webdriver.common.keys import Keys
    from selenium.webdriver.common.by import By
    from openpyxl import Workbook
    import getpass
    from urllib.request import urlopen
    from bs4 import BeautifulSoup

    url = urlopen('http://sos.nh.gov/NamesHistory.aspx')
    html = BeautifulSoup(url, 'html.parser')
    # get_text must be called; without the parentheses it only returns the bound method
    html.find('table', attrs={'class':'table-border2-black'}).get_text()

However, how can I get the text from this table into a usable data frame like the one that appears on the website (http://sos.nh.gov/NamesHistory.aspx)? My question is different because this website is different from the websites in previous questions.

  • Possible duplicate of python BeautifulSoup parsing table Commented Jul 20, 2018 at 1:19
  • I can see how it is similar, but I would like help in getting this into a usable dataframe and this website is different from that other website. Commented Jul 20, 2018 at 1:27

1 Answer

You need to write the scraped data to a CSV file using the following commands:

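    import csv

    # course_list is the list of rows scraped from the page
    # Python 3: open in text mode with newline='' rather than 'wb'
    with open('filename.csv', 'w', newline='') as file:
        writer = csv.writer(file)
        for row in course_list:
            writer.writerow(row)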

You can also see this approach in: writing and saving CSV file from scraping data using python and Beautifulsoup4.
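As a minimal sketch of the CSV step, assuming the scraped rows are already a list of lists: the data, the file name `voters.csv`, and the helper names `write_rows` and `read_rows` below are all made up for illustration and do not come from the site.

```python
import csv
import os
import tempfile

# Made-up rows standing in for the scraped table; the first row is the header.
rows = [
    ["Town", "Registered"],
    ["Town A", "100"],
    ["Town B", "250"],
]


def write_rows(path, rows):
    """Write a list of row lists to a CSV file (hypothetical helper)."""
    # In Python 3 the csv module expects text mode with newline="".
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)


def read_rows(path):
    """Read the CSV file back as a list of row lists (hypothetical helper)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.reader(f))


# Write into a temporary directory so the sketch does not touch your project.
csv_path = os.path.join(tempfile.mkdtemp(), "voters.csv")
write_rows(csv_path, rows)
round_trip = read_rows(csv_path)
```

Note that everything read back from a CSV file is a string, so numeric columns need to be converted afterwards.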

After that, you need to read the CSV file and convert the data into a DataFrame for further processing. If you don't know how to do that, read the pandas documentation, starting here: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.from_csv.html (note that DataFrame.from_csv has since been deprecated in favor of pandas.read_csv).
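If the CSV file is not needed, the table rows can also go straight into a DataFrame. The following is only a sketch under some assumptions: the HTML is a made-up stand-in for the real page (only the class name `table-border2-black` comes from the question), the first row is assumed to be the header, every row is assumed to have the same number of cells, and `table_to_dataframe` is a hypothetical helper name.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Made-up HTML standing in for the page; only the class name is from the question.
SAMPLE_HTML = """
<table class="table-border2-black">
  <tr><th>Town</th><th>Registered</th></tr>
  <tr><td>Town A</td><td>100</td></tr>
  <tr><td>Town B</td><td>250</td></tr>
</table>
"""


def table_to_dataframe(html, table_class):
    """Turn the first table with the given class into a DataFrame (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", attrs={"class": table_class})
    if table is None:
        raise ValueError("no table with class %r found" % table_class)

    rows = []
    for tr in table.find_all("tr"):
        # Collect the text of header and data cells, stripped of whitespace.
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    if not rows:
        raise ValueError("table has no rows")

    # Assumption: the first row holds the column names.
    header, *body = rows
    return pd.DataFrame(body, columns=header)


df = table_to_dataframe(SAMPLE_HTML, "table-border2-black")
```

To try it on the real page, pass the downloaded HTML instead of `SAMPLE_HTML`. The actual table may use merged or nested cells, in which case the header and equal-length assumptions would need adjusting.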

1 Comment

Could you post an example of this using the website that I posted? I looked at the other example and this is not a typical website. And, I am not sure how to get the text isolated from the other characteristics of the html code.