Create a Dataframe from HTML

Question

I am trying to read a table from a web page. My company has strict authentication policies that restrict how we can scrape data, so this is the code I am using to fetch the page:

    from urllib.request import urlopen
    from requests_kerberos import HTTPKerberosAuth, OPTIONAL
    import os
    import lxml.html as LH
    import requests
    import pandas as pd

    cert = r"C:\\Users\\name\\Desktop\\cacert.pem"
    os.environ["REQUESTS_CA_BUNDLE"] = cert
    kerberos = HTTPKerberosAuth(mutual_authentication=OPTIONAL)
    session = requests.Session()
    link = 'weblink'
    data=session.get(link,auth=kerberos,verify=False).content.decode("latin-1")

That leaves me with the entire HTML of the web page in "data". How do I convert this into a dataframe?

Note: I couldn't provide the web link due to privacy concerns. I was just wondering if there is a general way to tackle this situation.

I was just wondering if there was a procedure to convert the HTML into a dataframe. That's what the question is about.
– jack ryan, Commented Oct 21, 2019 at 4:06
pandas.read_html: if there are tables, they can be read directly into pandas.
– Trenton McKinney, Commented Oct 21, 2019 at 5:06

caxcaxcoatl · Accepted Answer · 2019-10-21 05:10:21Z

It looks like you're looking for something like this, using BeautifulSoup?

From there, you'll still have to create the data frame itself, but you will have completed the 'procedure to convert the HTML into' a data structure step (that is, read the HTML table into a list or dictionary, and then transform it into a dataframe).
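A minimal sketch of that two-step route, assuming the page holds a plain `<table>` whose first row contains the column headers. The HTML string here is a made-up stand-in for the `data` variable from the question, and the column names are hypothetical.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for the `data` string fetched in the question.
data = """
<html><body>
<table>
  <tr><th>name</th><th>qty</th></tr>
  <tr><td>apple</td><td>3</td></tr>
  <tr><td>pear</td><td>5</td></tr>
</table>
</body></html>
"""

soup = BeautifulSoup(data, "html.parser")
table = soup.find("table")  # first table on the page

# Step 1: read every row into a list of cell texts.
rows = [
    [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    for tr in table.find_all("tr")
]

# Step 2: build the dataframe (assumes the first row is the header).
header, *body = rows
df = pd.DataFrame(body, columns=header)
print(df)
```

All values come out as strings with this approach, so numeric columns would still need a conversion such as `pd.to_numeric`.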

Edit 1

Actually, you can use Pandas' read_html. You might still need BeautifulSoup to get exactly what you want, but depending on what the source HTML looks like, read_html might be enough on its own.
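For illustration, a short sketch of the `read_html` route, assuming an HTML parser backend such as lxml is installed (the question already imports it). The HTML string is again a hypothetical stand-in for `data`.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the `data` string fetched in the question.
data = """
<html><body>
<table>
  <tr><th>name</th><th>qty</th></tr>
  <tr><td>apple</td><td>3</td></tr>
  <tr><td>pear</td><td>5</td></tr>
</table>
</body></html>
"""

# read_html returns a list with one dataframe per <table> it finds.
# Wrapping the string in StringIO avoids the deprecation warning that
# newer pandas versions raise for literal HTML strings.
tables = pd.read_html(StringIO(data))
df = tables[0]
print(df)
```

If the page contains several tables, the list index (or the `match` argument of `read_html`) selects the one you need.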
