I am trying to read a table from a web page. My company has strict authentication policies that restrict how we can scrape data, but the following code is what I am currently using:

    from urllib.request import urlopen
    from requests_kerberos import HTTPKerberosAuth, OPTIONAL
    import os
    import lxml.html as LH
    import requests
    import pandas as pd

    cert = r"C:\\Users\\name\\Desktop\\cacert.pem"
    os.environ["REQUESTS_CA_BUNDLE"] = cert
    kerberos = HTTPKerberosAuth(mutual_authentication=OPTIONAL)
    session = requests.Session()
    link = 'weblink'
    data = session.get(link, auth=kerberos, verify=False).content.decode("latin-1")

And that leaves me with the entire HTML of the webpage in "data". How do I convert this into a dataframe?
Note: I can't provide the web link due to privacy concerns. I am just wondering whether there is a general approach I can use to tackle this situation.
Use pandas.read_html: if the page contains tables, they can be read directly into pandas. It returns a list of DataFrames, one per table it finds.
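As a minimal sketch of how that might look with the HTML string from the question: the `data` value below is a made-up stand-in for whatever `session.get(...)` returns, and the table contents and column names are invented for illustration. It also assumes a parser that `read_html` can use (such as lxml, or BeautifulSoup with html5lib) is installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the `data` string fetched in the question;
# the real page's HTML would go here instead.
data = """
<html><body>
<table>
  <tr><th>name</th><th>qty</th></tr>
  <tr><td>apple</td><td>3</td></tr>
  <tr><td>pear</td><td>5</td></tr>
</table>
</body></html>
"""

# read_html returns a list with one DataFrame per <table> it finds.
# Wrapping the string in StringIO avoids the deprecation warning that
# recent pandas versions emit when given a literal HTML string.
tables = pd.read_html(StringIO(data))

# Pick the table you need by its position in the list.
df = tables[0]
print(df)
```

If the page has several tables, inspecting `len(tables)` and each entry's columns is usually the quickest way to find the right one; the `match` argument of `read_html` can also narrow the result to tables containing a given text.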