Here's a slightly different approach from the BeautifulSoup version below, to give you options.
I like BeautifulSoup for parsing, until I see `<table>` tags. Then I usually go straight to pandas, since it can pull the table out in one line, and then I can manipulate the DataFrame as needed.
Then I just convert the DataFrame to JSON (I actually learned this from an ewwink solution a few weeks back :-) )
```python
import io
import json

import pandas as pd
import requests

url = 'https://bgp.he.net/country/US'

session = requests.Session()
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36",
    "Accept-Encoding": "gzip, deflate",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
    "Accept-Language": "en",
}

response = session.get(url, headers=headers)

# read_html returns a list of DataFrames, one per <table> on the page.
# Wrapping the HTML in StringIO avoids the literal-string deprecation in newer pandas.
tables = pd.read_html(io.StringIO(response.text))
table = tables[0]

# Add the country code from the URL ('US') as a column on every row.
table['Country'] = url.split('/')[-1]

jsonObject = table.to_dict(orient='records')

# if you need it as a string to write to a JSON file
jsonObject_string = json.dumps(jsonObject)
```
Output:
[{'ASN': 'AS6939', 'Name': 'Hurricane Electric LLC', 'Adjacencies v4': 7216, 'Routes v4': 127337, 'Adjacencies v6': 4460, 'Routes v6': 28227, 'Country': 'US'}, {'ASN': 'AS174', 'Name': 'Cogent Communications', 'Adjacencies v4': 5692, 'Routes v4': 118159, 'Adjacencies v6': 1914, 'Routes v6': 8814, 'Country': 'US'}...
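If you only need the JSON string, pandas can do the dict conversion and serialization in a single `to_json(orient='records')` call. The sketch below runs offline: it uses a small hypothetical DataFrame in place of `tables[0]` (the ASNs come from the 64496-64511 range reserved for documentation, so these rows are placeholders, not scraped data).

```python
import json

import pandas as pd

# Hypothetical stand-in for tables[0]; placeholder rows, not real BGP data.
table = pd.DataFrame(
    {
        "ASN": ["AS64500", "AS64501"],
        "Name": ["Example Net A", "Example Net B"],
        "Routes v4": [10, 20],
    }
)

url = "https://bgp.he.net/country/US"
table["Country"] = url.split("/")[-1]  # same 'US' value on every row

# to_json builds the records and serializes them in one step; it handles
# NumPy scalar types itself, so there is no reliance on json.dumps for them.
json_string = table.to_json(orient="records")

records = json.loads(json_string)  # back to a list of dicts, one per row
print(records)
```

The result has the same shape as the `to_dict` + `json.dumps` route, so either works; `to_json` just saves a step when the goal is writing a file.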
Do you get `<Response [200]>`? It seems like I'm getting `<Response [404]>`.
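One way to make a 404 like this obvious is to check the status before parsing. Otherwise, if the error page contains no `<table>`, `pd.read_html` fails with a less helpful "No tables found" error. The helper below is a hypothetical sketch (`tables_from_response` is an illustrative name, not part of pandas or requests); it relies only on `Response.raise_for_status()`. It does not explain *why* the server returns 404, only surfaces it. The demo builds a fake 404 response so it runs without touching the network.

```python
import io

import pandas as pd
import requests


def tables_from_response(response: requests.Response) -> list:
    """Return the page's <table>s as DataFrames, or raise if the request failed."""
    # raise_for_status raises requests.HTTPError for 4xx/5xx codes (e.g. 404),
    # so the HTTP status is reported before pandas ever sees the error page.
    response.raise_for_status()
    return pd.read_html(io.StringIO(response.text))


# Offline demo: a hand-built 404 response instead of a real request.
fake = requests.Response()
fake.status_code = 404
fake.url = "https://bgp.he.net/country/US"

try:
    tables_from_response(fake)
except requests.HTTPError as err:
    print("Request failed:", err)
```

With a real request you would pass the `response` from `session.get(url, headers=headers)` instead of `fake`.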