How do I pull the table data from this website?

Question

Using the code below, I can't pull the College Football matchups from the pregame.com Game Center.

I've tried multiple class ids with different elements, and even tried pulling with pandas, but can't get the entire table. Is there another way to scrape it successfully?

```python
from bs4 import BeautifulSoup
import requests

# Headers must be a dict of name -> value (a single 'User-Agent: ...' string in braces is a set)
headers = {
    'User-Agent': ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                   '(KHTML, like Gecko) Chrome/95.0.4638.54 Safari/537.36')
}
url = 'https://pregame.com/game-center/?d=1636174800000&t=0&l=2&a=0&s=AwayRot&m=false&b=undefined&o=Current&c=All&k='

pregame = requests.get(url, headers=headers).text
soup = BeautifulSoup(pregame, 'lxml')  # the lxml package must be installed for this parser
div = soup.find_all('p', class_='pggc-col-data pggc-away')
print(div)
```

I'm looking to get all the Opening lines, Cash %, Money %, etc. from the College Football Game Center: basically the entire table, excluding anything to the right of "Picks". – RookiePython, commented Nov 3, 2021 at 14:31

chitown88 · Accepted Answer · 2021-11-04 14:57:52Z

You may need to do a little data manipulation and some joins, depending on what you are after. But you can get the data back in JSON format from the API and parse it.

```python
import requests
import pandas as pd

url = 'https://socket.pregame.com/api/gamecenter/bootstrap'
jsonData = requests.get(url).json()

# Turn each top-level key of the JSON response into its own DataFrame
data = {}
for each, v in jsonData.items():
    data[each] = pd.DataFrame(v)

# Preview the first rows of every table
for key, table in data.items():
    print(f'\n*** {key} ***')
    print(table.head(10).to_string())

# Map league name -> league id
leagues_dict = dict(zip(data['Leagues']['Name'], data['Leagues']['Id']))

final_data = {}
for k, v in leagues_dict.items():
    events_df = data['Events'][data['Events']['LeagueId'] == v].rename(columns={'Id': 'EventId'})
    groups_df = data['Groups'][data['Groups']['LeagueId'] == v].rename(columns={'Id': 'EventGroupId'})
    odds_df = data['Odds'][data['Odds']['LeagueId'] == v]
    scores_df = data['Scores']

    final_df = events_df.merge(groups_df.drop('LeagueId', axis=1), how='left', on='EventGroupId')
    final_df = final_df.merge(odds_df.drop('LeagueId', axis=1), how='right', on='EventId')
    # Only merge scores when the Scores table has rows
    if len(scores_df) > 0:
        final_df = final_df.merge(scores_df, how='left', on='EventId')
    final_data.update({k: final_df})
```
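The two merges are easiest to follow on toy data. Below is a minimal sketch with made-up tables whose shape is assumed from the column names the answer's code uses (`Id`, `LeagueId`, `EventGroupId`, `EventId`); `GroupName`, `LineType` and `Spread` are hypothetical stand-ins for whatever the real Groups and Odds tables contain.

```python
import pandas as pd

# Hypothetical toy tables; real column names and values will differ
events = pd.DataFrame({
    'Id': [101, 102],
    'LeagueId': [2, 2],
    'EventGroupId': [10, 11],
})
groups = pd.DataFrame({
    'Id': [10, 11],
    'LeagueId': [2, 2],
    'GroupName': ['Week 10 Sat', 'Week 10 Sun'],
})
odds = pd.DataFrame({
    'EventId': [101, 101, 102],
    'LeagueId': [2, 2, 2],
    'LineType': ['Open', 'Current', 'Open'],
    'Spread': [-3.0, -3.5, 7.0],
})

# Rename the generic 'Id' columns so they line up with the foreign keys
events = events.rename(columns={'Id': 'EventId'})
groups = groups.rename(columns={'Id': 'EventGroupId'})

# Left join: keep every event and attach its group details
merged = events.merge(groups.drop('LeagueId', axis=1), how='left', on='EventGroupId')
# Right join: one row per odds record, with event and group details attached
merged = merged.merge(odds.drop('LeagueId', axis=1), how='right', on='EventId')

print(merged.to_string(index=False))
```

Dropping `LeagueId` from the right-hand table before each merge avoids ending up with `LeagueId_x`/`LeagueId_y` columns, since both tables carry it. Because the second merge is a right join on the odds, an event with several odds rows appears once per odds row.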

You can also write each table to CSV and view it in Excel if that's easier for you to work with.
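A minimal sketch of that, assuming `final_data` is the dict of DataFrames built above (league name -> DataFrame). `safe_filename` and `export_tables` are hypothetical helpers, and the optional `name_filter` is just a case-insensitive substring match on the league name. What the Leagues table actually calls college football isn't confirmed here, so print `leagues_dict` first and adjust the filter.

```python
import re
from pathlib import Path


def safe_filename(name):
    """Turn a league name such as 'College Football' into 'College_Football'."""
    # Replace every run of characters that aren't letters, digits, '-' or '_'
    cleaned = re.sub(r'[^A-Za-z0-9_-]+', '_', name).strip('_')
    return cleaned or 'unnamed'


def export_tables(tables, out_dir='.', name_filter=None):
    """Write each DataFrame in `tables` (e.g. final_data) to its own CSV file.

    If `name_filter` is given, only keys containing it (case-insensitive)
    are written. Returns the list of paths that were written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for name, df in tables.items():
        if name_filter and name_filter.lower() not in name.lower():
            continue
        path = out / f'{safe_filename(name)}.csv'
        df.to_csv(path, index=False)  # pandas DataFrame.to_csv
        written.append(path)
    return written
```

Hypothetical usage: `export_tables(final_data, 'pregame_csv', name_filter='college')`, where the filter string is a guess to be replaced with the real league name.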

Did you strip the whitespace somewhere? Mine doesn't look as clean as yours.
Hmm, very strange. Mine is all over the place; it looks like everything is separated by tabs. Regardless, thank you for the help. I'll mess around with the code you provided and try to pull only the CFB matchups.
Not always. If you do .iterrows(), then yes. If you do .items(), it iterates over the key/value pairs of a dictionary.
Also fixed the issue. The Scores DataFrame was empty (since there are no scores currently), so they must have refreshed the data at some point between when I posted the code and when you ran it.
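The `if len(scores_df) > 0` guard matters because, if the API returns an empty list for Scores, `pd.DataFrame([])` has no columns at all, so there is no `EventId` column to merge on. A slightly more defensive version, as a sketch (`merge_if_present` is a hypothetical helper and the small tables are made-up data), also skips the merge when the key column is missing:

```python
import pandas as pd


def merge_if_present(left, right, on, how='left'):
    """Merge `right` into `left` only when `right` has rows and the key column.

    Otherwise return `left` unchanged, mirroring the len(scores_df) > 0 check.
    """
    if right.empty or on not in right.columns:
        return left
    return left.merge(right, how=how, on=on)


events = pd.DataFrame({'EventId': [101, 102], 'Home': ['A', 'B']})
no_scores = pd.DataFrame([])  # what an empty "Scores" list turns into
some_scores = pd.DataFrame({'EventId': [101], 'HomeScore': [21]})

print(merge_if_present(events, no_scores, on='EventId').columns.tolist())
print(merge_if_present(events, some_scores, on='EventId').to_string(index=False))
```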

DisappointedByUnaccountableMod · Accepted Answer · 2021-11-03 21:15:36Z

The problem you're running into is that the data is being loaded dynamically via JavaScript, so it isn't present in the HTML that requests downloads and BeautifulSoup parses.

You'll want to check out something like Selenium to work around this. Here's a good overview: How to Scrape Data From JavaScript-Based Website Using Python, Selenium, and Headless Web Driver
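A quick way to confirm this diagnosis before switching tools is to check whether the class you're searching for appears in the HTML that `requests` receives at all. Below is a minimal sketch using only the standard library's `html.parser`; `ClassCounter` and `count_class` are hypothetical helper names, and the two sample strings stand in for a server-rendered page and a JavaScript shell. If the count for the live page is 0, the rows are added later by JavaScript, and either Selenium or the JSON endpoint from the other answer is the way to go.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags whose class attribute contains every class in `wanted`."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted.split())
        self.count = 0

    def handle_starttag(self, tag, attrs):
        classes = set()
        for name, value in attrs:
            if name == 'class' and value:
                classes.update(value.split())
        if self.wanted <= classes:
            self.count += 1


def count_class(html, wanted):
    parser = ClassCounter(wanted)
    parser.feed(html)
    parser.close()
    return parser.count


# Made-up samples: rows present in the HTML vs. an empty JavaScript shell
static_page = '<div><p class="pggc-col-data pggc-away">Team A</p></div>'
js_shell = '<div id="app"></div><script src="app.js"></script>'
print(count_class(static_page, 'pggc-col-data pggc-away'))
print(count_class(js_shell, 'pggc-col-data pggc-away'))

# Against the live page (needs network access), something like:
# html = requests.get(url, headers=headers).text
# print(count_class(html, 'pggc-col-data pggc-away'))
```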
