
Using the code below, I can't pull the College Football matchups from the pregame.com Game Center.

I've tried multiple class names on different elements, and even tried reading the page with pandas, but I can't get the entire table. Is there another way to scrape it successfully?

from bs4 import BeautifulSoup
import requests

# Request headers are a dict mapping header name -> value, passed via headers=
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.54 Safari/537.36'}

url = 'https://pregame.com/game-center/?d=1636174800000&t=0&l=2&a=0&s=AwayRot&m=false&b=undefined&o=Current&c=All&k='
pregame = requests.get(url, headers=headers).text

# The 'lxml' parser requires the lxml package to be installed
soup = BeautifulSoup(pregame, 'lxml')
div = soup.find_all('p', class_='pggc-col-data pggc-away')
print(div)
  • What data are you after here? You don't need Selenium. Commented Nov 3, 2021 at 8:57
  • I'm looking to get all the opening lines, Cash %, Money %, etc. from the College Football Game Center. Basically, that entire table, excluding anything to the right of "Picks". Commented Nov 3, 2021 at 14:31

2 Answers


You may need to do a little data manipulation and some joins, depending on what you are after. But you can get the data back in JSON format from the API and parse it.

import requests
import pandas as pd

url = 'https://socket.pregame.com/api/gamecenter/bootstrap'
jsonData = requests.get(url).json()

# Turn each top-level key of the JSON response into its own DataFrame
data = {}
for each, v in jsonData.items():
    data[each] = pd.DataFrame(v)

# Preview the first rows of every table
for key, table in data.items():
    print(f'\n*** {key} ***')
    print(table.head(10).to_string())

# Map league name -> league id
leagues_dict = dict(zip(data['Leagues']['Name'], data['Leagues']['Id']))

# Build one joined table (events + groups + odds [+ scores]) per league
final_data = {}
for k, v in leagues_dict.items():
    events_df = data['Events'][data['Events']['LeagueId'] == v].rename(columns={'Id': 'EventId'})
    groups_df = data['Groups'][data['Groups']['LeagueId'] == v].rename(columns={'Id': 'EventGroupId'})
    odds_df = data['Odds'][data['Odds']['LeagueId'] == v]
    scores_df = data['Scores']

    final_df = events_df.merge(groups_df.drop('LeagueId', axis=1), how='left', on='EventGroupId')
    final_df = final_df.merge(odds_df.drop('LeagueId', axis=1), how='right', on='EventId')
    # Scores is empty when no games are in progress, so only join it when it has rows
    if len(scores_df) > 0:
        final_df = final_df.merge(scores_df, how='left', on='EventId')

    final_data.update({k: final_df})
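The joins above can be exercised offline before calling the live endpoint. This is a minimal sketch on invented data: the key columns (`Id`, `Name`, `LeagueId`, `EventGroupId`, `EventId`) are the ones the answer's code relies on, while the sample rows and the `HomeTeam`, `GroupName` and `Spread` columns are made up and are not the real API payload. `join_league` is a hypothetical helper wrapping the same merge chain for a single league id.

```python
import pandas as pd

# Hypothetical sample data shaped like the tables built from the bootstrap JSON.
data = {
    'Leagues': pd.DataFrame([{'Id': 2, 'Name': 'League A'}, {'Id': 3, 'Name': 'League B'}]),
    'Events': pd.DataFrame([
        {'Id': 10, 'LeagueId': 2, 'EventGroupId': 100, 'HomeTeam': 'Team X'},
        {'Id': 11, 'LeagueId': 3, 'EventGroupId': 101, 'HomeTeam': 'Team Y'},
    ]),
    'Groups': pd.DataFrame([
        {'Id': 100, 'LeagueId': 2, 'GroupName': 'Week 10'},
        {'Id': 101, 'LeagueId': 3, 'GroupName': 'Week 9'},
    ]),
    'Odds': pd.DataFrame([
        {'EventId': 10, 'LeagueId': 2, 'Spread': -3.5},
        {'EventId': 11, 'LeagueId': 3, 'Spread': 7.0},
    ]),
    'Scores': pd.DataFrame([]),  # empty, as when no games have started
}


def join_league(data, league_id):
    """Join events, groups, odds (and scores, if any) for one league id."""
    events = data['Events'][data['Events']['LeagueId'] == league_id].rename(columns={'Id': 'EventId'})
    groups = data['Groups'][data['Groups']['LeagueId'] == league_id].rename(columns={'Id': 'EventGroupId'})
    odds = data['Odds'][data['Odds']['LeagueId'] == league_id]
    df = events.merge(groups.drop('LeagueId', axis=1), how='left', on='EventGroupId')
    df = df.merge(odds.drop('LeagueId', axis=1), how='right', on='EventId')
    scores = data['Scores']
    # An empty Scores frame has no 'EventId' column; merging on it raises KeyError.
    if len(scores) > 0:
        df = df.merge(scores, how='left', on='EventId')
    return df


leagues = dict(zip(data['Leagues']['Name'], data['Leagues']['Id']))
result = {name: join_league(data, lid) for name, lid in leagues.items()}
print(result['League A'].to_string(index=False))
```

Filtering each table by `LeagueId` before merging keeps the per-league tables separate, which is also how you would isolate a single league such as college football once you know its name in `Leagues`.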

You can also write the final tables to CSV and view them in Excel if that's easier to work with.
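A minimal sketch of that export, assuming the per-league tables are in a `{name: DataFrame}` dict like `final_data`. `save_tables_to_csv` is a hypothetical helper, and the league names and rows in the usage below are made up. League names may contain spaces or slashes, so they are sanitized before being used as file names.

```python
import re
import tempfile
from pathlib import Path

import pandas as pd


def save_tables_to_csv(tables, out_dir):
    """Write each DataFrame in a {name: DataFrame} dict to <out_dir>/<name>.csv.

    Characters other than letters, digits, '_' and '-' are replaced with '_'.
    Returns the list of written paths, in dict order.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for name, df in tables.items():
        safe = re.sub(r'[^A-Za-z0-9_-]+', '_', str(name)).strip('_') or 'table'
        path = out / f'{safe}.csv'
        df.to_csv(path, index=False)  # index=False: don't write the row index column
        written.append(path)
    return written


# Usage with made-up tables standing in for final_data:
demo = {'League A': pd.DataFrame({'EventId': [1], 'Spread': [-3.5]}),
        'League B/C': pd.DataFrame({'EventId': [2], 'Spread': [7.0]})}
with tempfile.TemporaryDirectory() as tmp:
    paths = save_tables_to_csv(demo, tmp)
    print([p.name for p in paths])
```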


8 Comments

Did you strip the whitespace somewhere? Mine doesn't look as clean as yours.
No, I didn't. Not sure why yours would look any different.
Hmm, very strange. Mine is all over the place; it looks like everything is separated by tabs. Regardless, thank you for the help. I'll mess around with the code you provided and try to pull only the CFB matchups.
Not always. If you use .iterrows(), then yes. If you use .items() on a dictionary, it iterates over its key/value pairs.
Also fixed the issue: the scores DataFrame was empty (since there are no scores currently), so they must have refreshed the data at some point between when I posted the code and when you ran it.
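To illustrate the `.items()` versus `.iterrows()` distinction from the comments, here is a small sketch on made-up data. It also shows a related pitfall: `DataFrame.items()` exists too, but it yields columns, not rows.

```python
import pandas as pd

# dict.items() yields (key, value) pairs; this is what the answer loops over for jsonData.
payload = {'Leagues': [{'Id': 2}], 'Events': [{'Id': 10}, {'Id': 11}]}
dict_pairs = [(key, len(value)) for key, value in payload.items()]

df = pd.DataFrame({'EventId': [10, 11], 'Spread': [-3.5, 7.0]})

# DataFrame.iterrows() yields (index label, row as a Series).
row_pairs = [(idx, row['EventId']) for idx, row in df.iterrows()]

# DataFrame.items() yields (column name, column Series), not rows.
col_names = [name for name, col in df.items()]

print(dict_pairs)
print(row_pairs)
print(col_names)
```

Note that each row from `iterrows()` is a single Series, so mixed int/float columns are upcast to float within the row.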

The problem you're running into is that the data is loaded dynamically via JavaScript, so it isn't in the HTML that requests receives.

You'll want to check out something like Selenium to work around this. Here's a good overview: "How to Scrape Data From JavaScript-Based Website Using Python, Selenium, and Headless Web Driver".
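One quick way to test whether the rows are added by JavaScript is to count the target elements in the raw HTML that `requests` returns. This sketch uses only the standard library's `html.parser`; `ClassCounter` and `count_matches` are hypothetical helpers, the class names come from the question, and the two HTML strings are invented stand-ins, not pregame.com's real markup.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags named `tag` whose class attribute contains all `classes`."""

    def __init__(self, tag, classes):
        super().__init__()
        self.tag = tag
        self.classes = set(classes)
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        for name, value in attrs:
            # class order doesn't matter; extra classes are allowed
            if name == 'class' and value and self.classes <= set(value.split()):
                self.count += 1
                break


def count_matches(html, tag='p', classes=('pggc-col-data', 'pggc-away')):
    parser = ClassCounter(tag, classes)
    parser.feed(html)
    parser.close()
    return parser.count


# Made-up HTML: an empty page shell versus the same page after scripts have run.
shell = '<html><body><div id="app"></div><script src="app.js"></script></body></html>'
rendered = '<div id="app"><p class="pggc-col-data pggc-away">Team X</p></div>'
print(count_matches(shell), count_matches(rendered))
```

In practice you would pass the `.text` of the `requests.get(...)` response from the question. A count of 0 while the browser clearly shows rows suggests the table is filled in client-side, so you need either a browser driver such as Selenium or the underlying JSON endpoint used in the other answer.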
