I'm using Selenium with Python to scrape this page: https://www.vexforum.com/u?period=all. I want to get the data for all 40,000 or so users on this forum, but the page only loads 50 initially; more members load as you keep scrolling. Is there any way to request the entire page up front, with all 40k members? Thanks for any help you can provide!
1 Answer
You should use requests instead of Selenium (if the site's robots.txt allows it) and call the endpoint the page itself uses to load more members:
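    import requests

    # Endpoint the page calls while you scroll; the query string is built from `params` below.
    URL = 'https://www.vexforum.com/directory_items'

    headers = {
        'Accept': 'application/json, text/javascript, */*; q=0.01',
        'Cookie': '_ga=GA1.2.439277064.1611329580; _gat=1; _gid=GA1.2.1557861689.1611329580',
        'Referer': 'https://www.vexforum.com/u?period=all',
        'Host': 'www.vexforum.com',
        'Accept-Language': 'it-it',
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0.1 Safari/605.1.15',
        # 'br' omitted: requests only decodes Brotli when the brotli package is installed.
        'Accept-Encoding': 'gzip, deflate',
        'Connection': 'keep-alive',
        'X-CSRF-Token': 'undefined',
        'Discourse-Present': 'true',
        'X-Requested-With': 'XMLHttpRequest',
    }

    count = 2  # page number to start from; lower it if earlier results are missing
    while True:
        params = {
            'order': 'likes_received',
            'page': str(count),
            'period': 'all',
        }
        try:
            r = requests.get(URL, headers=headers, params=params)
            r.raise_for_status()
            data = r.json()
        except (requests.RequestException, ValueError):
            # Stop on a network error, an HTTP error, or a non-JSON body
            # instead of retrying the same page forever.
            break
        # Assumption: the payload lists users under 'directory_items';
        # an empty or missing list is treated as the last page.
        if not data.get('directory_items'):
            break
        print(data)
        print('\n\n\n')
        print('___________________________________________________')
        print('\n\n\n')
        count += 1

Now you only have to parse the JSON response and grab the information you want.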
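For the parsing step, here is a minimal sketch. It assumes each response is a dict with a `directory_items` list whose entries carry a nested `user` object and a `likes_received` count. Those key names, and the helper name `extract_users`, are my assumptions about the payload, not something confirmed above, so print one real response and adjust the keys to match.

```python
def extract_users(payload):
    """Flatten one JSON page into a list of small dicts.

    Assumed (unverified) shape:
    {"directory_items": [{"likes_received": 3, "user": {"username": "..."}}]}
    """
    rows = []
    for item in payload.get("directory_items", []):
        user = item.get("user", {})  # assumed nested user object
        rows.append({
            "username": user.get("username"),
            "likes_received": item.get("likes_received"),
        })
    return rows


# Hypothetical sample payload, made up for illustration only.
sample = {
    "directory_items": [
        {"likes_received": 12, "user": {"username": "alice"}},
        {"likes_received": 7, "user": {"username": "bob"}},
    ]
}

for row in extract_users(sample):
    print(row["username"], row["likes_received"])
```

Inside the loop you would call `extract_users(r.json())` on each page and extend one big list with the result, rather than printing the raw response.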