I'm using Selenium with Python to scrape this page: https://www.vexforum.com/u?period=all. I want to get the data for all 40,000 or so users on this forum, but the page only loads 50 initially; more members load as you keep scrolling. Is there any way to request the entire page up front, with all 40k members? Thanks for any help you can provide!

  • Please show your effort so far. Commented Jan 22, 2021 at 4:10
  • @TMayer Your code trials please. Commented Jan 22, 2021 at 6:53

1 Answer

You should use requests (if the site's robots.txt allows it):

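import requests

# Headers copied from the browser's XHR request for the member directory.
headers = {
    'Accept': 'application/json, text/javascript, */*; q=0.01',
    'Cookie': '_ga=GA1.2.439277064.1611329580; _gat=1; _gid=GA1.2.1557861689.1611329580',
    'Referer': 'https://www.vexforum.com/u?period=all',
    'Host': 'www.vexforum.com',
    'Accept-Language': 'it-it',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0.1 Safari/605.1.15',
    'Accept-Encoding': 'gzip, deflate, br',
    'Connection': 'keep-alive',
    'X-CSRF-Token': 'undefined',
    'Discourse-Present': 'true',
    'X-Requested-With': 'XMLHttpRequest',
}

# Starting page as captured from the browser; lower it if earlier pages are needed too.
count = 2

while True:
    # The page number goes in params only, so it is not duplicated in the URL.
    params = {
        'order': 'likes_received',
        'page': str(count),
        'period': 'all',
    }
    try:
        r = requests.get('https://www.vexforum.com/directory_items',
                         headers=headers, params=params)
        r.raise_for_status()
        data = r.json()
    except (requests.RequestException, ValueError):
        # Stop on a failed request or a non-JSON body instead of retrying forever.
        break

    # Assumption: the response lists users under 'directory_items' and
    # an empty list means there are no more pages.
    if not data.get('directory_items'):
        break

    print(data)
    print('\n\n\n')
    print('___________________________________________________')
    print('\n\n\n')
    count += 1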

Now you only have to parse the JSON response and grab the information you want.
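As a rough illustration of that parsing step, here is a minimal sketch. It assumes each page of the response is a dict with a `directory_items` list whose entries carry a nested `user` object; those key names, the `extract_users` helper, and the sample values are assumptions for illustration, so check them against what `r.json()` actually returns.

```python
from typing import Any, Dict, List


def extract_users(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten one page of the JSON response into simple records.

    The key names ('directory_items', 'user', 'username', 'likes_received')
    are assumptions about the response layout, not confirmed by the answer.
    """
    records = []
    for item in payload.get("directory_items", []):
        user = item.get("user", {})
        records.append({
            "id": user.get("id"),
            "username": user.get("username"),
            "likes_received": item.get("likes_received"),
        })
    return records


# Hand-written sample payload with made-up values, for illustration only.
sample = {
    "directory_items": [
        {"likes_received": 12, "user": {"id": 1, "username": "alice"}},
        {"likes_received": 7, "user": {"id": 2, "username": "bob"}},
    ]
}

for record in extract_users(sample):
    print(record["username"], record["likes_received"])
```

Inside the request loop, calling `extract_users(r.json())` on each page and extending one list with the result would collect all users in one place.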
