I'm trying to scrape questions from Quora, starting from https://www.quora.com/search?q=microwave&type=question. Since the questions are loaded dynamically, I first used Selenium to simulate scrolling down, but that is really slow, so I'm trying a different approach. When you scroll down, Quora sends a POST request to another URL with a JSON payload. I looked in the Network tab of Dev tools to see what payload it uses.
It looks like this:

    {
      "queryName": "SearchResultsListQuery",
      "variables": {
        "query": "microwave",
        "disableSpellCheck": null,
        "resultType": "question",
        "author": null,
        "time": "all_times",
        "first": 10,
        "after": "19",
        "tribeId": null
      },
      "extensions": {
        "hash": "f88cad2308823dc82766c0025ca34be70ea3e60d850a756187645d7483ba2c3b"
      }
    }

I ran this:

    import requests

    url = 'https://www.quora.com/graphql/gql_para_POST?q=SearchResultsListQuery'
    data = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36 Edg/108.0.1462.76",
        "queryName": "SearchResultsListQuery",
        "variables": {
            "query": "microwave",
            "disableSpellCheck": 'null',
            "resultType": "question",
            "author": 'null',
            "time": "all_times",
            "first": 10,
            "after": "19",
            "tribeId": 'null',
        },
        "extensions": {
            "hash": "f88cad2308823dc82766c0025ca34be70ea3e60d850a756187645d7483ba2c3b"
        },
    }
    r = requests.post(url, data=data)
    print(r)

And got:

    <Response [400]>

I plugged in my user agent and replaced each null with 'null'. I also tried None, '', and even deleting those keys from the dict, but nothing gets it to work. So maybe I have the wrong hash. I looked through the page HTML and the other requests the site sends and receives to find the hash, but didn't succeed.
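For reference, here is a minimal sketch of how I understand the payload would have to be built so that the body matches what Dev tools shows. It only builds the body and prints it; nothing is sent. The assumptions are mine: that the endpoint expects a JSON body (so the dict is serialized with `json.dumps`, where Python's None becomes JSON null), and that the User-Agent belongs in the HTTP headers rather than in the payload. The `Content-Type` header is also an assumption, not something I copied from the real request.

```python
import json

url = "https://www.quora.com/graphql/gql_para_POST?q=SearchResultsListQuery"

# Same payload as in Dev tools, written as a Python dict.
# None is serialized as JSON null; the string 'null' would become "null".
payload = {
    "queryName": "SearchResultsListQuery",
    "variables": {
        "query": "microwave",
        "disableSpellCheck": None,
        "resultType": "question",
        "author": None,
        "time": "all_times",
        "first": 10,
        "after": "19",
        "tribeId": None,
    },
    "extensions": {
        "hash": "f88cad2308823dc82766c0025ca34be70ea3e60d850a756187645d7483ba2c3b"
    },
}

# Assumption: the User-Agent is sent as a header, not as part of the body.
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36 Edg/108.0.1462.76",
    "Content-Type": "application/json",  # assumed, not copied from Dev tools
}

# Compact separators reproduce the single-line form shown in Dev tools.
body = json.dumps(payload, separators=(",", ":"))
print(body)

# Sending it would look like this (needs network access, not run here):
# import requests
# r = requests.post(url, data=body, headers=headers)
# print(r.status_code)
```

Whether this alone avoids the 400 I can't say. Quora may also check cookies or other headers copied from the browser session, which I haven't verified.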
- Is the 400 error caused by the 'null' items?
- Is a hash like this common in POST requests, and how could I obtain it?

Thanks