I'm trying to scrape questions from Quora, starting from https://www.quora.com/search?q=microwave&type=question. Since the questions are loaded dynamically, I first used Selenium to simulate scrolling down, but that is really slow, so I'm trying a different approach. When I scroll down, Quora sends a POST request to another URL with a payload, so I looked in the Network tab of the dev tools to see what payload it uses.

It looks like this:

{"queryName":"SearchResultsListQuery","variables":{"query":"microwave","disableSpellCheck":null,"resultType":"question","author":null,"time":"all_times","first":10,"after":"19","tribeId":null},"extensions":{"hash":"f88cad2308823dc82766c0025ca34be70ea3e60d850a756187645d7483ba2c3b"}} 

I ran this:

import requests
url = 'https://www.quora.com/graphql/gql_para_POST?q=SearchResultsListQuery'
data = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36 Edg/108.0.1462.76",
    "queryName": "SearchResultsListQuery",
    "variables": {
        "query": "microwave",
        "disableSpellCheck": 'null',
        "resultType": "question",
        "author": 'null',
        "time": "all_times",
        "first": 10,
        "after": "19",
        "tribeId": 'null',
    },
    "extensions": {
        "hash": "f88cad2308823dc82766c0025ca34be70ea3e60d850a756187645d7483ba2c3b"
    },
}
r = requests.post(url, data=data)
print(r)

And got <Response [400]>. I put in my user agent and replaced null with 'null'. I also tried None, '', and even deleting those keys from the dict, but nothing makes it work. So maybe I have the wrong hash. I looked through the page's HTML and the other requests it sends and receives to find the hash, but didn't succeed.

  1. Is the 400 error coming from the 'null' items?
  2. Is the hash a common thing in POST requests, and how could I get it? Thanks

1 Answer

First of all, ensure that your payload is properly formatted as JSON, like this:

import json
data = json.dumps({
    "queryName": "SearchResultsListQuery",
    "variables": {
        "query": "microwave",
        "disableSpellCheck": None,
        "resultType": "question",
        "author": None,
        "time": "all_times",
        "first": 10,
        "after": "19",
        "tribeId": None,
    },
    "extensions": {
        "hash": "f88cad2308823dc82766c0025ca34be70ea3e60d850a756187645d7483ba2c3b"
    },
})
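As a small illustration of why None matters here, the sketch below uses only the standard library to compare how json.dumps serializes None and the string 'null'. The one-key dicts are made up for the example and are not the full Quora payload. Note also that passing a plain dict as data= makes requests form-encode it instead of sending JSON, which is why the payload is serialized first.

```python
import json

# Python None becomes the JSON literal null.
with_none = json.dumps({"author": None})

# The Python string 'null' stays a JSON string, which is a different value.
with_string = json.dumps({"author": "null"})

print(with_none)    # {"author": null}
print(with_string)  # {"author": "null"}
```

Only the first form matches the payload shown in the browser's Network tab.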

Also, to get a successful response from the Quora GraphQL API, you must include a cookie in your request headers:

headers = {
    'cookie': '...',
    ...
}
r = requests.post(url, headers=headers, data=data)

You can find the cookie in your browser's dev tools.
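Putting the two points together, here is a minimal sketch, not a verified recipe. The helper names build_post_kwargs and fetch_results are hypothetical, the cookie must be copied from your own browser session, and the Content-Type header is an addition of mine on the assumption that the endpoint expects JSON. Quora may require further headers that the browser sends, and the hash may change over time.

```python
import json

URL = "https://www.quora.com/graphql/gql_para_POST?q=SearchResultsListQuery"
# Hash copied from the payload in the question; it may not stay valid.
QUERY_HASH = "f88cad2308823dc82766c0025ca34be70ea3e60d850a756187645d7483ba2c3b"


def build_post_kwargs(cookie, query="microwave", after="19", first=10):
    """Build keyword arguments for requests.post (hypothetical helper)."""
    payload = {
        "queryName": "SearchResultsListQuery",
        "variables": {
            "query": query,
            "disableSpellCheck": None,  # serialized as JSON null
            "resultType": "question",
            "author": None,
            "time": "all_times",
            "first": first,
            "after": after,
            "tribeId": None,
        },
        "extensions": {"hash": QUERY_HASH},
    }
    headers = {
        "cookie": cookie,  # copy the value from the browser's dev tools
        "Content-Type": "application/json",  # assumption: endpoint expects JSON
    }
    return {"url": URL, "headers": headers, "data": json.dumps(payload)}


def fetch_results(cookie, **options):
    """Send the request and return the response (hypothetical helper)."""
    import requests  # third-party; imported here so the builder works without it

    response = requests.post(timeout=30, **build_post_kwargs(cookie, **options))
    response.raise_for_status()  # raises on a 400 like the one in the question
    return response
```

Calling fetch_results with your own cookie string sends the same body as the snippet above. If it still returns 400, comparing the headers against the request shown in the Network tab is the next thing to check.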

2 Comments

Hello, thanks for your answer. I ran:
question_page = requests.get("quora.com/search?q=microwave&type=question")
data = ...
r = requests.post(url, data=data, cookies=question_page.cookies)
print(r)
I still got the 400 error. I will try to figure out what is wrong.
I copied the request from Chrome dev tools and used curlconverter.com to convert it into Python code. The status code is now 200, thank you.
