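I'm trying to scrape questions from Quora, starting from the link https://www.quora.com/search?q=microwave&type=question. Since the questions are loaded dynamically, I first used Selenium to simulate scrolling down, but that is really slow, so I'm trying a different approach. When you scroll, Quora sends a POST request with a payload to another URL, so I opened the Network tab in the developer tools to see what payload it uses.
It looks like this: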
{"queryName":"SearchResultsListQuery","variables":{"query":"microwave","disableSpellCheck":null,"resultType":"question","author":null,"time":"all_times","first":10,"after":"19","tribeId":null},"extensions":{"hash":"f88cad2308823dc82766c0025ca34be70ea3e60d850a756187645d7483ba2c3b"}}
I ran this:
import requests
url = 'https://www.quora.com/graphql/gql_para_POST?q=SearchResultsListQuery'
data = {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36 Edg/108.0.1462.76", "queryName":"SearchResultsListQuery","variables":{"query":"microwave","disableSpellCheck":'null',"resultType":"question","author":'null',"time":"all_times","first":10,"after":"19","tribeId":'null'},"extensions":{"hash":"f88cad2308823dc82766c0025ca34be70ea3e60d850a756187645d7483ba2c3b"}}
r = requests.post(url, data = data)
print(r)
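and got <Response [400]>.
I inserted my user agent and replaced null with 'null'. I also tried None, '', and even deleting those keys from the dictionary, but nothing made it work. So maybe I have the wrong hash; I looked through the site's HTML and the other requests it sends and receives to find the hash, but without success.
- Does the 400 error come from the 'null' items?
- Is the hash a common thing in POST requests, and how can I obtain it? Thanks.

First, make sure your payload is correctly formatted as JSON, like this: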
import json

# Serialize the payload to a JSON string; Python's None becomes JSON null.
data = json.dumps({
    "queryName": "SearchResultsListQuery",
    "variables": {
        "query": "microwave",
        "disableSpellCheck": None,
        "resultType": "question",
        "author": None,
        "time": "all_times",
        "first": 10,
        "after": "19",
        "tribeId": None
    },
    "extensions": {
        "hash": "f88cad2308823dc82766c0025ca34be70ea3e60d850a756187645d7483ba2c3b"
    }
})
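To illustrate why the original call is likely rejected, here is a small sketch using only the standard library. It shows that None serializes to JSON null while 'null' becomes a string, and that form-encoding a nested dict (which is what passing a dict to data= amounts to) loses the nested structure. urllib's encoder is used here only as a stand-in for illustration; the exact bytes requests sends may differ.

```python
import json
from urllib.parse import urlencode

# Python's None maps to JSON null; the string 'null' stays a quoted string.
as_none = json.dumps({"author": None})
as_string = json.dumps({"author": "null"})
print(as_none)    # {"author": null}
print(as_string)  # {"author": "null"}

# Form-encoding a nested dict keeps only the inner keys, not the structure,
# so the server does not receive the JSON object it expects.
form_body = urlencode({"variables": {"query": "microwave"}}, doseq=True)
print(form_body)  # variables=query
```

So the 'null' strings are one problem, but the larger one is that the body was never sent as JSON in the first place.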
Also, to get a successful response from the Quora GraphQL API, you must include a cookie in the request headers:
headers = {
'cookie': '...',
...
}
r = requests.post(url, headers=headers, data=data)
You can find the cookie in your browser's developer tools.
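Putting the two pieces together, a minimal sketch might look like the following. The helper names build_search_request and fetch_page are hypothetical, the Content-Type header is an assumption (the answer only mentions the cookie), and the cookie value must be copied from your own browser session. Note that the User-Agent, if you send one, belongs in the headers rather than in the payload.

```python
import json

URL = "https://www.quora.com/graphql/gql_para_POST?q=SearchResultsListQuery"
# Hash copied from the payload seen in the browser's Network tab.
QUERY_HASH = "f88cad2308823dc82766c0025ca34be70ea3e60d850a756187645d7483ba2c3b"


def build_search_request(query, after, cookie, first=10):
    """Return (headers, body) for the search query; hypothetical helper."""
    payload = {
        "queryName": "SearchResultsListQuery",
        "variables": {
            "query": query,
            "disableSpellCheck": None,
            "resultType": "question",
            "author": None,
            "time": "all_times",
            "first": first,
            "after": after,
            "tribeId": None,
        },
        "extensions": {"hash": QUERY_HASH},
    }
    headers = {
        "Content-Type": "application/json",  # assumption, not from the answer
        "Cookie": cookie,  # value taken from the browser's developer tools
    }
    return headers, json.dumps(payload)


def fetch_page(query, after, cookie):
    """Send the request; needs the third-party requests package."""
    import requests  # imported lazily so the builder works without it

    headers, body = build_search_request(query, after, cookie)
    return requests.post(URL, headers=headers, data=body)
```

Calling fetch_page("microwave", "19", cookie) with a valid cookie string should then mirror the request the browser makes, assuming the hash and endpoint have not changed since it was captured.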