Selenium Python: getting the network response body



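I am using Selenium to react to the data a website receives after a GET request. The API the website calls is not public, so if I request the URL directly to retrieve the data, I get {"message":"Unauthenticated."}

So far, all I have managed to do is retrieve the headers of the response.

I found a suggestion that driver.execute_cdp_cmd('Network.getResponseBody', {...}) might be a solution to my problem.

Here is a sample of my code: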

import json
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

capabilities = DesiredCapabilities.CHROME
capabilities["goog:loggingPrefs"] = {"performance": "ALL"}
driver = webdriver.Chrome(
    r"./chromedriver",
    desired_capabilities=capabilities,
)

def processLog(log):
    log = json.loads(log["message"])["message"]
    if ("Network.response" in log["method"] and "params" in log.keys()):
        headers = log["params"]["response"]
        body = driver.execute_cdp_cmd('Network.getResponseBody', {'requestId': log["params"]["requestId"]})
        print(json.dumps(body, indent=4, sort_keys=True))
        return log["params"]

logs = driver.get_log('performance')
responses = [processLog(log) for log in logs]

Unfortunately, driver.execute_cdp_cmd('Network.getResponseBody', {...}) returns:

unknown error: unhandled inspector error: {"code":-32000,"message":"No resource with given identifier found"}

Do you know what I am missing?

Do you have any idea how to retrieve the response body?

Thanks for your help!

To retrieve the response body, you have to listen specifically for Network.responseReceived:

def processLog(log):
    log = json.loads(log["message"])["message"]
    if ("Network.responseReceived" in log["method"] and "params" in log.keys()):
        body = driver.execute_cdp_cmd('Network.getResponseBody', {'requestId': log["params"]["requestId"]})
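
As a rough sketch of that filtering step, the helper below picks out only Network.responseReceived events from performance log entries. It compares the method name exactly, because a substring test also matches Network.responseReceivedExtraInfo, whose params carry no "response" field. The function name response_request_ids and the sample entries are hypothetical, made up to mimic the shape of driver.get_log('performance') output; no browser is needed to run it.

```python
import json


def response_request_ids(entries):
    """Return (requestId, url) pairs for Network.responseReceived events."""
    found = []
    for entry in entries:
        # Each performance log entry wraps the DevTools event as a JSON string.
        event = json.loads(entry["message"])["message"]
        # Exact match: a substring test would also accept
        # Network.responseReceivedExtraInfo, which has no "response" field.
        if event.get("method") != "Network.responseReceived":
            continue
        params = event.get("params", {})
        found.append((params["requestId"], params["response"]["url"]))
    return found


# Hypothetical entries shaped like the output of driver.get_log("performance").
sample_entries = [
    {"message": json.dumps({"message": {
        "method": "Network.responseReceived",
        "params": {"requestId": "1000.1",
                   "response": {"url": "https://example.com/api"}},
    }})},
    {"message": json.dumps({"message": {
        "method": "Network.responseReceivedExtraInfo",
        "params": {"requestId": "1000.1"},
    }})},
]

print(response_request_ids(sample_entries))
```

Each returned requestId can then be passed to driver.execute_cdp_cmd('Network.getResponseBody', {'requestId': ...}) as in the snippet above.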

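However, I ended up using a different approach that relies on requests. I simply retrieved the authorization token from the browser console (Network > Headers > Request Headers > Authorization) and used it to get the data I wanted: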

import requests

def get_data():
    url = "<your_url>"
    headers = {
        "Authorization": "Bearer <your_access_token>",
        "Content-type": "application/json"
    }
    # Query-string parameters expected by the API (placeholders)
    params = {
        "<key>": "<value>",
        # ...
    }
    r = requests.get(url, headers=headers, params=params)
    if r.status_code == 200:
        return r.json()

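Some responses may not have a body, in which case Selenium throws the error "No resource with given identifier found". The error message is somewhat ambiguous here.

Try doing it like this: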

from selenium.common import exceptions

# chromedriver is the WebDriver instance; log is the parsed log message
try:
    body = chromedriver.execute_cdp_cmd('Network.getResponseBody', {'requestId': log["params"]["requestId"]})
    log['body'] = body
except exceptions.WebDriverException:
    print('response.body is null')

This way, responses without a body will not crash the script.
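
When the call does succeed, the Chrome DevTools Protocol returns a dict with a "body" string and a "base64Encoded" flag, so the body may still need decoding before use. The following is a minimal sketch of that step; decode_body and the two sample results are hypothetical names and data used only for illustration.

```python
import base64


def decode_body(result):
    """Turn a Network.getResponseBody result into raw bytes."""
    body = result.get("body", "")
    if result.get("base64Encoded"):
        # Binary payloads (images, fonts, ...) arrive base64-encoded.
        return base64.b64decode(body)
    # Text payloads arrive as a plain string.
    return body.encode("utf-8")


# Hypothetical results shaped like the return value of
# driver.execute_cdp_cmd('Network.getResponseBody', {...}).
text_result = {"body": '{"ok": true}', "base64Encoded": False}
binary_result = {
    "body": base64.b64encode(b"\x89PNG").decode("ascii"),
    "base64Encoded": True,
}

print(decode_body(text_result))
print(decode_body(binary_result))
```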
