How to fix UnicodeDecodeError for bytes in requests



I have the following complete working example, which uses selenium wire to record all requests.

import os
import sys
import json
from seleniumwire import webdriver

driver = webdriver.Chrome()
driver.get("http://www.google.com")
list_requests = []
for request in driver.requests:
    req = {
        "method": request.method,
        "url": request.url,
        "body": request.body.decode(),  # to avoid json error
        "headers": {k: str(v) for k, v in request.headers.__dict__.items()}  # to avoid json error
    }

    if request.response:
        resp = {
            "status_code": request.response.status_code,
            "reason": request.response.reason,
            "body": request.response.body.decode(),  # ???
            "headers": {k: str(v) for k, v in request.response.headers.__dict__.items()}  # to avoid json error
        }
        req["response"] = resp
    list_requests.append(req)

with open("test.json", "w") as outfile:
    json.dump(list_requests, outfile)

However, decoding the response body raises an error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb5 in position 1: invalid start byte

If I do not try to decode the response body, I get a different error:

TypeError: Object of type bytes is not JSON serializable

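I do not care about the encoding; I just want to be able to write "body" to the JSON file somehow. If necessary, the offending bytes/characters can be dropped.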

Any ideas on how to solve this?
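For reference, here are two standard-library ways to turn arbitrary bytes into something `json.dump` accepts. This is only a sketch: the helper names `body_to_text` and `body_to_base64` and the sample bytes are made up for illustration and are not part of selenium wire. The first decodes as UTF-8 and replaces undecodable bytes (lossy); the second stores the bytes as base64 text (lossless).

```python
import base64
import json


def body_to_text(body: bytes) -> str:
    """Decode as UTF-8, substituting U+FFFD for any undecodable byte."""
    return body.decode("utf-8", errors="replace")


def body_to_base64(body: bytes) -> str:
    """Encode the bytes as ASCII base64 text, which can be reversed later."""
    return base64.b64encode(body).decode("ascii")


# Hypothetical sample: 0xb5 is not a valid UTF-8 start byte.
raw = b"\x1f\xb5 hello"
record = {
    "body_text": body_to_text(raw),
    "body_b64": body_to_base64(raw),
}
serialized = json.dumps(record)
```

In the loop above, either helper could be applied to `request.response.body` in place of the bare `.decode()` call. Base64 is probably the safer choice if the original bytes may be needed again.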

I used the following approach to extract a field (some_key) from a JSON response:

from gzip import decompress
import json

some_key = None
for request in driver.requests:
    if request.response:
        if request.method == 'POST':
            print(request.method + ' ' + request.url)
            try:
                # try to parse the json response to extract the data
                data = json.loads(request.response.body)
                print('parsed as json')
                if 'some_key' in data:
                    some_key = data['some_key']
            except UnicodeDecodeError:
                try:
                    # decompress on UnicodeDecodeError and parse the json response to extract the data
                    data = json.loads(decompress(request.response.body))
                    print('decompressed and parsed as json')
                    if 'some_key' in data:
                        some_key = data['some_key']
                except json.decoder.JSONDecodeError:
                    data = request.response.body
                    print('decompressed and not parsed')
            print(data)
print(some_key)

gzip.decompress is what got me past the UnicodeDecodeError.
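The same fallback can be wrapped in a small helper. This is a minimal sketch, assuming the body is either plain UTF-8 JSON or gzip-compressed JSON; the function name `parse_json_body` is hypothetical. It checks for the gzip magic bytes up front instead of waiting for a UnicodeDecodeError, and returns None when the body cannot be parsed.

```python
import gzip
import json
import zlib
from typing import Any, Optional


def parse_json_body(body: bytes) -> Optional[Any]:
    """Return the JSON value in body (plain or gzip-compressed), else None."""
    # gzip streams start with the two magic bytes 0x1f 0x8b.
    if body[:2] == b"\x1f\x8b":
        try:
            body = gzip.decompress(body)
        except (OSError, EOFError, zlib.error):
            # Looked like gzip but could not be decompressed.
            return None
    try:
        return json.loads(body)
    except (UnicodeDecodeError, json.JSONDecodeError):
        # Not UTF-8 text, or text that is not valid JSON.
        return None


# Example with a made-up payload:
payload = gzip.compress(json.dumps({"some_key": 1}).encode("utf-8"))
data = parse_json_body(payload)
```

Bodies compressed with another scheme (brotli, for example) would still come back as None here and would need their own branch.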

Hope this helps.

Chrome may send requests you do not expect. If you do not handle them, they can break your response parsing. These are GET requests such as:

GET https://r3---sn-n4g-gon6.gvt1.com/edgedl/chrome/dict/en-us-10-1.bdic?cms_redirect=yes&mh=7g&mip=XXX.XXX.XXX.XXX&mm=28&mn=sn-n4g-gon6&ms=nvh&mt=1683454352068&mv=u&mvi=3&pl=24&rmhost=r4---sn-n4g-gon6.gvt1.com&shardbypass=sd
GET https://content-autofill.googleapis.com/v1/pages/ChVDaHJvbWUvMTExLjAuNTU2My4xMTASnwMJD36cJ79vbM...

So filtering on the POST method (and on your own domain) does the trick:

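for request in driver.requests:
    if request.response:
        # 'https://mydomain.com/...'.split('/') gives ['https:', '', 'mydomain.com', ...],
        # so the host is at index 2
        if request.method == 'POST' and request.url.split('/')[2] == 'mydomain.com':
            data = json.loads(request.response.body)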
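Indexing into `request.url.split('/')` depends on the exact shape of the URL. A slightly more robust variant, sketched here with the standard library's `urllib.parse.urlparse`, compares the parsed host name instead. The helper name `is_target_request` is hypothetical, and `mydomain.com` is the same placeholder as above.

```python
from urllib.parse import urlparse


def is_target_request(method: str, url: str, host: str) -> bool:
    """Return True for POST requests whose URL host name equals host."""
    # urlparse(...).hostname is lower-cased and excludes any port number.
    return method == "POST" and urlparse(url).hostname == host


# Example with placeholder URLs:
wanted = is_target_request("POST", "https://mydomain.com/api/items", "mydomain.com")
unwanted = is_target_request(
    "GET", "https://content-autofill.googleapis.com/v1/pages/x", "mydomain.com"
)
```

Inside the loop, the condition would then read `if is_target_request(request.method, request.url, 'mydomain.com'):`.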
