Decompressing (gzip) chunks of a response from an http.client call

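I have the following code, which I am using to try to read a chunked response from an http.client.HTTPSConnection GET request to an API (please note that the response is gzip-encoded):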

connection = http.client.HTTPSConnection(api, context = ssl._create_unverified_context())
connection.request('GET', api_url, headers = auth)
response = connection.getresponse()
while chunk := response.read(20):
    data = gzip.decompress(chunk)
    data = json.loads(chunk)
    print(data)

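This always fails with the error Not a gzipped file (b'\xe5\x9d'). I am not sure how I can chunk the data and still achieve what I am trying to do here. Basically, I am chunking so that I don't have to load the entire response into memory. Please note that I can't use any other libraries such as requests, urllib, etc.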

The most probable reason is that the response you received is indeed not a gzipped file.

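I notice that in your code you pass a variable called auth. Typically, a server won't send you a compressed response unless you state in the request headers that you can accept one. If your headers contain only auth-related keys, as the variable name suggests, you won't receive a compressed response. First, make sure that you have 'Accept-Encoding': 'gzip' in your headers.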
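
A minimal sketch of that check, under the assumption that you want to leave the original auth dict untouched. The helper names with_gzip_accepted and looks_like_gzip are made up for illustration. The second helper relies on the fact that every gzip stream starts with the two magic bytes 0x1f 0x8b:

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member (RFC 1952)


def with_gzip_accepted(headers):
    """Return a copy of the request headers that also asks for gzip."""
    merged = dict(headers)
    merged["Accept-Encoding"] = "gzip"
    return merged


def looks_like_gzip(first_bytes):
    """Return True if the body starts with the gzip magic number."""
    return first_bytes[:2] == GZIP_MAGIC
```

With a real response you can also inspect response.getheader('Content-Encoding'), which should be gzip when the server actually compressed the body.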

Going forward, you will face another problem:

"Basically, I am chunking so that I don't have to load the entire response into memory."

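gzip.decompress expects a complete file, so you would need to reassemble the response and load it fully into memory before calling it, which would defeat the whole point of chunking the response. Trying to decompress only part of a gzip stream with gzip.decompress will most likely give you an EOFError saying something like Compressed file ended before the end-of-stream marker was reached.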
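
To see both failure modes without a network call, here is a small sketch that compresses some made-up test data in memory and hands gzip.decompress only slices of it. The helper name try_decompress and the payload are illustrative, not part of the question:

```python
import gzip

# Build a gzip payload that is much larger than a 20-byte chunk.
payload = "".join(f"This is line {i}.\n" for i in range(1000)).encode()
compressed = gzip.compress(payload)


def try_decompress(chunk):
    """Return 'ok', or the name of the exception gzip.decompress raised."""
    try:
        gzip.decompress(chunk)
        return "ok"
    except (EOFError, gzip.BadGzipFile) as exc:
        return type(exc).__name__


print(try_decompress(compressed))        # the whole stream
print(try_decompress(compressed[:20]))   # valid header, truncated body
print(try_decompress(compressed[2:22]))  # slice that does not start with the gzip magic bytes
```

The truncated first slice produces the EOFError described above, while a slice that does not begin with a gzip header produces the Not a gzipped file error from the question (gzip.BadGzipFile, available since Python 3.8).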

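I don't know if you can manage this directly with the gzip library, but I do know how to do it with zlib. A zlib decompression object accepts each bytes chunk as it arrives, so there is no need to wrap the chunk in a file-like object such as io.BytesIO. I see you have very strong constraints on libraries, but zlib is part of the Python standard library, so hopefully you have it available. Here is a rework of your code that should help you move forward:

import http.client
import ssl
import zlib

# your variables here
api = 'your_api_host'
api_url = 'your_api_endpoint'
auth = {'AuthKeys': 'auth_values'}

# add the gzip header
auth['Accept-Encoding'] = 'gzip'

# prepare the decompressing object; 16 + MAX_WBITS makes zlib expect a gzip header
decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)

connection = http.client.HTTPSConnection(api, context = ssl._create_unverified_context())
connection.request('GET', api_url, headers = auth)
response = connection.getresponse()

while chunk := response.read(20):
    # each call returns whatever output is available so far (possibly b'')
    data = decompressor.decompress(chunk)
    print(data)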
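
The same idea can be tried offline. The sketch below wraps the decompression object in a generator (iter_decompressed is a hypothetical name) and feeds it an in-memory gzip payload cut into 20-byte pieces to stand in for the response. It also calls flush() at the end to collect any output the decompressor is still holding:

```python
import gzip
import zlib


def iter_decompressed(chunks):
    """Yield decompressed bytes for an iterable of gzip-compressed chunks."""
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    for chunk in chunks:
        data = decompressor.decompress(chunk)
        if data:  # a small input chunk may not produce any output yet
            yield data
    tail = decompressor.flush()  # anything still buffered at the end
    if tail:
        yield tail


# Simulate a response body arriving 20 bytes at a time.
original = b"".join(b"This is line %d.\n" % i for i in range(1000))
body = gzip.compress(original)
pieces = [body[i:i + 20] for i in range(0, len(body), 20)]
restored = b"".join(iter_decompressed(pieces))
```

With a real connection, iter(lambda: response.read(20), b"") would provide the chunks instead of the pieces list.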

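The problem is that gzip.decompress expects a complete file. You can't just hand it a chunk, because the deflate algorithm relies on previous data during decompression. The whole point of the algorithm is that it can repeat something it has already seen, so that earlier data is required.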

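However, deflate only cares about the last 32 KiB or so. It is therefore possible to stream-decompress such a file without needing much memory. This is not something you need to implement yourself: Python provides the gzip.GzipFile class, which can be used to wrap a file handle and behaves like a normal file: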

import io
import gzip

# Create a file for testing.
# In your case you can just use the response object you get.
file_uncompressed = ""
for line_index in range(10000):
    file_uncompressed += f"This is line {line_index}.\n"
file_compressed = gzip.compress(file_uncompressed.encode())
file_handle = io.BytesIO(file_compressed)

# This library does all the heavy lifting
gzip_file = gzip.GzipFile(fileobj=file_handle)
while chunk := gzip_file.read(1024):
    print(chunk)
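
Since the question ultimately wants JSON, here is a hedged sketch of how the GzipFile wrapper could feed a parser. It assumes the API returns newline-delimited JSON (one value per line), which the question does not state. If the body is a single large JSON document, the standard json module cannot parse it incrementally. The function name iter_json_lines and the in-memory stand-in for the response are illustrative:

```python
import gzip
import io
import json


def iter_json_lines(fileobj):
    """Yield one parsed JSON value per line of a gzip-compressed stream."""
    with gzip.GzipFile(fileobj=fileobj, mode="rb") as gz:
        for line in gz:  # lines are decompressed lazily, one at a time
            if line.strip():
                yield json.loads(line)


# Stand-in for the HTTP response: an in-memory gzip stream of three records.
records = [{"id": i} for i in range(3)]
raw = "\n".join(json.dumps(r) for r in records).encode()
fake_response = io.BytesIO(gzip.compress(raw))
parsed = list(iter_json_lines(fake_response))
```

In the real code, the response object returned by connection.getresponse() would take the place of fake_response.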
