Reading a large file in chunks, compressing and writing each chunk



I'm running into a problem processing large files: the file sizes are gradually increasing and will keep growing. Because of a limitation of the third-party application I upload the compressed files to, I can only use deflate as the compression option.

The server running the script has limited memory, so the usual memory problems come up. I'm therefore trying to read the file in chunks and write it out in chunks, with the output being the desired compressed file.

So far I've been using the snippet below to compress files to reduce their size, and it worked fine until the files became too large to process/compress this way.

with open(file_path_partial, 'rb') as file_upload, open(file_path, 'wb') as file_compressed:
    file_compressed.write(zlib.compress(file_upload.read()))

I've tried a few different options to work around it, but so far none of them works properly.

1)

with open(file_path_partial, 'rb') as file_upload:
    with open(file_path, 'wb') as file_compressed:
        with gzip.GzipFile(file_path_partial, 'wb', fileobj=file_compressed) as file_compressed:
            shutil.copyfileobj(file_upload, file_compressed)

2)

BLOCK_SIZE = 64
compressor = zlib.compressobj(1)
filename = file_path_partial
with open(filename, 'rb') as input:
    with open(file_path, 'wb') as file_compressed:
        while True:
            block = input.read(BLOCK_SIZE)
            if not block:
                break
            file_compressed.write(compressor.compress(block))
The example below reads the input in 64k blocks, modifies each block, and writes it out to a gzip file.

Is this what you want?

import gzip

with open("test.txt", "rb") as fin, gzip.GzipFile("modified.txt.gz", "w") as fout:
    while True:
        block = fin.read(65536)  # read in 64k blocks
        if not block:
            break
        # comment out the next line to just write the data through unchanged
        block = block.replace(b"a", b"A")
        fout.write(block)
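Since the question is restricted to raw deflate output via zlib (rather than the gzip container), the same chunked approach can also be done with zlib.compressobj. The detail your attempt 2) appears to be missing is the final compressor.flush() call; without it, buffered data and the stream trailer are never written, so the output can't be decompressed. A minimal sketch, assuming 64 KiB reads (the function name and paths here are placeholders):

```python
import zlib

BLOCK_SIZE = 65536  # 64 KiB per read keeps memory usage bounded

def compress_file(src_path, dst_path):
    # Stream src_path through a zlib (deflate) compressor into dst_path,
    # never holding more than one block of input in memory at a time.
    compressor = zlib.compressobj()
    with open(src_path, "rb") as fin, open(dst_path, "wb") as fout:
        while True:
            block = fin.read(BLOCK_SIZE)
            if not block:
                break
            fout.write(compressor.compress(block))
        # Essential: flush() emits any data still buffered in the
        # compressor plus the stream trailer.
        fout.write(compressor.flush())
```

The output should then round-trip through zlib.decompress (or a streaming zlib.decompressobj on the receiving side) exactly like the output of your original one-shot zlib.compress call.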
