Reading a large file in chunks, compressing and writing each chunk



I'm running into a problem processing large files: the file sizes are gradually increasing and will keep growing. Because of a limitation of the third-party application I upload the compressed files to, I can only use deflate as the compression option.

The server running the script has limited memory, so the usual memory problems come up. I'm therefore trying to read the file in chunks and write it out in chunks, with the output being the desired compressed file.

So far I've been using the snippet below to compress files to reduce their size, and it worked fine until the files became too large to process/compress this way.

with open(file_path_partial, 'rb') as file_upload, open(file_path, 'wb') as file_compressed:
    file_compressed.write(zlib.compress(file_upload.read()))

I've tried a few different options to work around it, but so far none of them works properly.

1)

with open(file_path_partial, 'rb') as file_upload:
    with open(file_path, 'wb') as file_compressed:
        with gzip.GzipFile(file_path_partial, 'wb', fileobj=file_compressed) as file_compressed:
            shutil.copyfileobj(file_upload, file_compressed)

2)

BLOCK_SIZE = 64
compressor = zlib.compressobj(1)
filename = file_path_partial
with open(filename, 'rb') as input:
    with open(file_path, 'wb') as file_compressed:
        while True:
            block = input.read(BLOCK_SIZE)
            if not block:
                break
            file_compressed.write(compressor.compress(block))
The example below reads the input in 64k blocks, modifies each block, and writes it out to a gzip file.

Is this what you want?

import gzip

with open("test.txt", "rb") as fin, gzip.GzipFile("modified.txt.gz", "w") as fout:
    while True:
        block = fin.read(65536)  # read in 64k blocks
        if not block:
            break
        # comment out the next line to just write the data through unchanged
        block = block.replace(b"a", b"A")
        fout.write(block)
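Since the question is restricted to raw deflate output via zlib (rather than the gzip container), the same chunked approach can also be done with zlib.compressobj. The detail your attempt 2) appears to be missing is the final compressor.flush() call; without it, buffered data and the stream trailer are never written, so the output can't be decompressed. A minimal sketch, assuming 64 KiB reads (the function name and paths here are placeholders):

```python
import zlib

BLOCK_SIZE = 65536  # 64 KiB per read keeps memory usage bounded

def compress_file(src_path, dst_path):
    # Stream src_path through a zlib (deflate) compressor into dst_path,
    # never holding more than one block of input in memory at a time.
    compressor = zlib.compressobj()
    with open(src_path, "rb") as fin, open(dst_path, "wb") as fout:
        while True:
            block = fin.read(BLOCK_SIZE)
            if not block:
                break
            fout.write(compressor.compress(block))
        # Essential: flush() emits any data still buffered in the
        # compressor plus the stream trailer.
        fout.write(compressor.flush())
```

The output should then round-trip through zlib.decompress (or a streaming zlib.decompressobj on the receiving side) exactly like the output of your original one-shot zlib.compress call.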
