Generating and writing files larger than system RAM in Python

I know this nice piece of code that generates a file of a given size and writes it to disk.

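import os
import uuid

def file_generator(location, size):
    filename = str(uuid.uuid4())
    # location is concatenated as-is, so it must end with a path separator
    with open('{0}{1}'.format(location, filename), 'wb') as target:
        # os.urandom(size) builds the whole payload in memory before writing
        target.write(os.urandom(size))
    return filename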

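However, there is one small problem: it cannot generate files larger than the system RAM, and fails with a MemoryError. Any idea how to write the file out as a stream, or otherwise work around this?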

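os.urandom returns a byte string of the specified size (bytes in Python 3, str in Python 2). That string first has to fit into memory. If it were a generator instead, things would work in a more memory-efficient way.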

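However, the failure is not tied to system memory as such. It does not depend on the amount of physical RAM installed in the computer. It is limited by virtual memory, which is about 8 TB for a 64-bit program on 64-bit Windows. Using that much may involve swapping to disk, though, which gets slow.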

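So, potential solutions are:

Switch from 32-bit Python to 64-bit Python. You don't need to change the program at all, but it will get noticeably slower once you run out of physical RAM.
Write the file in smaller parts, e.g. 10 MB at a time.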
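To tell which of the two situations applies, it helps to know whether the running interpreter is a 32-bit or a 64-bit build. The following is a minimal sketch using only the standard library; the helper name `interpreter_bits` is made up for illustration.

```python
import struct
import sys


def interpreter_bits():
    """Return the pointer width of the running interpreter, in bits."""
    # struct.calcsize("P") is the size in bytes of a C pointer (void *).
    return struct.calcsize("P") * 8


bits = interpreter_bits()
# sys.maxsize is 2**31 - 1 on 32-bit builds and 2**63 - 1 on 64-bit builds.
is_64bit = sys.maxsize > 2**32
print("Running {0}-bit Python (64-bit: {1})".format(bits, is_64bit))
```

A 32-bit build reports 32 here even on a 64-bit operating system, which is the case where switching interpreters raises the limit.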

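In contrast to @quamrana's answer, I would not change the method signature. With that signature the caller could still choose 1 chunk of 8 GB, which has the same effect as before.

The following takes that burden off the caller: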

def file_generator(location, size):
    filename = str(uuid.uuid4())
    chunksize = 10*1024*1024  # 10 MiB held in memory at a time
    with open('{0}{1}'.format(location, filename), 'wb') as target:
        # write full chunks while more than one chunk remains
        while size > chunksize:
            target.write(os.urandom(chunksize))
            size -= chunksize
        # write the remainder (at most one chunk)
        target.write(os.urandom(size))
    return filename
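The earlier remark that a generator would be more memory-efficient can be sketched as follows. This is only an illustration, not part of the answer above: the names `random_chunks` and `write_random_file` are hypothetical, and the 10 MiB default simply mirrors the chunk size used there.

```python
import os


def random_chunks(size, chunk_size=10 * 1024 * 1024):
    """Yield random byte strings totalling exactly `size` bytes."""
    remaining = size
    while remaining > 0:
        n = min(chunk_size, remaining)
        yield os.urandom(n)
        remaining -= n


def write_random_file(path, size, chunk_size=10 * 1024 * 1024):
    """Write `size` random bytes to `path`, one chunk at a time."""
    with open(path, "wb") as target:
        # Only one chunk exists in memory at any moment.
        for chunk in random_chunks(size, chunk_size):
            target.write(chunk)
```

Separating the generator from the writer keeps the chunking logic reusable, for example to feed the same data into a hash or a network socket.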

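When dealing with this kind of problem, the solution is to break the data into chunks, choosing a convenient chunk size that is:

smaller than some limit you cannot control (in this case, the RAM size)
not too small, so that the process doesn't take forever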

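In the example below, the desired file size is split into (32 MiB) chunks, resulting in a number (>= 0) of complete chunks, possibly followed by one incomplete chunk at the end.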
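To make the split concrete before the full listing, here is the arithmetic for a 100,000,000-byte file with 32 MiB chunks. This is a small standalone check, not part of code.py, and the variable names are chosen just for this illustration.

```python
CHUNK_SIZE = 32 * 1024 * 1024  # 32 MiB = 33554432 bytes
size = 100000000               # total number of bytes to write

# divmod gives the number of complete chunks and the size of the last one.
full_chunks, last_chunk_size = divmod(size, CHUNK_SIZE)
print(full_chunks, last_chunk_size)  # 2 32891136
```

So the file would be written as two full 32 MiB chunks followed by one shorter chunk of 32,891,136 bytes.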

code.py:

import sys
import os
import uuid

DEFAULT_CHUNK_SIZE = 33554432  # 32 MiB

def file_generator(location, size):
    # original version: the whole payload is held in memory at once
    filename = str(uuid.uuid4())
    with open('{0}{1}'.format(location, filename), 'wb') as target:
        target.write(os.urandom(size))
    return filename

def file_generator_chunked(location, size, chunk_size=DEFAULT_CHUNK_SIZE):
    file_name = str(uuid.uuid4())
    chunks = size // chunk_size          # number of complete chunks
    last_chunk_size = size % chunk_size  # size of the trailing partial chunk
    with open("{0}{1}".format(location, file_name), "wb") as target:
        for _ in range(chunks):
            target.write(os.urandom(chunk_size))
        if last_chunk_size:
            target.write(os.urandom(last_chunk_size))
    return file_name

def main():
    # location is concatenated as-is, so it needs the trailing separator
    file_name = file_generator_chunked("/tmp/", 100000000)

if __name__ == "__main__":
    print("Python {:s} on {:s}\n".format(sys.version, sys.platform))
    main()
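One way to gain confidence in the chunked version is to write a few small files and compare their size on disk with the requested size. The sketch below repeats the chunked logic from the listing above so that it runs on its own, with one deliberate change: it builds the path with `os.path.join` instead of string concatenation. The helper `check_size` is hypothetical, and the tiny 4096-byte chunk size is chosen only to keep the check fast.

```python
import os
import tempfile
import uuid


def file_generator_chunked(location, size, chunk_size=32 * 1024 * 1024):
    # Chunked writer as in the listing, except that the path is built with
    # os.path.join, so `location` needs no trailing separator.
    file_name = str(uuid.uuid4())
    chunks, last_chunk_size = divmod(size, chunk_size)
    with open(os.path.join(location, file_name), "wb") as target:
        for _ in range(chunks):
            target.write(os.urandom(chunk_size))
        if last_chunk_size:
            target.write(os.urandom(last_chunk_size))
    return file_name


def check_size(size, chunk_size):
    """Write a file into a throwaway directory and return its size on disk."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        name = file_generator_chunked(tmp_dir, size, chunk_size)
        return os.path.getsize(os.path.join(tmp_dir, name))


# Cover an empty file, a single partial chunk, an exact multiple of the
# chunk size, and full chunks followed by a remainder.
for size in (0, 1, 4096, 10000):
    assert check_size(size, chunk_size=4096) == size
```

The temporary directory is removed when the `with` block exits, so the check leaves nothing behind.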

Write the file in blocks:

import os
import uuid

def large_file_generator(location, block_size, number_of_blocks):
    filename = str(uuid.uuid4())
    with open('{0}{1}'.format(location, filename), 'wb') as target:
        # total file size is block_size * number_of_blocks bytes
        for _ in range(number_of_blocks):
            target.write(os.urandom(block_size))
    return filename
