How to read a huge integer array string in Python with limited heap memory



I have an input test file:

10000000
1 23 53 64 599 -645 746 84 944 10 ... (10000000 integers)

The Python 3 code I use to read the input looks like this:

t = int(input())
a = [int(i) for i in input().split()]

Because the second line is so large, my program consumes a large amount of RAM (< 1 GB for a short interval). The same implementation in C/C++ using cin/scanf:

int a;
cin >> a;
bitset<10000000> visited;
while (a--)
{
    int x;
    scanf("%d", &x);
    visited[x] = true;
}

uses only 7 MB. Is there any way to reduce this in Python? Some way to take the integer input without loading the whole string into memory at once (say, loading the string in parts)?

No; you'd have to write your own input-stream handler. You'll need to read into a buffer of a size you choose, split off as many whole numbers as you can, and save any partial integer at the end of the buffer for the next read.

For example:

import sys

leftover = ''
while True:
    chunk = sys.stdin.read(64)
    if not chunk:  # EOF reached
        break
    buffer = leftover + chunk
    str_num = buffer.split()
    # If the buffer ends mid-number, keep the partial token for the next read
    leftover = str_num.pop() if not buffer[-1].isspace() else ''
    new_values = [int(i) for i in str_num]
    # process new values
if leftover:
    pass  # process the final value: int(leftover)

Catch/detect the condition when you reach EOF; in the sketch above, an empty read() signals it, and any remaining leftover still holds the final number.
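A variant of the same idea wraps the buffering in a generator so the rest of the program can consume integers lazily. This is a sketch, not from the answer; the name read_ints and the buffer size are my own choices:

import sys

def read_ints(stream=sys.stdin, bufsize=65536):
    """Yield integers from a whitespace-separated stream, chunk by chunk."""
    leftover = ''
    while True:
        chunk = stream.read(bufsize)
        if not chunk:  # EOF
            break
        tokens = (leftover + chunk).split()
        # A chunk that ends mid-number leaves a partial token for the next round
        leftover = tokens.pop() if not chunk[-1].isspace() else ''
        yield from map(int, tokens)
    if leftover:
        yield int(leftover)

With the question's input format, the first value yielded is the count, e.g. it = read_ints(); t = next(it); then iterate over the remaining values.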

Expanding on my comment:

According to psutil, this implementation takes about 26 megabytes of RSS in total, roughly 10 megabytes of which are allocated during read_file().

(As a bonus, since we spend a full byte of memory per integer, we also get an exact count of how many times each integer occurs, unless a count overflows the 8 bits...; see the small demo after the listing.)

import random  # presumably used to generate output.txt (not shown here)
import array
import psutil

def read_file(inf):
    n = int(inf.readline())
    # Preallocate an array.
    # TODO: this uses one byte per `n`, not one bit.
    # The allocation also takes some additional temporary memory
    # due to the list initializer.
    arr = array.array("b", [0] * n)
    input_buffer = ""
    nr = 0
    while True:
        # Read a chunk of data,
        read_buffer = inf.read(131072)
        # and add it to the input-accumulating buffer.
        input_buffer += read_buffer
        if read_buffer:
            # Partition the accumulating buffer from the right,
            # and keep the "rest" (after the last space) as the
            # new accumulating buffer.
            process_chunk, sep, input_buffer = input_buffer.rpartition(" ")
        else:
            # EOF: process whatever is still left in the buffer.
            process_chunk, input_buffer = input_buffer, ""
        if not process_chunk:  # Nothing to process anymore
            break
        for value in process_chunk.split():
            arr[int(value)] += 1
            nr += 1
    assert n == nr
    return arr

mi0 = psutil.Process().memory_info()
with open("output.txt", "r") as inf:
    arr = read_file(inf)
mi1 = psutil.Process().memory_info()
print("initial memory usage", mi0.rss)
print("final memory usage..", mi1.rss)
print("delta from initial..", mi1.rss - mi0.rss)
