I have an input test file:
10000000
1 23 53 64 599 -645 746 84 944 10 ... (10000000 integers)
The Python 3 code I use to read the input looks like this:
t = int(input())
a = [int(i) for i in input().split()]
Because the second line is so large, my program uses a lot of RAM (<1 GB for a short interval). The same implementation in C/C++ using cin/scanf:
int a;
cin >> a;
bitset<10000000> visited;
while (a--)
{
    int x;
    scanf("%d", &x);
    visited[x] = true;
}
uses only 7 MB. Is there any way to reduce this? Can I read the integer input without loading the whole string into memory at once (say, by loading the string in parts)?
No, you would have to write your own input-stream handler. You will need to read a buffer (of whatever size you choose), split off as much as you can, and save any partial integer at the end of the buffer for the next read.
For example:
import sys

leftover = ''
while True:
    chunk = sys.stdin.read(64)
    if not chunk:  # EOF; `leftover` may still hold the last value
        break
    buffer = leftover + chunk
    str_num = buffer.split()
    # If the buffer ends mid-number, save the partial token for next time
    if not buffer[-1].isspace():
        leftover = str_num.pop(-1)
    else:
        leftover = ''
    new_values = [int(i) for i in str_num]
    # process new_values
Catch/detect the condition when you reach EOF.
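One way to package the buffer-and-leftover idea above is as a generator. This is a sketch of mine, not code from the answer: the name `iter_ints` and the default chunk size are my own choices, and I use `io.StringIO` to stand in for `sys.stdin` so the example is self-contained.

```python
import io

def iter_ints(stream, bufsize=65536):
    """Yield integers from a whitespace-separated stream,
    reading at most `bufsize` characters at a time."""
    leftover = ""
    while True:
        chunk = stream.read(bufsize)
        if not chunk:
            # EOF: whatever is left is the final complete token.
            if leftover:
                yield int(leftover)
            return
        buffer = leftover + chunk
        tokens = buffer.split()
        # If the buffer does not end in whitespace, the last token
        # may be cut off; keep it for the next round.
        if tokens and not buffer[-1].isspace():
            leftover = tokens.pop()
        else:
            leftover = ""
        for tok in tokens:
            yield int(tok)

# Usage, with an in-memory stream standing in for sys.stdin:
data = io.StringIO("1 23 53 64 599 -645 746 84 944 10")
print(list(iter_ints(data, bufsize=8)))
```

Because it yields values one at a time, peak memory stays proportional to `bufsize` rather than to the size of the input line.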
Expanding on my comment:
According to psutil, this implementation takes about 26 megabytes of RSS memory in total, of which about 10 megabytes are allocated during read_file().
(As a bonus, since we spend a full byte of memory per integer, we can count exactly how many of each integer there are (unless we overflow 8 bits...).)
import array
import psutil

def read_file(inf):
    n = int(inf.readline())
    # Preallocate an array.
    # TODO: this uses one byte per `n`, not one bit.
    # The allocation also takes some additional temporary memory
    # due to the list initializer.
    # (Assumes all values are in range(n).)
    arr = array.array("b", [0] * n)
    input_buffer = ""
    nr = 0
    while True:
        # Read a chunk of data.
        read_buffer = inf.read(131072)
        if not read_buffer:
            # EOF: process whatever is still in the buffer.
            process_chunk, input_buffer = input_buffer, ""
        else:
            # Add it to the input-accumulating buffer, then
            # partition the accumulating buffer from the right,
            # and keep the "rest" (after the last space) as
            # the new accumulating buffer.
            input_buffer += read_buffer
            process_chunk, sep, input_buffer = input_buffer.rpartition(" ")
        if not process_chunk.strip():  # Nothing left to process
            break
        for value in process_chunk.split():
            arr[int(value)] += 1
            nr += 1
    assert n == nr
    return arr

mi0 = psutil.Process().memory_info()
with open("output.txt", "r") as inf:
    arr = read_file(inf)
mi1 = psutil.Process().memory_info()
print("initial memory usage", mi0.rss)
print("final memory usage..", mi1.rss)
print("delta from initial..", mi1.rss - mi0.rss)
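The TODO in read_file() notes that the array spends one byte per value rather than one bit. If you only need membership (like the C++ bitset) and not counts, a plain bytearray with bit operations gets much closer to the 7 MB figure. This is a sketch of mine, not part of the answer above, and it assumes all values are nonnegative and below n (the sample input's -645 would need an offset first):

```python
def make_bitset(n):
    # One bit per value: n/8 bytes instead of n bytes.
    return bytearray((n + 7) // 8)

def bitset_set(bits, x):
    bits[x >> 3] |= 1 << (x & 7)

def bitset_test(bits, x):
    return bool(bits[x >> 3] & (1 << (x & 7)))

visited = make_bitset(10_000_000)
for x in (1, 23, 53, 9_999_999):
    bitset_set(visited, x)

print(bitset_test(visited, 23))  # True
print(bitset_test(visited, 24))  # False
print(len(visited))              # 1250000 bytes, about 1.2 MB
```

That 1.25 MB for ten million flags is the same trick bitset<10000000> uses in the C++ version.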