Python multiprocessing iterable causes IndexError in range()



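I am trying to write Python code that computes the number density of a list of numbers, i.e. how many of the input numbers fall into each interval of the number line. In the input file "states.dat" I have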

3.5
3.6

and the script I wrote is

import numpy as np
import re
import multiprocessing as mp

cpu_read=open('density.pbs', "r")
for line in cpu_read:
    if re.search("select", line):
        cpu = int(re.split('=|:',line)[5])
print("total number of cpu used for parallelization:", cpu)

step = 0.1
start = 3.5
end = 4.50
states = [line.rstrip('\n') for line in open('states.dat')]
states_array=np.array(states).astype(np.float)
nsteps = int((end - start) / step)
print("nsteps\n", nsteps)
final = np.zeros((nsteps+1 ,2), dtype=float)
for i in range (nsteps+1):
    final[(i,0)] = start + i*step

def final_DOS(i):
    print("iteration",i)
    final[(i,1)] = np.count_nonzero((states_array >= final[(i,0)]) & (states_array < final[(i+1,0)]))
    return final[(i,1)]
#for i in range(nsteps):
#   final_DOS(i)
pool = mp.Pool(cpu)
pool.map(final_DOS(i),[i for i in range (nsteps)])
pool.close()
pool.join()

print("completed final array\n", final)
np.savetxt('DOS.txt',final,fmt='%5.5f', delimiter='    ')
print("done!")

When I do not use multiprocessing, the script gives me the correct output, that is, with

pool = mp.Pool(cpu)
pool.map(final_DOS(i),[i for i in range (nsteps)])
pool.close()
pool.join()

commented out and

#for i in range(nsteps):
#   final_DOS(i)

uncommented. This gives the output

total number of cpu used for parallelization: 2
states_array
[3.5 3.6]
nsteps
10
iteration 0
iteration 1
iteration 2
iteration 3
iteration 4
iteration 5
iteration 6
iteration 7
iteration 8
iteration 9
completed final array
[[3.5 1. ]
[3.6 1. ]
[3.7 0. ]
[3.8 0. ]
[3.9 0. ]
[4.  0. ]
[4.1 0. ]
[4.2 0. ]
[4.3 0. ]
[4.4 0. ]
[4.5 0. ]]
done!

However, when I run the script with multiprocessing enabled, as in the full code above, I get the following output and error message:

total number of cpu used for parallelization: 24
nsteps
10000
tail: density.out: file truncated
total number of cpu used for parallelization: 2
states_array
[3.5 3.6]
nsteps
10
final array
[[3.5 0. ]
[3.6 0. ]
[3.7 0. ]
[3.8 0. ]
[3.9 0. ]
[4.  0. ]
[4.1 0. ]
[4.2 0. ]
[4.3 0. ]
[4.4 0. ]
[4.5 0. ]]
iteration 10
Traceback (most recent call last):
  File "density.py", line 41, in <module>
    pool.map(final_DOS(i),[i for i in range (nsteps)])
  File "density.py", line 34, in final_DOS
    final[(i,1)] = np.count_nonzero((states_array >= final[(i,0)]) & (states_array < final[(i+1,0)]))
IndexError: index 11 is out of bounds for axis 0 with size 11

I don't understand why, with multiprocessing enabled, the script ends up with i=10 when range(nsteps) has a maximum value of 9. Any idea why?

A minor point first: you never close the file (use a context manager).

with open('density.pbs', "r") as cpu_read:
    # your parsing
# Now the file is closed here.
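As a rough sketch of what "your parsing" could look like inside the context manager, here is the question's own `re.split` logic wrapped in a helper. The function name `read_cpu_count` and the sample PBS line are assumptions made for illustration; the real contents of density.pbs are not shown in the question.

```python
import os
import re
import tempfile


def read_cpu_count(path):
    """Return the CPU count from the first line containing "select", or None."""
    with open(path, "r") as cpu_read:
        for line in cpu_read:
            if re.search("select", line):
                # Same split as in the question: field 5 after splitting on "=" or ":".
                return int(re.split("=|:", line)[5])
    # The file is already closed here, whether or not a match was found.
    return None


# Hypothetical PBS line, assumed only for this demo.
sample = "#PBS -l select=1:ncpus=2:mpiprocs=2\n"
with tempfile.NamedTemporaryFile("w", suffix=".pbs", delete=False) as tmp:
    tmp.write(sample)
    tmp_path = tmp.name

cpu = read_cpu_count(tmp_path)
os.remove(tmp_path)
print("total number of cpu used for parallelization:", cpu)
```

Returning from inside the `with` block is fine: the file is closed on the way out either way.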

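You are not passing the function to map as an argument. Instead, you are calling final_DOS yourself with the current value of i. That i is left over from this loop: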

for i in range(nsteps + 1):
final[(i, 0)] = start + i * step

The last value it takes is 10, so i is still 10 when you reach the following line:

pool.map(final_DOS(i), [i for i in range(nsteps)])

Inside final_DOS, the expression final[(i+1,0)] then adds 1 to i, which gives index 11 and raises the IndexError.
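The evaluation order can be reproduced without multiprocessing at all. In this small sketch the names `record` and `calls` are made up for illustration. It shows that an argument written as `record(i)` is evaluated immediately, using whatever value the loop variable was left with:

```python
calls = []


def record(i):
    """Remember every value this function is called with."""
    calls.append(i)
    return i


# A loop like the one that fills the first column of `final`.
for i in range(4):
    pass

# After the loop, i keeps its last value (3 here, 10 in the question).
# Writing record(i) as an argument calls the function right away,
# before map ever sees it.
arg = record(i)

print("function was called with:", calls)
print("what map would receive instead of a function:", arg)
```

So `pool.map(final_DOS(i), ...)` first runs `final_DOS(10)` in the main process, and the exception is raised before any worker gets a task.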

To use the map method, you must pass in the function itself, without calling it. In this case that is final_DOS (note the absence of parentheses).
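A minimal sketch of the corrected call, assuming the two values from states.dat are hard-coded and using the made-up names `STATES`, `EDGES` and `count_in_bin`. It collects the values returned by `pool.map` rather than writing into a global array, because each worker process operates on its own copy of module-level data, so assignments made inside a worker do not show up in the parent.

```python
import multiprocessing as mp

import numpy as np

# Sample data standing in for the contents of states.dat.
STATES = np.array([3.5, 3.6])
START, STEP, NSTEPS = 3.5, 0.1, 10
# Bin edges: START, START + STEP, ..., START + NSTEPS * STEP.
EDGES = START + STEP * np.arange(NSTEPS + 1)


def count_in_bin(i):
    """Return how many states fall in the half-open bin [EDGES[i], EDGES[i + 1])."""
    return int(np.count_nonzero((STATES >= EDGES[i]) & (STATES < EDGES[i + 1])))


if __name__ == "__main__":
    with mp.Pool(2) as pool:
        # Pass the function object itself; map calls it once per index.
        counts = pool.map(count_in_bin, range(NSTEPS))

    # Assemble the result in the parent process from the returned values.
    final = np.zeros((NSTEPS + 1, 2), dtype=float)
    final[:, 0] = EDGES
    final[:NSTEPS, 1] = counts
    print(final)
```

The `if __name__ == "__main__":` guard matters on platforms that start workers by re-importing the script, where unguarded pool creation would run again in every worker.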
