I have a simple piece of code:
import time
from multiprocessing import Pool

import numpy as np
import umap

path = [filepath1, filepath2, filepath3]

def umap_embedding(filepath):
    file = np.genfromtxt(filepath, delimiter=' ')
    # Subsample very large inputs to at most 20000 rows
    if len(file) > 20000:
        file = file[np.random.choice(file.shape[0], 20000, replace=False), :]
    # n_neighbors scales with the number of rows but never drops below 2
    neighbors = len(file) // 200
    if neighbors >= 2:
        neighbors = neighbors
    else:
        neighbors = 2
    embedder = umap.UMAP(n_neighbors=neighbors,
                         min_dist=0.1,
                         metric='correlation', n_components=2)
    embedder.fit(file)
    embedded = embedder.transform(file)
    name = 'file'
    np.savetxt(name, embedded, delimiter=",")

if __name__ == '__main__':
    p = Pool(processes=20)
    start = time.time()
    for filepath in path:
        p.apply_async(umap_embedding, [filepath])
        p.close()
        p.join()
    print("Complete")
    end = time.time()
    print('total time (s)= ' + str(end - start))
When I run it, the console returns this error:
Traceback (most recent call last):
  File "/home/cngc3/CBC/parallel.py", line 77, in <module>
    p.apply_async(umap_embedding, [filepath])
  File "/home/cngc3/anaconda3/envs/CBC/lib/python3.6/multiprocessing/pool.py", line 355, in apply_async
    raise ValueError("Pool not running")
ValueError: Pool not running
I tried to find a solution on Stack Overflow and Google, but couldn't find any related questions. Thanks for your help.
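p.close() and p.join() must be placed after the for loop, not inside it. Otherwise the pool is closed during the first iteration of the loop, and on the second iteration apply_async refuses the new job with ValueError: Pool not running.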
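A minimal sketch of the corrected pattern follows. It assumes a hypothetical worker fake_embedding in place of umap_embedding so that it runs without numpy or umap. All jobs are submitted inside the loop, and close()/join() run once after it. The hypothetical helper reproduce_error shows that calling apply_async on a closed pool raises the same ValueError. Keeping the AsyncResult objects and calling .get() on them also surfaces exceptions raised inside the workers, which are otherwise never reported when apply_async results are discarded.

```python
from multiprocessing import Pool


def fake_embedding(filepath):
    # Hypothetical stand-in for umap_embedding: does some per-file work
    # and returns a value so we can check that every job actually ran.
    return len(filepath)


def run_all(paths, processes=4):
    p = Pool(processes=processes)
    # Submit every job first; apply_async returns immediately with an AsyncResult.
    results = [p.apply_async(fake_embedding, [fp]) for fp in paths]
    # Only after the loop: stop accepting new work, then wait for the workers.
    p.close()
    p.join()
    # .get() returns each job's value and re-raises any exception from the worker.
    return [r.get() for r in results]


def reproduce_error():
    # Hypothetical helper mimicking the question: close() inside the loop.
    p = Pool(processes=2)
    p.apply_async(fake_embedding, ["a"])  # first "iteration" succeeds
    p.close()
    try:
        p.apply_async(fake_embedding, ["b"])  # second "iteration" fails
    except ValueError as exc:
        return str(exc)
    finally:
        p.join()
    return None


if __name__ == "__main__":
    print(run_all(["file_a.txt", "b.txt"]))
    print(reproduce_error())
```

Run as a script, this prints [10, 5] followed by Pool not running.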