Why does Python multiprocessing run sequentially?



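I'm trying to speed up the dot product of two large matrices, so I tested a small multiprocessing example; the code is below. Judging by the results, however, my code seems to run sequentially.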

Code

import multiprocessing as mp
import numpy as np
import time

def dot(i):
    print(f"Process {i} enters")
    np.random.seed(10)
    a = np.random.normal(0, 1, (5000, 5000))
    b = np.random.normal(0, 1, (5000, 5000))
    print(f"Process {i} starts calculating")
    res = np.dot(a, b)
    print(f"Process {i} finishes")
    return res

if __name__ == '__main__':

    start = time.perf_counter()
    dot(1)
    print(time.perf_counter() - start)
    print('=============================')

    print(mp.cpu_count())
    i = 8
    start = time.perf_counter()
    pool = mp.Pool(mp.cpu_count())
    res = []
    for j in range(i):
        res.append(pool.apply_async(dot, args=(j,)))
    pool.close()
    pool.join()
    end = time.perf_counter()
    # res = [r.get() for r in res]
    # print(res)
    print(end - start)

Results

Process 1 enters
Process 1 starts calculating
Process 1 finishes
2.582571708
=============================
8
Process 0 enters
Process 1 enters
Process 2 enters
Process 3 enters
Process 4 enters
Process 5 enters
Process 6 enters
Process 7 enters
Process 4 starts calculating
Process 7 starts calculating
Process 5 starts calculating
Process 3 starts calculating
Process 1 starts calculating
Process 6 starts calculating
Process 0 starts calculating
Process 2 starts calculating
Process 4 finishes
Process 7 finishes
Process 1 finishes
Process 0 finishes
Process 6 finishes
Process 2 finishes
Process 5 finishes
Process 3 finishes
27.05124225

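The output suggests that the code does run in parallel (judging by the printed text), but the total running time looks as if it had run sequentially. I don't understand why, so I hope someone can give me some advice. Thanks in advance.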

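Of course, there is always extra overhead in creating processes and in passing arguments and results between address spaces (and in this case the results are very large).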
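
To put rough numbers on the sizes involved, the following sketch works out the memory arithmetic for the original demo. It assumes float64 arrays (the default dtype of np.random.normal) and treats a, b and res as all alive at the same time in every worker, so the totals are upper-bound estimates that ignore temporaries and pickling buffers. Each result is still pickled and sent back to the parent even though the r.get() line is commented out, because pool workers return every task's result to the parent. The variable names are illustrative only.

```python
import numpy as np

N = 5000          # matrix side length used in the original dot()
PROCESSES = 8     # pool size in the original run (mp.cpu_count() printed 8)

# Bytes in one N x N float64 array (8 bytes per element).
array_bytes = N * N * np.dtype(np.float64).itemsize

# Each task holds a, b and res at the same time (upper-bound assumption).
per_process = 3 * array_bytes
all_processes = PROCESSES * per_process

# Every task's result is pickled and shipped back to the parent process.
returned = PROCESSES * array_bytes

print(f"one array:             {array_bytes / 1e6:.0f} MB")
print(f"per process (a, b, res): {per_process / 1e9:.1f} GB")
print(f"8 processes at once:   {all_processes / 1e9:.1f} GB")
print(f"results sent back:     {returned / 1e9:.1f} GB")
```

So a single 5000×5000 array is 200 MB, and eight workers can plausibly need several gigabytes at once, which is the kind of footprint that can push a desktop machine into paging.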

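My best guess is that the performance problem arises because the memory requirements of running 8 processes in parallel (I assume you have at least 8 logical processors, and preferably 8 physical ones), driven by the large arrays being computed, may cause extreme paging (I got the same results as you). So I modified the demo to use far less memory while keeping the CPU demand high by calling np.dot many times in a loop. I also reduced the number of processes to 4, the number of physical processors on my desktop, which gives each process a better chance of actually running in parallel: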

from multiprocessing.pool import Pool
import numpy as np
import time

def dot(i):
    print(f"Process {i} enters")
    np.random.seed(10)
    a = np.random.normal(0, 1, (50, 50))
    b = np.random.normal(0, 1, (50, 50))
    print(f"Process {i} starts calculating")
    for _ in range(500_000):
        res = np.dot(a, b)
    print(f"Process {i} finishes")
    return res

if __name__ == '__main__':
    start = time.perf_counter()
    dot(1)
    print(time.perf_counter() - start)
    print('=============================')
    i = 4
    start = time.perf_counter()
    pool = Pool(i)
    res = []
    for j in range(i):
        res.append(pool.apply_async(dot, args=(j,)))
    pool.close()
    pool.join()
    end = time.perf_counter()
    # res = [r.get() for r in res]
    # print(res)
    print(end - start)
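
A rough estimate of how the modified demo changes the memory footprint and the work per task: each 50×50 float64 array is tiny, while the loop of 500,000 products keeps the floating-point work per task at about half that of one 5000×5000 product. The flop count uses the usual 2n³ estimate for a dense n×n product (one multiply and one add per term) and ignores how BLAS actually schedules the work; the names below are illustrative.

```python
N_SMALL = 50        # side length in the modified demo
REPEATS = 500_000   # loop count in the modified demo
N_LARGE = 5000      # side length in the original demo
ITEMSIZE = 8        # bytes per float64 element

small_array_bytes = N_SMALL * N_SMALL * ITEMSIZE
large_array_bytes = N_LARGE * N_LARGE * ITEMSIZE

def matmul_flops(n):
    """Approximate flops for an n x n by n x n product: 2 * n**3."""
    return 2 * n ** 3

small_task_flops = REPEATS * matmul_flops(N_SMALL)   # whole loop in one task
large_task_flops = matmul_flops(N_LARGE)             # one large product

print(f"array size: {small_array_bytes} B vs {large_array_bytes} B")
print(f"flops per task: {small_task_flops:.2e} vs {large_task_flops:.2e}")
```

The arrays shrink by a factor of 10,000 (20,000 bytes instead of 200 MB), so they no longer strain memory, while each task still performs on the order of 10¹¹ floating-point operations.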

Results:

Process 1 enters
Process 1 starts calculating
Process 1 finishes
6.0469717
=============================
Process 0 enters
Process 0 starts calculating
Process 1 enters
Process 1 starts calculating
Process 2 enters
Process 3 enters
Process 2 starts calculating
Process 3 starts calculating
Process 0 finishes
Process 1 finishes
Process 3 finishes
Process 2 finishes
8.8419177

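This is closer to what you would expect. When I changed i to 8, the number of logical processors, the two running times were 6.1023760000000005 and 12.749368100000002 respectively.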
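
One way to read these timings is as an effective speedup: the time n tasks would take if run back to back (n times the single-call time) divided by the measured pool time. The sketch below applies this to the timings reported in this thread, assuming each pooled task costs about as much as the standalone dot(1) call; a value below 1 means the pool was slower than running the tasks one after another. The speedup helper is a hypothetical name used only for illustration.

```python
def speedup(n_tasks, t_single, t_pool):
    """Sequential estimate (n_tasks * t_single) divided by measured pool time."""
    return n_tasks * t_single / t_pool

runs = {
    "original (8 procs)": speedup(8, 2.582571708, 27.05124225),
    "modified (4 procs)": speedup(4, 6.0469717, 8.8419177),
    "modified (8 procs)": speedup(8, 6.1023760000000005, 12.749368100000002),
}

for name, s in runs.items():
    print(f"{name}: {s:.2f}x")
```

With the reported numbers this gives about 0.76x for the original 8-process run (roughly 30% slower than running the eight tasks sequentially), 2.74x for the modified demo with 4 processes, and 3.83x with 8 processes.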
