I started learning multiprocessing in Python, and I noticed that the same code executes much faster on the main process than in a process created with the multiprocessing module.
Here is a simplified example of my code: I first execute the code on the main process and print the time for the first 10 computations plus the total time, and then execute the same code on a new process (a long-running process to which I can send a new_pattern at any time).
import multiprocessing
import random
import time
old_patterns = [[random.uniform(-1, 1) for _ in range(0, 10)] for _ in range(0, 2000)]
new_patterns = [[random.uniform(-1, 1) for _ in range(0, 10)] for _ in range(0, 100)]
new_pattern_for_processing = multiprocessing.Array('d', 10)
there_is_new_pattern = multiprocessing.Value('i', 0)
queue = multiprocessing.Queue()
def iterate_and_add(old_patterns, new_pattern):
    for each_pattern in old_patterns:
        sum = 0
        for count in range(0, 10):
            sum += each_pattern[count] + new_pattern[count]

print_count_main_process = 0

def patt_recognition_main_process(new_pattern):
    global print_count_main_process
    # START of same code on main process
    start_main_process_one_patt = time.time()
    iterate_and_add(old_patterns, new_pattern)
    if print_count_main_process < 10:
        print_count_main_process += 1
        print("Time on main process one pattern:", time.time() - start_main_process_one_patt)
    # END of same code on main process

def patt_recognition_new_process(old_patterns, new_pattern_on_new_proc, there_is_new_pattern, queue):
    print_count = 0
    while True:
        if there_is_new_pattern.value:
            # START of same code on new process
            start_new_process_one_patt = time.time()
            iterate_and_add(old_patterns, new_pattern_on_new_proc)
            if print_count < 10:
                print_count += 1
                print("Time on new process one pattern:", time.time() - start_new_process_one_patt)
            # END of same code on new process
            queue.put("DONE")
            there_is_new_pattern.value = 0

if __name__ == "__main__":
    start_main_process = time.time()
    for new_pattern in new_patterns:
        patt_recognition_main_process(new_pattern)
    print(".\n.\n.")
    print("Total Time on main process:", time.time() - start_main_process)
    print("\n###########################################################################\n")

    start_new_process = time.time()
    p1 = multiprocessing.Process(target=patt_recognition_new_process, args=(old_patterns, new_pattern_for_processing, there_is_new_pattern, queue))
    p1.start()
    for new_pattern in new_patterns:
        for idx, n in enumerate(new_pattern):
            new_pattern_for_processing[idx] = n
        there_is_new_pattern.value = 1
        while True:
            msg = queue.get()
            if msg == "DONE":
                break
    print(".\n.\n.")
    print("Total Time on new process:", time.time() - start_new_process)
Here are my results:
Time on main process one pattern: 0.0025289058685302734
Time on main process one pattern: 0.0020127296447753906
Time on main process one pattern: 0.002008199691772461
Time on main process one pattern: 0.002511262893676758
Time on main process one pattern: 0.0020067691802978516
Time on main process one pattern: 0.0020036697387695312
Time on main process one pattern: 0.0020072460174560547
Time on main process one pattern: 0.0019974708557128906
Time on main process one pattern: 0.001997232437133789
Time on main process one pattern: 0.0030074119567871094
.
.
.
Total Time on main process: 0.22810864448547363
###########################################################################
Time on new process one pattern: 0.03462791442871094
Time on new process one pattern: 0.03308463096618652
Time on new process one pattern: 0.034590721130371094
Time on new process one pattern: 0.033623456954956055
Time on new process one pattern: 0.03407788276672363
Time on new process one pattern: 0.03308820724487305
Time on new process one pattern: 0.03408670425415039
Time on new process one pattern: 0.0345921516418457
Time on new process one pattern: 0.03710794448852539
Time on new process one pattern: 0.03358912467956543
.
.
.
Total Time on new process: 4.0528037548065186
Why is there such a big difference in execution time?
It's a bit subtle, but the problem is due to

new_pattern_for_processing = multiprocessing.Array('d', 10)

That doesn't hold Python float objects; it holds raw bytes, in this case enough to hold 10 eight-byte machine-level doubles. When you read from or write to this array, Python has to convert a float to a double, or the other way around. That's no big deal if you're reading or writing once, but your code does it many times in loops, and those conversions dominate.
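To see the effect in isolation, here is a minimal timing sketch (not from the original post; the iteration counts are arbitrary) comparing indexed reads from a multiprocessing.Array, each of which converts a machine-level double to a Python float (and, for the default synchronized array, takes a lock), against reads from a plain list:

import multiprocessing
import time

shared = multiprocessing.Array('d', 10)  # 10 machine-level doubles
plain = shared[:]                        # one bulk copy into a list of Python floats

start = time.time()
for _ in range(100000):
    for i in range(10):
        _ = shared[i]                    # each read: lock + double -> float conversion
print("Array reads:", time.time() - start)

start = time.time()
for _ in range(100000):
    for i in range(10):
        _ = plain[i]                     # plain list read, no conversion
print("List reads: ", time.time() - start)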
To confirm this, I copied the machine-level array into a list of Python floats once per pattern, and had the process work on that. Now it runs at the same speed as in the parent. My changes are all in this one function:
def patt_recognition_new_process(old_patterns, new_pattern_on_new_proc, there_is_new_pattern, queue):
    print_count = 0
    while True:
        if there_is_new_pattern.value:
            local_pattern = new_pattern_on_new_proc[:]
            # START of same code on new process
            start_new_process_one_patt = time.time()
            # iterate_and_add(old_patterns, new_pattern_on_new_proc)
            iterate_and_add(old_patterns, local_pattern)
            if print_count < 10:
                print_count += 1
                print("Time on new process one pattern:", time.time() - start_new_process_one_patt)
            # END of same code on new process
            there_is_new_pattern.value = 0
            queue.put("DONE")
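Note that the new_pattern_on_new_proc[:] slice does the conversion work once per pattern, in a single bulk read, so the hot loop inside iterate_and_add then operates on ordinary Python floats.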
In this particular case you are executing your algorithm sequentially in another process rather than parallelizing it, and that adds overhead.
Process creation itself takes time, but that's not all. You are also transferring data in queues and using Manager proxies. These are in practice all queues, or actually two queues and another process. Queues are very, very slow compared with using an in-memory copy of the data.
If you take your code, execute it in another process, and use queues to move the data in and out, it will always be slower. From a performance point of view that makes it pointless. There may still be other reasons to do it, though, for example if your main program needs to do something else, such as wait on I/O.
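As a rough illustration (not from the original answer; the payload size and iteration count are arbitrary assumptions), the following sketch contrasts a queue round trip through a child process with a plain in-memory copy:

import multiprocessing
import time

def echo(q_in, q_out):
    # Child process: bounce every item straight back.
    while True:
        item = q_in.get()
        if item is None:
            break
        q_out.put(item)

if __name__ == "__main__":
    q_in, q_out = multiprocessing.Queue(), multiprocessing.Queue()
    p = multiprocessing.Process(target=echo, args=(q_in, q_out))
    p.start()

    payload = [0.0] * 10
    start = time.time()
    for _ in range(1000):
        q_in.put(payload)  # pickled, written to a pipe, unpickled in the child
        q_out.get()        # and the same again on the way back
    print("1000 queue round trips:", time.time() - start)
    q_in.put(None)
    p.join()

    start = time.time()
    for _ in range(1000):
        local = list(payload)  # plain in-memory copy for comparison
    print("1000 in-memory copies:", time.time() - start)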
If you want better performance, you should instead create several processes and split your algorithm so that different parts of your range are processed in different processes, working in parallel. You could also consider multiprocessing.Pool if you want to keep a set of worker processes standing by for more work. This reduces the process-creation overhead, since you only pay it once. In Python 3 you can also use ProcessPoolExecutor.
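For example, a rough sketch along those lines (not from the original answer; the worker count and chunking scheme are illustrative assumptions) could split old_patterns across a ProcessPoolExecutor:

import random
from concurrent.futures import ProcessPoolExecutor

def sum_chunk(chunk, new_pattern):
    # Same inner work as iterate_and_add, but on one slice of old_patterns.
    for each_pattern in chunk:
        total = 0
        for count in range(10):
            total += each_pattern[count] + new_pattern[count]

if __name__ == "__main__":
    old_patterns = [[random.uniform(-1, 1) for _ in range(10)] for _ in range(2000)]
    new_pattern = [random.uniform(-1, 1) for _ in range(10)]

    workers = 4
    size = len(old_patterns) // workers
    chunks = [old_patterns[i:i + size] for i in range(0, len(old_patterns), size)]

    with ProcessPoolExecutor(max_workers=workers) as pool:
        # Each worker processes its own slice in parallel; the pool is
        # created once and can be reused for later patterns.
        list(pool.map(sum_chunk, chunks, [new_pattern] * len(chunks)))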
Parallel processing is useful, but it is rarely snake oil that solves all your problems with no effort. To get the most out of it, you need to redesign your program to maximize the parallel work and minimize the data transferred through queues.