Python - multiprocessing improvement beyond the cpu_count() value



I implemented a multiprocessing script to speed up image analysis. It works well, and I am now trying to determine the optimal number of processes by timing the runs. As I increase the number of processes, the total time keeps decreasing, even beyond the value returned by cpu_count(). I expected the time to get worse past that point, so I don't understand why it continues to improve. Is there an explanation?

A snippet of my code:

from multiprocessing import Process, current_process
import multiprocessing 
import glob as glob
import matplotlib.pyplot as plt
from skimage import io
import time
import sys
import numpy as np
import numpy.ma as ma
import gc
import os
from PIL import Image
from skimage import exposure
import cv2 

Path_input = "E:\\test\\raw\\"   # backslashes must be escaped (or use raw strings)
Path_output = "E:\\test\\"
Img_list = glob.glob(Path_input + 'Test_*.tif')
size_y,size_x = io.imread(Img_list[0]).shape
#Function run by each worker process
def Ajustement(x):
    #image reading
    img = plt.imread(Img_list[x])
    #create a CLAHE object and apply it
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8,8))
    cl1 = clahe.apply(img.astype(np.uint16))
    cv2.imwrite(Path_output + 'Ajusted' + "%05d" % x + '.tif', cl1)
    return 'Ajustement OK!'

#create the list of process names (avoid shadowing the built-in `list`;
#np.str was removed from NumPy, plain str() does the job)
cpu_max = 10
list_process = ['Process_' + str(i) for i in range(1, cpu_max + 1)]
get_timer = time.perf_counter  # time.clock was removed in Python 3.8
time_store = []
time_process = []
if __name__ == '__main__':
    range_adjusted = np.arange(0,len(Img_list),cpu_max)
    for m in range(0, 100, 1): #loop to obtain a mean time for the process
        gc.collect()
        print(m)
        timer = get_timer()  # time measuring starts now
        for i in range_adjusted:
            o = 0
            for item in list_process[:cpu_max]: #process creation
                globals()[item] = Process(name=item, target=Ajustement, args=(i+o,))
                o=o+1               
            o = 0
            for item in list_process[:cpu_max]: #process start
                globals()[item].start() 
                o=o+1               
            o = 0
            for item in list_process[:cpu_max]: #process join
                globals()[item].join()  
                o=o+1   
            if i == range_adjusted.max():
                print("Normalization and Equalization finished")
                timer = get_timer() - timer  # get delta time as soon as it finishes
                time_store.append(timer)
                time_process.append(timer/cpu_max)
                np.savetxt(Path_output + 'time_tot_normalization.txt',time_store)
                np.savetxt(Path_output + 'time_process_normalization.txt',time_process)
                print("\tTotal: {:.2f} seconds".format(timer))
                print("\tAvg. per process: {:.2f} seconds".format(timer/cpu_max))

It is an Intel Xeon E5-2630 v4 at 2.2 GHz, with 2 sockets of 20 cores each. cpu_count() reports 40. Here are some tests:

  • 40 processes: 322 s
  • 80 processes: 311 s
  • 200 processes: 306 s
  • 500 processes: 295 s
  • 750 processes: 311 s

The numbers you posted do not show a huge improvement with more processes, but still...

I suspect imread is a factor. Each call blocks, but for part of its runtime it is waiting on data from the disk and therefore not using much CPU time. During that wait the kernel can spend more time on other processes. The overall runtime drops because the amount of time cores sit idle waiting for data from disk decreases.
