为什么要进行多处理.池和多处理.进程在 Linux 中执行的方式非常不同



我运行了一些测试代码,如下所示来检查在Linux中使用Pool和Process的性能。我正在使用Python 2.7。多处理的源代码。池似乎显示它正在使用多处理。过程。但是,多处理。池花费很多时间和内存,而不是等同的#多处理。过程,我不明白这个。

这是我所做的:

  1. 创建一个大型字典,然后创建子流程。

  2. 将字典传递给每个子进程以实现只读。

  3. 每个子进程都进行一些计算并返回一个小结果。

以下是我的测试代码:

from multiprocessing import Pool, Process, Queue
import time, psutil, os, gc
gct = time.time
costTime = lambda ET: time.strftime('%H:%M:%S', time.gmtime(int(ET)))
def getMemConsumption():
procId = os.getpid()
proc = psutil.Process(procId)
mem = proc.memory_info().rss
return "process ID %d.nMemory usage: %.6f GB" % (procId, mem*1.0/1024**3)
def f_pool(l, n, jobID):
try:
result = {}
# example of subprocess work
for i in xrange(n):
result[i] = l[i]
# work done
# gc.collect()
print getMemConsumption()
return 1, result, jobID
except:
return 0, {}, jobID
def f_proc(q, l, n, jobID):
try:
result = {}
# example of subprocess work
for i in xrange(n):
result[i] = l[i]
# work done
print getMemConsumption()
q.put([1, result, jobID])
except:
q.put([0, {}, jobID])
def initialSubProc(targetFunc, procArgs, jobID):
outQueue = Queue()
args = [outQueue]
args.extend(procArgs)
args.append(jobID)
p = Process(target = targetFunc, args = tuple(args))
p.start()
return p, outQueue

def track_add_Proc(procList, outQueueList, maxProcN, jobCount, 
maxJobs, targetFunc, procArgs, joinFlag, all_result):
if len(procList) < maxProcN:
p, q = initialSubProc(targetFunc, procArgs, jobCount)
outQueueList.append(q)
procList.append(p)
jobCount += 1
joinFlag.append(0)
else:
for i in xrange(len(procList)):
if not procList[i].is_alive() and joinFlag[i] == 0:
procList[i].join()
all_results.append(outQueueList[i].get())
joinFlag[i] = 1 # in case of duplicating result of joined subprocess
if jobCount < maxJobs:
p, q = initialSubProc(targetFunc, procArgs, jobCount)
procList[i] = p
outQueueList[i] = q
jobCount += 1
joinFlag[i] = 0
return jobCount
if __name__ == '__main__':
st = gct()
d = {i:i**2 for i in xrange(10000000)}
print "MainProcess create data dictn%s" % getMemConsumption()
print 'Time to create dict: %snn' % costTime(gct()-st)
nproc = 2
jobs = 8
subProcReturnDictLen = 1000
procArgs = [d, subProcReturnDictLen]
print "Use multiprocessing.Pool, max subprocess = %d, jobs = %dn" % (nproc, jobs)
st = gct()
pool = Pool(processes = nproc)
for i in xrange(jobs):
procArgs.append(i)
sp = pool.apply_async(f_pool, tuple(procArgs))
procArgs.pop(2)
res = sp.get()
if res[0] == 1:
# do something with the result
pass
else:
# do something with subprocess exception handle
pass
pool.close()
pool.join()
print "Total time used to finish all jobs: %s" % costTime(gct()-st)
print "Main Processn", getMemConsumption(), 'n'
print "Use multiprocessing.Process, max subprocess = %d, jobs = %dn" % (nproc, jobs)
st = gct()
procList = []
outQueueList = []
all_results = []
jobCount = 0
joinFlag = []
while (jobCount < jobs):
jobCount = track_add_Proc(procList, outQueueList, nproc, jobCount, 
jobs, f_proc, procArgs, joinFlag, all_results)
for i in xrange(nproc):
if joinFlag[i] == 0:
procList[i].join()
all_results.append(outQueueList[i].get())
joinFlag[i] = 1
for i in xrange(jobs):
res = all_results[i]
if res[0] == 1:
# do something with the result
pass
else:
# do something with subprocess exception handle
pass
print "Total time used to finish all jobs: %s" % costTime(gct()-st)
print "Main Processn", getMemConsumption()

结果如下:

MainProcess create data dict
process ID 21256.
Memory usage: 0.841743 GB
Time to create dict: 00:00:02

Use multiprocessing.Pool, max subprocess = 2, jobs = 8
process ID 21266.
Memory usage: 1.673084 GB
process ID 21267.
Memory usage: 1.673088 GB
process ID 21266.
Memory usage: 2.131172 GB
process ID 21267.
Memory usage: 2.131172 GB
process ID 21266.
Memory usage: 2.176079 GB
process ID 21267.
Memory usage: 2.176083 GB
process ID 21266.
Memory usage: 2.176079 GB
process ID 21267.
Memory usage: 2.176083 GB
Total time used to finish all jobs: 00:00:49
Main Process
process ID 21256.
Memory usage: 0.843079 GB 

Use multiprocessing.Process, max subprocess = 2, jobs = 8
process ID 23405.
Memory usage: 0.840614 GB
process ID 23408.
Memory usage: 0.840618 GB
process ID 23410.
Memory usage: 0.840706 GB
process ID 23412.
Memory usage: 0.840805 GB
process ID 23415.
Memory usage: 0.840900 GB
process ID 23417.
Memory usage: 0.840973 GB
process ID 23419.
Memory usage: 0.841061 GB
process ID 23421.
Memory usage: 0.841152 GB
Total time used to finish all jobs: 00:00:00
Main Process
process ID 21256.
Memory usage: 0.843781 GB

我不知道为什么多处理的子进程。池开始时需要大约 1.6GB,但从多处理中获取子进程。进程只需要 0.84 GB,等于主进程的内存成本。在我看来,只有多处理。Process 享有 Linux 的"写入时复制"优势,因为所有所需作业的时间都不到 1 秒。我不知道为什么要多处理。游泳池不喜欢这个。从源代码,多处理。池似乎是多处理的包装器。过程。

问题:我不知道为什么多处理的子进程。池开始时需要大约 1.6GB,
...池似乎是多处理的包装器。过程

这是Pool为所有作业的结果保留内存。
其次,Pool使用两个SimpleQueue()三个Threads
第三,Pool复制所有传递的argv数据,然后再传递到process

您的process示例仅对所有Queue()使用一个,按原样传递argv

Pool远在只是包装纸。

相关内容

  • 没有找到相关文章

最新更新