I am using the multiprocessing module, via the Process class, to run some tasks that are not CPU-bound, e.g. I/O or web requests. If the tasks take too long, the CPU reaches 100% usage (all the threads are waiting for the data to return). I suspect an asynchronous solution is the way to go, but I have never done anything like that. The code I use is shown below, where I have a huge list and each process works on a chunk of it.
Can you suggest a direction here?
Thanks in advance!!
import math
import multiprocessing
import urllib
from multiprocessing import Process
from Queue import Empty  # Python 2; on Python 3 this is "from queue import Empty"

def getData(urlsChunk, myQueue):
    # Worker: fetch every URL in this chunk and push the response body onto the shared queue
    for url in urlsChunk:
        fp = urllib.urlopen(url)
        try:
            data = fp.read()
            myQueue.put(data)
        finally:
            fp.close()
    return myQueue

manager = multiprocessing.Manager()
MYQUEUE = manager.Queue()
urls = ['a huge list of url items']
nprocs = multiprocessing.cpu_count()  # number of worker processes
processes = []
chunksize = int(math.ceil(len(urls) / float(nprocs)))
for i in range(nprocs):
    p = Process(
        target=getData,  # This is my worker
        args=(urls[chunksize * i:chunksize * (i + 1)], MYQUEUE)
    )
    processes.append(p)
    p.start()

for p in processes:
    p.join()

while True:
    try:
        MYQUEUEelem = MYQUEUE.get(block=False)
    except Empty:
        break
    else:
        # do something with the MYQUEUEelem
        pass
Using multiprocessing.Pool, your code can be simplified:
import multiprocessing
import urllib

def getData(url):
    # Fetch a single URL and return the response body
    fp = urllib.urlopen(url)
    try:
        return fp.read()
    finally:
        fp.close()

if __name__ == '__main__':  # should protect the "entry point" of the program
    urls = ['a huge list of url items']
    pool = multiprocessing.Pool()
    for result in pool.imap(getData, urls, chunksize=10):
        # do something with the result
        pass
    pool.close()
    pool.join()
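Since the work here is I/O-bound rather than CPU-bound, you may not need separate processes at all: the workers spend almost all their time blocked on the network, so threads are usually enough and avoid the pickling and process-startup overhead. As a minimal sketch (reusing the same getData worker and placeholder urls list; the thread count of 20 is just an assumption to tune for your workload), multiprocessing.dummy exposes the same Pool API backed by threads:

import urllib
from multiprocessing.dummy import Pool  # same Pool API as multiprocessing, but backed by threads

def getData(url):
    # Same worker as above: fetch one URL and return the body
    fp = urllib.urlopen(url)
    try:
        return fp.read()
    finally:
        fp.close()

if __name__ == '__main__':
    urls = ['a huge list of url items']
    pool = Pool(20)  # 20 threads is an arbitrary starting point, not a recommendation
    for result in pool.imap_unordered(getData, urls, chunksize=10):
        # do something with each result as it arrives
        pass
    pool.close()
    pool.join()

imap_unordered hands results back as they complete instead of in input order, so one slow URL does not stall the whole loop; if you need the original ordering, use imap as shown above.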