我正在尝试并行运行一个巨大的循环,但失败了。循环恰好是特定类的一个方法,在循环中我称之为另一个方法。它确实有效,但由于某种原因,列表中只有一个进程,输出(参见代码(始终为"Worker 0"。进程未创建或未并行运行。结构如下:
main.py
from my_class.py import MyClass
def main():
class_object = MyClass()
class_object.method()
if __name__ == '__main__':
main()
my_class.py
from multiprocessing import Process
MyClass(object):
def __init__(self):
# do something
def _method(self, worker_num, n_workers, amount, job, data):
for i, val in enumerate(job):
print('Worker %d' % worker_num)
self.another_method(val, data)
def another_method(self):
# do something to the data
def method(self):
# definitions of data and job_size go here
n_workers = 16
chunk = job_size // n_workers
resid = job_size - chunk * n_workers
workers = []
for worker_num in range(n_workers):
st = worker_num * chunk
amount = chunk if worker_num != n_workers - 1 else chunk + resid
worker = Process(target=self._method, args=[worker_num, n_workers, amount, job[st:st+amount], data])
worker.start()
workers.append(worker)
for worker in workers:
worker.join()
return data
我已经阅读了一些关于需要主模块可导入的子进程的内容,但我不知道在我的情况下该怎么做。
问题:... 但仍然只有一个内核在使用中。所以问题是,我可以将多个内核与 Process 对象一起使用吗?
这并不取决于 Python 解释器哪个Process
正在使用哪个 CPU。
相关:on-what-cpu-cores-are-my-python-processes-running
使用以下方法扩展def _method(...
,看看实际发生的情况:
注意:
getpidcore(pid)
依赖于发行版,可能会失败!
def getpidcore(pid):
with open('/proc/{}/stat'.format(pid), 'rb') as fh:
core = int(fh.read().split()[-14])
return core
class MyClass(object):
...
def _method(self, worker_num, n_workers, amount, job, data):
for i, val in enumerate(job):
core = getpidcore(os.getpid())
print('core:{} pid:{} Worker({})'.format(core, os.getpid(), (worker_num, n_workers, amount, job)))
输出:
core:1 pid:7623 Worker((0, 16, 1, [1])) core:1 pid:7625 Worker((2, 16, 1, [3])) core:0 pid:7624 Worker((1, 16, 1, [2])) core:1 pid:7626 Worker((3, 16, 1, [4])) core:1 pid:7628 Worker((5, 16, 1, [6])) core:0 pid:7627 Worker((4, 16, 1, [5]))
使用 Python 测试:Linux 上的 3.4.2