Python multiprocessing with the 'spawn' start method does not work



I wrote a Python class to create pyplot plots in parallel. It works fine on Linux, where the default start method is fork, but when I tried it on Windows I ran into problems (this can also be reproduced on Linux with the spawn start method; see the code below). I always end up with this error:

Traceback (most recent call last):
  File "test.py", line 50, in <module>
    test()
  File "test.py", line 7, in test
    asyncPlotter.saveLinePlotVec3("test")
  File "test.py", line 41, in saveLinePlotVec3
    args=(test, ))
  File "test.py", line 34, in process
    p.start()
  File "C:\Users\adrian\AppData\Local\Programs\Python\Python37\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "C:\Users\adrian\AppData\Local\Programs\Python\Python37\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\adrian\AppData\Local\Programs\Python\Python37\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Users\adrian\AppData\Local\Programs\Python\Python37\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\adrian\AppData\Local\Programs\Python\Python37\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: can't pickle weakref objects

C:\Python\MonteCarloTools>Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\adrian\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 99, in spawn_main
    new_handle = reduction.steal_handle(parent_pid, pipe_handle)
  File "C:\Users\adrian\AppData\Local\Programs\Python\Python37\lib\multiprocessing\reduction.py", line 82, in steal_handle
    _winapi.PROCESS_DUP_HANDLE, False, source_pid)
OSError: [WinError 87] The parameter is incorrect

I hope there is a way to make this code work on Windows. Here is a link describing the start methods available on Linux and Windows: https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods

import multiprocessing as mp
import time
def test():
    manager = mp.Manager()
    asyncPlotter = AsyncPlotter(manager.Value('i', 0))
    asyncPlotter.saveLinePlotVec3("test")
    asyncPlotter.saveLinePlotVec3("test")
    asyncPlotter.join()

class AsyncPlotter():
    def __init__(self, nc, processes=mp.cpu_count()):
        self.nc = nc
        self.pids = []
        self.processes = processes

    def linePlotVec3(self, nc, processes, test):
        self.waitOnPool(nc, processes)
        print(test)
        nc.value -= 1

    def waitOnPool(self, nc, processes):
        while nc.value >= processes:
            time.sleep(0.1)
        nc.value += 1

    def process(self, target, args):
        ctx = mp.get_context('spawn') 
        p = ctx.Process(target=target, args=args)
        p.start()
        self.pids.append(p)

    def saveLinePlotVec3(self, test):
        self.process(target=self.linePlotVec3,
                       args=(self.nc, self.processes, test))

    def join(self):
        for p in self.pids:
            p.join()

if __name__=='__main__':
    test()
When the spawn start method is used, the Process object itself is pickled for use in the child process. In this code, the target=target argument is a bound method of AsyncPlotter. It looks like the entire asyncPlotter instance must therefore be pickled as well, and that includes self.manager, which apparently does not want to be pickled.

In short, keep the Manager outside of AsyncPlotter. This works on my macOS system:

def test():
    manager = mp.Manager()
    asyncPlotter = AsyncPlotter(manager.Value('i', 0))
    ...

Also, as mentioned in your comment, asyncPlotter does not work when it is reused. I don't know the details, but it looks like it is related to how the Value object is shared between processes. The test function would need to look like this:

def test():
    manager = mp.Manager()
    nc = manager.Value('i', 0)
    asyncPlotter1 = AsyncPlotter(nc)
    asyncPlotter1.saveLinePlotVec3("test 1")
    asyncPlotter2 = AsyncPlotter(nc)
    asyncPlotter2.saveLinePlotVec3("test 2")
    asyncPlotter1.join()
    asyncPlotter2.join()

All in all, you may want to restructure your code and use a process pool. It already handles what AsyncPlotter does with cpu_count and parallel execution:

from multiprocessing import Pool, set_start_method
from random import random
import time

def linePlotVec3(test):
    time.sleep(random())
    print("test", test)

if __name__ == "__main__":
    set_start_method("spawn")
    with Pool() as pool:
        pool.map(linePlotVec3, range(20))

Or you can use a ProcessPoolExecutor to do much the same thing. This example submits tasks one at a time instead of mapping over a list:

from concurrent.futures import ProcessPoolExecutor
import multiprocessing as mp
import time
from random import random

def work(i):
    r = random()
    print("work", i, r)
    time.sleep(r)

def main():
    ctx = mp.get_context("spawn")
    with ProcessPoolExecutor(mp_context=ctx) as pool:
        for i in range(20):
            pool.submit(work, i)

if __name__ == "__main__":
    main()

For portability, all objects passed as arguments to a function that will run in a separate process must be picklable.
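One way to check that in advance is a small helper that attempts a pickle round trip (a sketch; the name is_picklable is my own):

```python
import pickle

def is_picklable(obj):
    """Return True if obj can be pickled, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, AttributeError, pickle.PicklingError):
        return False

print(is_picklable([1, 2, 3]))    # plain containers of picklable items: True
print(is_picklable(lambda x: x))  # lambdas cannot be pickled: False
```

Running this on each argument before handing it to Process or Pool makes pickling failures show up early, with a clear culprit, instead of deep inside multiprocessing.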
