Manager dict in multiprocessing



Here is a simple piece of multiprocessing code:

from multiprocessing import Process, Manager
manager = Manager()
d = manager.dict()
def f():
    d[1].append(4)
    print(d)
if __name__ == '__main__':
    d[1] = []
    p = Process(target=f)
    p.start()
    p.join()
The output I get is:
{1: []}

Why don't I get {1: [4]} as the output?

Here is what you wrote, with comments:

# from here on, code runs in the main process and, under the 'spawn'
# start method (e.g. on Windows), again in every child process
# every process makes all these imports
from multiprocessing import Process, Manager
# under 'spawn', every process creates its own 'manager' and 'd'
manager = Manager()
# BTW, the Manager itself runs in a child process, and under 'spawn' that
# child re-imports this module and tries to create yet another Manager,
# which in turn tries to create another one, and so on
# Did you check how many Python processes were running on your system? A lot!
d = manager.dict()
def f():
    # 'd' is whichever 'd' is defined in the globals of the current process
    d[1].append(4)
    print(d)
if __name__ == '__main__':
    # from here on, code runs ONLY in the main process
    d[1] = []
    p = Process(target=f)
    p.start()
    p.join()

You should write it like this instead:

from multiprocessing import Process, Manager
def f(d):
    # reassign instead of mutating in place, so the change goes through the proxy
    d[1] = d[1] + [4]
    print(d)
if __name__ == '__main__':
    manager = Manager()  # create only one manager
    d = manager.dict()  # create only one dict
    d[1] = []
    p = Process(target=f, args=(d,))  # tell 'f' which 'd' to modify
    p.start()
    p.join()

The reason the item appended to d[1] is not printed is explained in the official Python documentation:

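Modifications to mutable values or items in dict and list proxies will not be propagated through the manager, because the proxy has no way of knowing when its values or items are modified. To modify such an item, you can re-assign the modified object to the container proxy.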
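As a rough illustration, here is a single-process toy model of that behavior. The class CopyingDictModel and its snapshot() method are hypothetical, not part of multiprocessing; the model assumes, as manager proxies do with their default pickle serializer, that every value crossing the proxy boundary is pickled and unpickled, so d[1] always hands back a fresh copy:

```python
import pickle


class CopyingDictModel:
    """Hypothetical stand-in for a DictProxy: the real data lives in
    self._server, and every value crossing the boundary is pickled, as it
    would be when sent to or received from a manager process."""

    def __init__(self):
        self._server = {}  # plays the role of the dict inside the manager

    def __getitem__(self, key):
        # The caller receives a deserialized copy, not the stored object.
        return pickle.loads(pickle.dumps(self._server[key]))

    def __setitem__(self, key, value):
        # Assignment is the only way a change reaches the "server" side.
        self._server[key] = pickle.loads(pickle.dumps(value))

    def snapshot(self):
        # Hypothetical helper: a plain-dict view of the stored data.
        return dict(self._server)


d = CopyingDictModel()
d[1] = []

d[1].append(4)       # mutates a temporary copy that is then discarded
print(d.snapshot())  # {1: []}

l = d[1]             # fetch a copy...
l.append(4)          # ...modify it locally...
d[1] = l             # ...and assign it back, which is what propagates
print(d.snapshot())  # {1: [4]}
```

Only the final assignment reaches the stored data, which is what the documentation's advice to re-assign the modified object amounts to.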

So here is what actually happens:

from multiprocessing import Process, Manager
manager = Manager()
d = manager.dict()
def f():
    # invoke d.__getitem__(), which returns a local copy of the empty list assigned by the main process
    # (note that no KeyError was raised, so a list was definitely returned),
    # and append 4 to it; however, this change is not propagated through the manager,
    # as it is performed on an ordinary list with which the manager has no interaction
    d[1].append(4)
    # convert d to a string via d.__str__() (see https://docs.python.org/2/reference/datamodel.html#object.__str__),
    # which returns the "remote" string representation of the object (see https://docs.python.org/2/library/multiprocessing.html#multiprocessing.managers.SyncManager.list),
    # to which the change above was not propagated
    print(d)
if __name__ == '__main__':
    # invoke d.__setitem__(), propagating this assignment (mapping 1 to an empty list) through the manager
    d[1] = []
    p = Process(target=f)
    p.start()
    p.join()

Reassigning d[1] with a new list, or even with the same list once it has been updated, triggers the manager to propagate the change:

from multiprocessing import Process, Manager
manager = Manager()
d = manager.dict()
def f():
    # perform exactly the same steps as explained in the comments of the previous snippet,
    # but additionally invoke d.__setitem__() with the changed item in order to propagate the change
    l = d[1]
    l.append(4)
    d[1] = l
    print(d)
if __name__ == '__main__':
    d[1] = []
    p = Process(target=f)
    p.start()
    p.join()

The line d[1] += [4] works as well.
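This works because Python expands an augmented assignment on a subscript into a read followed by a write. The hypothetical LoggingDict below (an ordinary dict subclass, not a proxy) records the calls that d[1] += [4] makes; with a real DictProxy, the final __setitem__ call is the one that sends the updated list to the manager:

```python
class LoggingDict(dict):
    """Hypothetical dict subclass that records item reads and writes."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.calls = []

    def __getitem__(self, key):
        self.calls.append(("get", key))
        return super().__getitem__(key)

    def __setitem__(self, key, value):
        self.calls.append(("set", key))
        super().__setitem__(key, value)


d = LoggingDict()
d[1] = []
d.calls.clear()

d[1] += [4]     # roughly: tmp = d.__getitem__(1); tmp += [4]; d.__setitem__(1, tmp)
print(d.calls)  # [('get', 1), ('set', 1)]
print(d)        # {1: [4]}
```

By contrast, d[1].append(4) produces only the ('get', 1) call, so nothing is ever written back.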


EDIT for Python 3.6 or later:

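Since Python 3.6, per the changeset that followed this issue, nested proxy objects can also be used; they automatically propagate any changes performed on them to the containing proxy object. Thus, replacing the line d[1] = [] with d[1] = manager.list() corrects the issue as well: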

from multiprocessing import Process, Manager
manager = Manager()
d = manager.dict()
def f():
    d[1].append(4)
    # printing a dict calls repr() on each value, and the repr of a ListProxy
    # only describes the proxy itself, whereas str() of a proxy returns the
    # representation of the referent list, so str() is called explicitly
    print({k: str(v) for k, v in d.items()})
if __name__ == '__main__':
    d[1] = manager.list()
    p = Process(target=f)
    p.start()
    p.join()

Unfortunately, this bug fix was not ported to Python 2.7 (as of Python 2.7.13).
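One way to picture why nested proxies help: a proxy is pickled as a reference (a token identifying an object that lives in the manager) rather than as the data itself, so the copy returned by d[1] still points at the same shared list. The toy model below is a hypothetical single-process sketch; REGISTRY, ListRef, new_shared_list and CopyingDictModel are invented names, and copy.deepcopy stands in for the pickling a real manager round trip performs:

```python
import copy
import itertools

# Hypothetical stand-in for the manager's storage: object id -> real object.
REGISTRY = {}
_next_id = itertools.count()


class ListRef:
    """Toy stand-in for a ListProxy: it holds only an id, so copying it
    copies the reference, never the list contents."""

    def __init__(self, obj_id):
        self.obj_id = obj_id

    def append(self, item):
        # Forward the call to the one real list kept in the registry.
        REGISTRY[self.obj_id].append(item)

    def value(self):
        return list(REGISTRY[self.obj_id])


def new_shared_list():
    obj_id = next(_next_id)
    REGISTRY[obj_id] = []
    return ListRef(obj_id)


class CopyingDictModel:
    """Toy stand-in for a DictProxy: values are deep-copied on the way in
    and out, mimicking a pickled round trip to the manager."""

    def __init__(self):
        self._data = {}

    def __getitem__(self, key):
        return copy.deepcopy(self._data[key])

    def __setitem__(self, key, value):
        self._data[key] = copy.deepcopy(value)


d = CopyingDictModel()
d[1] = new_shared_list()  # like d[1] = manager.list()
d[1].append(4)            # the copied ListRef still targets the same list
print(d[1].value())       # [4]
```

Storing a plain [] instead of new_shared_list() in the same model loses the append, matching the original problem.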


Note (under Windows):

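While the described behavior applies to Windows as well, the attached code snippets would fail when executed under Windows because of its different process creation mechanism, which relies on the CreateProcess() API rather than on the fork() system call, which Windows does not support.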

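Whenever a new process is created via the multiprocessing module, Windows starts a fresh Python interpreter process that imports the main module, with potentially dangerous side effects. To avoid this problem, the following programming guideline is recommended: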

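Make sure that the main module can be safely imported by a new Python interpreter without causing unintended side effects (such as starting a new process).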

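Therefore, executing the attached code snippets as-is under Windows would attempt to create processes recursively because of the manager = Manager() line (in practice, CPython aborts this with a RuntimeError raised while the child is still bootstrapping). This can easily be fixed by creating the Manager and Manager.dict objects inside the if __name__ == '__main__' clause and passing the Manager.dict object as an argument to f(), as done in this answer.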
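The re-import can be observed without starting any process. CPython's spawn implementation runs the parent's main script in the child through runpy.run_path(..., run_name="__mp_main__"); that name is an implementation detail, not a documented API. The sketch below executes a small hypothetical script both ways and records which parts ran:

```python
import os
import runpy
import tempfile

# Hypothetical script with the same layout as the question's code: one
# unguarded module-level side effect and one guarded by the __main__ check.
SCRIPT = """
events.append("module-level code ran")  # e.g. manager = Manager()
if __name__ == "__main__":
    events.append("guarded code ran")   # e.g. p = Process(target=f); p.start()
"""


def run_as(run_name):
    """Execute SCRIPT under the given module name and return its side effects."""
    events = []
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "script.py")
        with open(path, "w") as fh:
            fh.write(SCRIPT)
        runpy.run_path(path, init_globals={"events": events}, run_name=run_name)
    return events


print(run_as("__main__"))     # how the parent process runs the script
print(run_as("__mp_main__"))  # how a 'spawn' child re-imports it
```

The first call records both entries and the second only the module-level one, which is why an unguarded manager = Manager() line runs again in every spawned child while the guarded block does not.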

More details about this issue can be found in this answer.
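To try the Windows behavior on Linux or macOS, you can ask multiprocessing for a "spawn" context, which starts children by launching a fresh interpreter, the way Windows does. A minimal sketch for Python 3.4+ (the variable names are illustrative); it only inspects the available start methods and starts nothing:

```python
import multiprocessing as mp

# Start methods supported on this platform. 'spawn' is available everywhere;
# 'fork' and 'forkserver' are POSIX-only, so Windows lists just ['spawn'].
methods = mp.get_all_start_methods()
print(methods)

# A context pinned to 'spawn'. Using spawn_ctx.Manager() and
# spawn_ctx.Process(...) in place of the module-level names makes Linux or
# macOS create processes the way Windows does.
spawn_ctx = mp.get_context("spawn")
print(spawn_ctx.get_start_method())  # spawn
```

Swapping Manager and Process in the snippets above for spawn_ctx.Manager and spawn_ctx.Process is a quick way to check that code follows the programming guideline before running it on Windows.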

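I think this is a bug in the manager's proxy calls. You can avoid calling methods on the shared list through the dict proxy, for example: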

from multiprocessing import Process, Manager
manager = Manager()
d = manager.dict()
def f():
    # get a local copy of the shared list
    shared_list = d[1]
    shared_list.append(4)
    # forces the shared list to
    # be serialized back to the manager
    d[1] = shared_list
    print(d)
if __name__ == '__main__':
    d[1] = []
    p = Process(target=f)
    p.start()
    p.join()
    print(d)

Alternatively, store a list proxy created by the manager in the dict:

from multiprocessing import Process, Manager
manager = Manager()
d = manager.dict()
l = manager.list()
def f():
    l.append(4)
    d[1] = l
    print(d)
if __name__ == '__main__':
    d[1] = []
    p = Process(target=f)
    p.start()
    p.join()
