Here is a simple piece of multiprocessing code:
from multiprocessing import Process, Manager

manager = Manager()
d = manager.dict()

def f():
    d[1].append(4)
    print(d)

if __name__ == '__main__':
    d[1] = []
    p = Process(target=f)
    p.start()
    p.join()
The output I get is:
{1: []}
Why don't I get {1: [4]} as the output?
Here is what you wrote:
# From here on, the code runs in the main process and, when child processes
# are spawned rather than forked (e.g. on Windows), again in every child,
# so every such process performs these imports
from multiprocessing import Process, Manager
# and every such process creates its own 'manager' and 'd'
manager = Manager()
# BTW, the Manager itself runs in a child process, so its start-up runs this
# module again, which creates a new Manager, which creates another one, and so on.
# Did you check how many Python processes were running on your system? A lot!
d = manager.dict()

def f():
    # 'd' is the 'd' defined in the globals of the current process
    d[1].append(4)
    print(d)

if __name__ == '__main__':
    # from here on, the code runs ONLY in the main process
    d[1] = []
    p = Process(target=f)
    p.start()
    p.join()
You should write it like this:
from multiprocessing import Process, Manager

def f(d):
    d[1] = d[1] + [4]
    print(d)

if __name__ == '__main__':
    manager = Manager()  # create only one manager
    d = manager.dict()  # create only one dict
    d[1] = []
    p = Process(target=f, args=(d,))  # tell 'f' which 'd' it should append to
    p.start()
    p.join()
The reason the new item appended to d[1] is not printed is stated in Python's official documentation:
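"Modifications to mutable values or items in dict and list proxies will not be propagated through the manager, because the proxy has no way of knowing when its values or items are modified. To modify such an item, you can re-assign the modified object to the container proxy."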
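The documented behaviour can be reproduced even without a second process. The following is a minimal sketch, assuming Python 3 (the function name demo_propagation is made up for this illustration): it appends in place, reads the value back, then reassigns and reads it back again.

```python
from multiprocessing import Manager


def demo_propagation():
    """Return (value after an in-place append, value after reassignment)."""
    with Manager() as manager:
        d = manager.dict()
        d[1] = []

        # d[1] hands back a copy of the stored list, so this append
        # only changes the copy and never reaches the manager.
        d[1].append(4)
        after_append = d[1]

        # Reassigning the modified list goes through d.__setitem__(),
        # which is the operation the manager does see.
        items = d[1]
        items.append(4)
        d[1] = items
        after_reassign = d[1]

        return after_append, after_reassign


if __name__ == '__main__':
    print(demo_propagation())  # ([], [4])
```

The in-place append is silently lost, while the reassignment is visible on the next read.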
So this is what actually happens:
from multiprocessing import Process, Manager

manager = Manager()
d = manager.dict()

def f():
    # invoke d.__getitem__(), which returns a local copy of the empty list assigned by the main process
    # (no KeyError was raised, so a list was definitely returned), and append 4 to that copy;
    # the change is not propagated through the manager, because it is performed on an
    # ordinary list with which the manager has no interaction
    d[1].append(4)
    # convert d to a string via d.__str__() (see https://docs.python.org/2/reference/datamodel.html#object.__str__),
    # which returns the "remote" string representation of the object (see https://docs.python.org/2/library/multiprocessing.html#multiprocessing.managers.SyncManager.list),
    # to which the change above was not propagated
    print(d)

if __name__ == '__main__':
    # invoke d.__setitem__(), propagating this assignment (mapping 1 to an empty list) through the manager
    d[1] = []
    p = Process(target=f)
    p.start()
    p.join()
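To make the "local copy" point concrete, here is a toy model of a dict proxy. It is only a sketch under stated assumptions: ToyDictProxy is a hypothetical class written for this illustration, not multiprocessing's actual implementation, and it uses a pickle round trip to stand in for a value crossing the process boundary.

```python
import pickle


class ToyDictProxy:
    """Hypothetical stand-in for a dict proxy (not multiprocessing's code)."""

    def __init__(self):
        # Plays the role of the real dict that lives in the manager process.
        self._referent = {}

    def __getitem__(self, key):
        # A value sent between processes is pickled, so the caller gets a copy.
        return pickle.loads(pickle.dumps(self._referent[key]))

    def __setitem__(self, key, value):
        self._referent[key] = pickle.loads(pickle.dumps(value))

    def __str__(self):
        # The string is built from the "remote" dict, not from any local copy.
        return str(self._referent)


d = ToyDictProxy()
d[1] = []

first = d[1]
second = d[1]
distinct = first is not second  # every read produces a fresh copy

first.append(4)       # changes the copy only
remote_view = str(d)  # the "remote" dict is unchanged

print(distinct, first, remote_view)  # True [4] {1: []}
```

Nothing in __getitem__ keeps track of the returned copy, which is why the model, like the real proxy, cannot notice the append.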
Reassigning d[1] with a new list, or even with the same list again after updating it, triggers the manager to propagate the change:
from multiprocessing import Process, Manager

manager = Manager()
d = manager.dict()

def f():
    # perform the exact same steps as explained in the comments to the previous code snippet,
    # but in addition invoke d.__setitem__() with the changed item in order to propagate the change
    l = d[1]
    l.append(4)
    d[1] = l
    print(d)

if __name__ == '__main__':
    d[1] = []
    p = Process(target=f)
    p.start()
    p.join()
The line d[1] += [4] would have worked as well.
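The reason d[1] += [4] works is that augmented assignment on a subscript reads the item, updates it, and then writes it back with __setitem__(). A small sketch that records those calls, assuming Python 3 (LoggingDict is a hypothetical helper built on a plain dict, not a manager proxy):

```python
class LoggingDict(dict):
    """A plain dict that records which item-access hooks get called."""

    def __init__(self):
        super().__init__()
        self.calls = []

    def __getitem__(self, key):
        self.calls.append("__getitem__")
        return super().__getitem__(key)

    def __setitem__(self, key, value):
        self.calls.append("__setitem__")
        super().__setitem__(key, value)


d = LoggingDict()
d[1] = []
d.calls.clear()

d[1].append(4)  # read only: nothing is written back to d
append_calls = list(d.calls)
d.calls.clear()

d[1] += [4]  # read, extend in place, then write back to d
iadd_calls = list(d.calls)

print(append_calls)  # ['__getitem__']
print(iadd_calls)    # ['__getitem__', '__setitem__']
```

With a manager dict, that final __setitem__() call is what sends the updated list back to the manager.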
EDIT for Python 3.6 or later:
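Since Python 3.6, per this changeset following this issue, it is also possible to use nested proxy objects, which automatically propagate any changes performed on them to the containing proxy object. Thus, replacing the line d[1] = [] with d[1] = manager.list() would correct the issue as well: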
from multiprocessing import Process, Manager

manager = Manager()
d = manager.dict()

def f():
    d[1].append(4)
    # the __str__() method of a dict object invokes __repr__() on each of its items,
    # so explicitly invoking __str__() is required in order to print the actual list items
    print({k: str(v) for k, v in d.items()})

if __name__ == '__main__':
    d[1] = manager.list()
    p = Process(target=f)
    p.start()
    p.join()
Unfortunately, this bug fix was not ported to Python 2.7 (as of Python 2.7.13).
NOTE (running under the Windows operating system):
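Although the described behaviour applies to the Windows operating system as well, the attached code snippets would fail when executed under Windows because of its different process creation mechanism: Windows relies on the CreateProcess() API rather than the fork() system call, which it does not support.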
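Whenever a new process is created via the multiprocessing module, Windows creates a fresh Python interpreter process that imports the main module, with potentially hazardous side effects. To circumvent this issue, the following programming guideline is recommended: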
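"Make sure that the main module can be safely imported by a new Python interpreter without causing unintended side effects (such as starting a new process)."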
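Therefore, executing the attached code snippets as they are under Windows would try to create an infinite number of processes because of the manager = Manager() line. This can easily be fixed by creating the Manager and Manager.dict objects inside the if __name__ == '__main__' clause and passing the Manager.dict object as an argument to f(), as done in this answer.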
More details on the issue can be found in this answer.
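Putting both points together, here is a sketch of the Python 3.6+ nested-proxy variant written so that the main module stays safe to import. The function names append_four and run are made up for this illustration, and the code assumes Python 3.6 or later.

```python
from multiprocessing import Manager, Process


def append_four(shared):
    # shared[1] is itself a list proxy, so append() is forwarded to the manager
    shared[1].append(4)


def run():
    # The manager is only created when run() is called, so merely
    # importing this module starts no processes.
    with Manager() as manager:
        d = manager.dict()
        d[1] = manager.list()  # nested proxy, requires Python 3.6+
        p = Process(target=append_four, args=(d,))
        p.start()
        p.join()
        # copy into a plain list before the manager shuts down
        return list(d[1])


if __name__ == '__main__':
    print(run())  # [4]
```

Passing d explicitly through args, rather than relying on a module-level global, is what lets the child reach the same dict regardless of how the process was created.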
I think this is a bug in the manager proxy calls. You can work around it by not calling methods directly on the shared list, like this:
from multiprocessing import Process, Manager

manager = Manager()
d = manager.dict()

def f():
    # get the shared list
    shared_list = d[1]
    shared_list.append(4)
    # forces the shared list to
    # be serialized back to the manager
    d[1] = shared_list
    print(d)

if __name__ == '__main__':
    d[1] = []
    p = Process(target=f)
    p.start()
    p.join()
    print(d)
from multiprocessing import Process, Manager

manager = Manager()
d = manager.dict()
l = manager.list()

def f():
    l.append(4)
    d[1] = l
    print(d)

if __name__ == '__main__':
    d[1] = []
    p = Process(target=f)
    p.start()
    p.join()