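I am trying to use Parallel Python (pp) to do some distributed benchmarking (essentially, coordinating and running some code on a set of machines from a central server). My code worked perfectly well until I moved the functionality into a separate package. Since then I keep getting ImportError: No module named some.module.pp_test.
My question is really twofold: has anyone run into this problem with pp, and if so, how do I solve it? I tried using dill (import dill), but it did not help. Also, is there a good replacement for Parallel Python that does not require any extra infrastructure?
The exact error I get is: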
    RUNNING TEST
    Waiting for hosts to finish booting....A fatal error has occured during the function execution
    Traceback (most recent call last):
      File "/usr/local/lib/python2.7/dist-packages/ppworker.py", line 86, in run
        __args = pickle.loads(__sargs)
    ImportError: No module named some.module.pp_test
    Caught exception in the run phase 'NoneType' object is not iterable
    Traceback (most recent call last):
      File "test.py", line 5, in <module>
        p.ping_pong()
      File "/home/ubuntu/workspace/pp-test/some/module/pp_test.py", line 5, in ping_pong
        a_test.run()
      File "/home/ubuntu/workspace/pp-test/some/module/pp_test.py", line 27, in run
        pong, hostname = ping()
    TypeError: 'NoneType' object is not iterable
The code is structured like this:
    pp-test/
        test.py
        some/
            __init__.py
            module/
                __init__.py
                pp_test.py
test.py is implemented as:
    from some.module.pp_test import MWE
    p = MWE()
    p.ping_pong()
and pp_test.py is:
    class MWE():
        def ping_pong(self):
            print "RUNNING TEST "
            a_test = PPTester()
            a_test.run()

    import pp
    import time
    from sys import stdout, exit

    class PPTester(object):
        def run(self):
            try:
                ppservers = ('10.10.10.10', )
                time.sleep(5)
                job_server = pp.Server(0, ppservers=ppservers)
                stdout.write("Waiting for hosts to finish booting...")
                while len(job_server.get_active_nodes()) - 1 < len(ppservers):
                    stdout.write(".")
                    stdout.flush()
                    time.sleep(1)
                ppmodules = ()
                pings = [(server, job_server.submit(self.run_pong, modules=ppmodules)) for server in ppservers]
                for server, ping in pings:
                    pong, hostname = ping()
                    print "Host ", hostname, " is alive!"
                print "All servers booted up, starting benchmarks..."
                job_server.print_stats()
            except Exception as e:
                print "Caught exception in the run phase", e
                raise
            pass

        def run_pong(self):
            import subprocess
            p = subprocess.Popen("hostname", stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
            (output, err) = p.communicate()
            p_status = p.wait()
            return "pong ", output
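dill does not work with pp out of the box, because pp does not serialize Python objects: pp extracts the source code of the object (much like the inspect module in the standard Python library does).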
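To make the difference concrete, here is a minimal sketch using only the standard library. It does not use pp itself, and colorsys.rgb_to_yiq is just a convenient self-contained stand-in function chosen for illustration. pickle stores a function or class as a by-name reference to its module, which is consistent with the traceback above failing inside pickle.loads when the worker cannot import some.module.pp_test. Source extraction instead ships the text of the function, which the receiving side can rebuild without that import.

```python
import colorsys
import inspect
import pickle

# pickle stores a function as a reference: module name plus function name.
# Unpickling therefore requires that the module be importable on the other side.
payload = pickle.dumps(colorsys.rgb_to_yiq)
stores_reference_only = b"colorsys" in payload and b"rgb_to_yiq" in payload

# Source extraction captures the text of the function instead.
source = inspect.getsource(colorsys.rgb_to_yiq)

# The receiving side can rebuild the function from that text alone.
# Nothing in this namespace imports colorsys.
namespace = {}
exec(source, namespace)
rebuilt = namespace["rgb_to_yiq"]

print(stores_reference_only)
print(rebuilt(1.0, 1.0, 1.0) == colorsys.rgb_to_yiq(1.0, 1.0, 1.0))
```

This only works so cleanly because the stand-in function has no dependencies on other names in its module; a function that does have them needs those dependencies shipped too.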
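To get pp to use dill (actually dill.source, which is inspect augmented by dill), you have to use a fork of pp called ppft. ppft installs as pp (i.e. it is imported with import pp), but it has much stronger source inspection, so you can automatically "serialize" most Python objects and have ppft automatically track down their dependencies.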
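As a rough, untested sketch of how the ping from the question might look with ppft installed: the job is written as a plain module-level function rather than a bound method, so no PPTester instance has to be pickled along with it. The names hostname_pong and ping_hosts are made up for illustration, and socket.gethostname() is swapped in for the subprocess call to hostname. The pp.Server and submit calls mirror the ones in the question.

```python
def hostname_pong():
    """Self-contained job: everything it needs is imported inside the body."""
    import socket
    return "pong", socket.gethostname()


def ping_hosts(ppservers):
    """Submit hostname_pong once per remote host and collect the replies.

    Assumes ppft is installed, so that `import pp` resolves to ppft,
    and that a ppserver is already running on every host in ppservers.
    """
    import pp

    # 0 local workers, as in the question: all jobs go to the remote hosts.
    job_server = pp.Server(0, ppservers=ppservers)
    jobs = [job_server.submit(hostname_pong) for _ in ppservers]
    # Calling a job blocks until its result is available.
    return [job() for job in jobs]
```

Calling ping_hosts(('10.10.10.10', )) would then return one result per host. Running hostname_pong() locally first is a cheap way to check the job itself before involving the cluster.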
Get ppft here: https://github.com/uqfoundation
ppft is also pip-installable, and it is compatible with Python 3.x.