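I am using AlphaPose from GitHub, and I want to run the script scripts/demo_inference.py from another script, run.py, which I created in the AlphaPose root directory. In run.py, I import demo_inference.py as ap using the following code: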
import importlib.util
import os

def import_module_by_path(path):
    # Use the file name without its extension as the module name.
    name = os.path.splitext(os.path.basename(path))[0]
    spec = importlib.util.spec_from_file_location(name, path)
    mod = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(mod)
    return mod
and
ap = import_module_by_path('./scripts/demo_inference.py')
Then, in demo_inference.py, I replaced
if __name__ == "__main__":
with
def startAlphapose():
and in run.py I wrote
ap.StartAlphapose()
Now I get this error:
Load SE Resnet...
Loading YOLO model..
Process Process-3:
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/vislab/guerri/alphagastnet/insieme/alphapose/utils/detector.py", line 251, in image_postprocess
    (orig_img, im_name, boxes, scores, ids, inps, cropped_boxes) = self.wait_and_get(self.det_queue)
  File "/home/vislab/guerri/alphagastnet/insieme/alphapose/utils/detector.py", line 121, in wait_and_get
    return queue.get()
  File "/usr/lib/python3.6/multiprocessing/queues.py", line 113, in get
    return _ForkingPickler.loads(res)
  File "/home/vislab/guerri/alphagastnet/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 284, in rebuild_storage_fd
    fd = df.detach()
  File "/usr/lib/python3.6/multiprocessing/resource_sharer.py", line 57, in detach
    with _resource_sharer.get_connection(self._id) as conn:
  File "/usr/lib/python3.6/multiprocessing/resource_sharer.py", line 87, in get_connection
    c = Client(address, authkey=process.current_process().authkey)
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 487, in Client
    c = SocketClient(address)
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 614, in SocketClient
    s.connect(address)
FileNotFoundError: [Errno 2] No such file or directory
What does this mean?
We ran into the same problem on our cluster.
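When you use multiprocessing in PyTorch (typically by running multiple DataLoader workers), the subprocesses create sockets in the /tmp directory to communicate with each other. These sockets are stored in folders named pymp-###### and look like 0-byte files. Deleting these files or folders while your PyTorch script is still running causes the error above.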
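To check whether those sockets are still present while your script runs, something like the following may help. It is a minimal sketch, assuming a Linux system; the function name find_pymp_sockets is made up for illustration and is not part of PyTorch or AlphaPose.

```python
import glob
import os
import stat
import tempfile


def find_pymp_sockets(tmp_root=None):
    """Return {pymp_dir: [socket paths]} for pymp-* folders under tmp_root."""
    # Default to the directory Python itself uses for temporary files.
    root = tmp_root or tempfile.gettempdir()
    found = {}
    for pymp_dir in glob.glob(os.path.join(root, "pymp-*")):
        if not os.path.isdir(pymp_dir):
            continue
        sockets = []
        try:
            entries = os.listdir(pymp_dir)
        except OSError:
            continue  # folder vanished or is not readable
        for entry in entries:
            path = os.path.join(pymp_dir, entry)
            try:
                # Sockets show up as 0-byte entries; check the file type.
                if stat.S_ISSOCK(os.lstat(path).st_mode):
                    sockets.append(path)
            except FileNotFoundError:
                pass  # entry vanished between listdir and lstat
        found[pymp_dir] = sorted(sockets)
    return found
```

Calling find_pymp_sockets() a few times during a run and printing the result shows whether a pymp- folder disappears before the script has finished.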
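In our case, the cause was a faulty maintenance script that wiped files from the /tmp folder while they were still needed. There may be other ways to trigger this error, but you should start by looking for these sockets and making sure they are not wiped accidentally.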
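If you cannot change whatever is cleaning /tmp, one possible workaround is to make Python create its pymp- folders somewhere else. Python's tempfile module honours the TMPDIR environment variable, and multiprocessing picks its temporary folder through tempfile. The sketch below is not something we tested on AlphaPose; use_private_tmpdir and the directory you pass in are hypothetical, and it has to run before any multiprocessing object is created.

```python
import os
import tempfile


def use_private_tmpdir(base_dir):
    """Point Python's temp-file machinery at base_dir for this process and its children."""
    os.makedirs(base_dir, exist_ok=True)
    # Child processes inherit the environment, so they see the same TMPDIR.
    os.environ["TMPDIR"] = base_dir
    # Drop the cached default so gettempdir() re-reads TMPDIR.
    tempfile.tempdir = None
    return tempfile.gettempdir()
```

You would call it at the very top of run.py, for example with a folder inside your home directory, before importing or starting anything that spawns worker processes.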
If that does not solve the problem, look in your /var/log/syslog file at the exact time the error occurred. You will most likely find the cause there.
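A small helper can pull out the syslog entries from the minute the traceback appeared. This is only a sketch: it assumes the traditional syslog line format that starts with a timestamp such as "Mar  3 14:05:01", and syslog_lines_at is a made-up name.

```python
def syslog_lines_at(path, time_prefix):
    """Return the lines of a syslog-style file that start with time_prefix."""
    matches = []
    # errors="replace" keeps the scan going if the log holds non-UTF-8 bytes.
    with open(path, "r", encoding="utf-8", errors="replace") as log:
        for line in log:
            if line.startswith(time_prefix):
                matches.append(line.rstrip("\n"))
    return matches
```

For example, syslog_lines_at("/var/log/syslog", "Mar  3 14:05") returns every entry logged during that minute; the timestamp here is a placeholder, and reading the file may require elevated permissions.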