Can I somehow share an asynchronous queue with a subprocess?

I would like to use a queue to pass data from the parent to a child process that is launched via multiprocessing.Process. However, since the parent process uses Python's new asyncio library, the queue methods need to be non-blocking. As far as I understand, asyncio.Queue is intended for inter-task communication and cannot be used for inter-process communication. Also, I know that multiprocessing.Queue has put_nowait() and get_nowait() methods, but what I actually need are coroutines that still block the current task (but not the whole process). Is there some way to create coroutines wrapping put_nowait()/get_nowait()? Looking at it the other way around, are the threads that multiprocessing.Queue uses internally compatible with an event loop running in the same process?
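To make the question concrete, the kind of wrapper being asked about could naively be built by polling, suspending only the current task between attempts (a hypothetical sketch, not code from the question; the answers below take a thread-based approach instead):

import asyncio
import queue

@asyncio.coroutine
def coro_get(q, poll_interval=0.05):
    # Hypothetical polling wrapper around multiprocessing.Queue.get_nowait():
    # only the calling task sleeps between attempts, never the whole process.
    while True:
        try:
            return q.get_nowait()
        except queue.Empty:  # get_nowait() raises queue.Empty when no item is available
            yield from asyncio.sleep(poll_interval)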

If not, what other options do I have? I know I could implement such a queue myself using asynchronous sockets, but I was hoping to avoid that…

Edit:

I also considered using pipes instead of sockets, but it seems that asyncio is not compatible with multiprocessing.Pipe(). More precisely, Pipe() returns a tuple of Connection objects, which are not file-like objects. However, the asyncio.BaseEventLoop methods add_reader()/add_writer() and connect_read_pipe()/connect_write_pipe() all expect file-like objects, so it is not possible to asynchronously read from or write to such a Connection. In contrast, the usual file-like objects that the subprocess package uses as pipes pose no problem at all and can easily be combined with asyncio.

Update: I decided to explore the pipe approach a bit further: I converted the Connection objects returned by multiprocessing.Pipe() into file-like objects by retrieving the file descriptor via fileno() and passing it to os.fdopen(). Finally, I passed the resulting file-like objects to the event loop's connect_read_pipe()/connect_write_pipe(). (There is a mailing list discussion on a related issue, in case anyone is interested in the exact code.) However, read()ing the resulting stream gave me an OSError: [Errno 9] Bad file descriptor, and I never managed to fix this. Considering also the missing support for Windows, I will not pursue this any further.
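For reference, the conversion described above looks roughly like the sketch below (Unix-only, and explicitly hedged: read_from_connection is a hypothetical helper, and the os.dup() call reflects an assumption that the EBADF error comes from the Connection and the fdopen() file object both claiming ownership of the same descriptor, not a confirmed fix):

import asyncio
import os
from multiprocessing import Pipe

@asyncio.coroutine
def read_from_connection(loop, recv_conn):
    # Wrap the receive end of the pipe in a file-like object. os.dup()
    # guards against the Connection closing the descriptor when it is
    # garbage-collected (one plausible cause of the EBADF above).
    pipe_file = os.fdopen(os.dup(recv_conn.fileno()), 'rb', 0)
    reader = asyncio.StreamReader(loop=loop)
    protocol = asyncio.StreamReaderProtocol(reader, loop=loop)
    yield from loop.connect_read_pipe(lambda: protocol, pipe_file)
    return (yield from reader.readline())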

Here is an implementation of a multiprocessing.Queue object that can be used with asyncio. It provides the entire multiprocessing.Queue interface, with the addition of coro_get and coro_put methods, which are asyncio.coroutines that can be used to get from / put into the queue asynchronously. The implementation details are essentially the same as in the second example of my other answer: a ThreadPoolExecutor is used to make get/put asynchronous, and a multiprocessing.managers.SyncManager.Queue is used to share the queue between processes. The only additional trick is implementing __getstate__ to keep the object picklable despite using a non-picklable ThreadPoolExecutor as an instance variable.

import asyncio
from multiprocessing import Manager, cpu_count
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def AsyncProcessQueue(maxsize=0):
    m = Manager()
    q = m.Queue(maxsize=maxsize)
    return _ProcQueue(q)


class _ProcQueue(object):
    def __init__(self, q):
        self._queue = q
        self._real_executor = None
        self._cancelled_join = False

    @property
    def _executor(self):
        if not self._real_executor:
            self._real_executor = ThreadPoolExecutor(max_workers=cpu_count())
        return self._real_executor

    def __getstate__(self):
        # Copy the instance dict so that pickling does not clobber
        # the live executor on the object being pickled.
        self_dict = self.__dict__.copy()
        self_dict['_real_executor'] = None
        return self_dict

    def __getattr__(self, name):
        if name in ['qsize', 'empty', 'full', 'put', 'put_nowait',
                    'get', 'get_nowait', 'close']:
            return getattr(self._queue, name)
        else:
            raise AttributeError("'%s' object has no attribute '%s'" %
                                 (self.__class__.__name__, name))

    @asyncio.coroutine
    def coro_put(self, item):
        loop = asyncio.get_event_loop()
        return (yield from loop.run_in_executor(self._executor, self.put, item))

    @asyncio.coroutine
    def coro_get(self):
        loop = asyncio.get_event_loop()
        return (yield from loop.run_in_executor(self._executor, self.get))

    def cancel_join_thread(self):
        self._cancelled_join = True
        self._queue.cancel_join_thread()

    def join_thread(self):
        self._queue.join_thread()
        if self._real_executor and not self._cancelled_join:
            self._real_executor.shutdown()


@asyncio.coroutine
def _do_coro_proc_work(q, stuff, stuff2):
    ok = stuff + stuff2
    print("Passing %s to parent" % ok)
    yield from q.coro_put(ok)  # Non-blocking
    item = q.get()  # Can be used with the normal blocking API, too
    print("got %s back from parent" % item)


def do_coro_proc_work(q, stuff, stuff2):
    loop = asyncio.get_event_loop()
    loop.run_until_complete(_do_coro_proc_work(q, stuff, stuff2))


@asyncio.coroutine
def do_work(q):
    loop.run_in_executor(ProcessPoolExecutor(max_workers=1),
                         do_coro_proc_work, q, 1, 2)
    item = yield from q.coro_get()
    print("Got %s from worker" % item)
    item = item + 25
    q.put(item)


if __name__ == "__main__":
    q = AsyncProcessQueue()
    loop = asyncio.get_event_loop()
    loop.run_until_complete(do_work(q))
Output:

Passing 3 to parent
Got 3 from worker
got 28 back from parent

As you can see, AsyncProcessQueue can be used both synchronously and asynchronously, from either the parent or the child process. It doesn't require any global state, and by encapsulating most of the complexity in a class it is more elegant to use than my original answer.

You may be able to get better performance using sockets directly, but getting that to work in a cross-platform way seems pretty tricky. This approach also has the advantage of being usable across multiple workers, and it won't require you to do your own pickling/unpickling, etc.

Unfortunately, the multiprocessing library is not particularly well-suited for use with asyncio. Depending on how you were planning to use multiprocessing/multiprocessing.Queue, however, you may be able to replace it completely with a concurrent.futures.ProcessPoolExecutor:

import asyncio
from concurrent.futures import ProcessPoolExecutor


def do_proc_work(stuff, stuff2):  # This runs in a separate process
    return stuff + stuff2


@asyncio.coroutine
def do_work():
    out = yield from loop.run_in_executor(ProcessPoolExecutor(max_workers=1),
                                          do_proc_work, 1, 2)
    print(out)


if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    loop.run_until_complete(do_work())
Output:

3

If you absolutely need a multiprocessing.Queue, it seems that it behaves fine when combined with a ProcessPoolExecutor:

import asyncio
import time
import multiprocessing
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def do_proc_work(q, stuff, stuff2):
    ok = stuff + stuff2
    time.sleep(5)  # Artificial delay to show that it's running asynchronously
    print("putting output in queue")
    q.put(ok)


@asyncio.coroutine
def async_get(q):
    """Calls q.get() in a separate Thread.

    q.get is an I/O call, so it should release the GIL.
    Ideally there would be a real non-blocking I/O-based
    Queue.get call that could be used as a coroutine instead
    of this, but I don't think one exists.
    """
    return (yield from loop.run_in_executor(ThreadPoolExecutor(max_workers=1),
                                            q.get))


@asyncio.coroutine
def do_work(q):
    loop.run_in_executor(ProcessPoolExecutor(max_workers=1),
                         do_proc_work, q, 1, 2)
    coro = async_get(q)  # We could 'yield from' here directly; we don't, just to show it's asynchronous
    print("Getting queue result asynchronously")
    print((yield from coro))


if __name__ == "__main__":
    m = multiprocessing.Manager()
    # A plain multiprocessing.Queue can't be pickled into an executor
    # worker, so use a Manager queue, whose proxy object is picklable.
    q = m.Queue()
    loop = asyncio.get_event_loop()
    loop.run_until_complete(do_work(q))
Output:

Getting queue result asynchronously
putting output in queue
3

aiopipe (https://pypi.org/project/aiopipe/) seems to hit the nail on the head.
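For anyone trying it, a minimal parent/child round trip adapted from the aiopipe documentation looks roughly like this (hedged: written from memory of the project's README, so the exact API should be checked against the page linked above):

import asyncio
from multiprocessing import Process
from aiopipe import aiopipe

async def main():
    rx, tx = aiopipe()
    # Detach the write end so it survives into the child process.
    with tx.detach() as tx:
        proc = Process(target=child_main, args=(tx,))
        proc.start()
    # Open the read end as an asyncio stream in the parent.
    async with rx.open() as rx_stream:
        msg = await rx_stream.readline()
    proc.join()
    print(msg)  # b'hello from the child\n'

def child_main(tx):
    asyncio.new_event_loop().run_until_complete(child_task(tx))

async def child_task(tx):
    async with tx.open() as tx_stream:
        tx_stream.write(b"hello from the child\n")

if __name__ == "__main__":
    asyncio.run(main())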
