Python multiprocessing: global variables and whether a lock is needed



I wrote a multiprocessing crawler. Below is the simplified structure of my code:

import multiprocessing as mp

class abc:
    def all(self):
        return "This is abc \n what a abc!\n"
class bcd:
    def all(self):
        return "This is bcd \n what a bcd!\n"
class cde:
    def all(self):
        return "This is cde \n what a cde!\n"
class ijk:
    def all(self):
        return "This is ijk \n what a ijk!\n"

def crawler(sites, ps_queue):
    for site in sites:
        ps_queue.put(site.all())
messages = ''
def message_collector(ps_queue):
    global messages
    while True:
        message = ps_queue.get()
        messages += message
        ps_queue.task_done()
def main():
    ps_queue = mp.JoinableQueue()
    message_collector_proc = mp.Process(
        target=message_collector,
        args=(ps_queue, )
    )
    message_collector_proc.daemon = True
    message_collector_proc.start()
    site_list = [abc(), bcd(), cde(), ijk(), abc()]
    crawler(site_list, ps_queue)
    ps_queue.join()
    print(messages)
if __name__ == "__main__":
    main()

Questions about this code:

1. At the end of main() there is a print(messages) call, but it prints nothing. Why is that?
2. Something else struck me: messages might get garbled, because every process accesses the global variable messages at the same time. Is a lock needed, or does each process access messages sequentially in this case?

Thanks

When you call mp.Process(target=message_collector, args=(ps_queue, )), a new process is started, and inside it a new `messages` object is created. That copy is updated correctly, but when you try to print `messages` in the main process it is empty, because it is not the same object you appended data to. You need another synchronization object so that `messages` can be shared between processes; see "How to share a string amongst multiple processes using Managers() in Python?"
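This per-process copy behavior can be demonstrated with a minimal sketch (the `counter` global and `worker` function are illustrative, not from the original post):

```python
import multiprocessing as mp

counter = 0  # module-level global

def worker():
    # This rebinds the *child's* copy of the global;
    # the parent process never sees the change.
    global counter
    counter = 42

if __name__ == "__main__":
    p = mp.Process(target=worker)
    p.start()
    p.join()
    print(counter)  # prints 0, not 42: each process has its own memory
```

The same thing happens to `messages` in the question: `message_collector` updates the copy inside the child process, while the parent's copy stays an empty string.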

Below is a simple example using a Manager.dict object (not the most elegant or efficient solution, but definitely the simplest):

import multiprocessing as mp
class abc:
    def all(self):
        return "This is abc \n what a abc!\n"
class bcd:
    def all(self):
        return "This is bcd \n what a bcd!\n"
class cde:
    def all(self):
        return "This is cde \n what a cde!\n"
class ijk:
    def all(self):
        return "This is ijk \n what a ijk!\n"
def crawler(sites, ps_queue):
    for site in sites:
        ps_queue.put(site.all())

def message_collector(ps_queue, messages_dict):
    messages = ''
    while True:
        message = ps_queue.get()
        messages += message
        messages_dict['value'] = messages
        ps_queue.task_done()

def main():
    # Create the shared dict in the parent process and pass it to the
    # child explicitly; creating a Manager at module level breaks on
    # platforms that spawn (rather than fork) child processes.
    manager = mp.Manager()
    messages_dict = manager.dict()
    ps_queue = mp.JoinableQueue()
    message_collector_proc = mp.Process(
        target=message_collector,
        args=(ps_queue, messages_dict)
    )
    message_collector_proc.daemon = True
    message_collector_proc.start()
    site_list = [abc(), bcd(), cde(), ijk(), abc()]
    crawler(site_list, ps_queue)
    ps_queue.join()
    print(messages_dict['value'])

if __name__ == "__main__":
    main()
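Regarding the second question: in the code above only the single `message_collector` process ever writes to the shared dict, so no lock is needed. If several processes did read-modify-write the same shared value concurrently, a manager lock would serialize them; a hedged sketch (the `append_message` helper is illustrative, not part of the original code):

```python
import multiprocessing as mp

def append_message(shared, lock, text):
    # The read-modify-write must happen under the lock, otherwise two
    # processes can read the same old value and one update gets lost.
    with lock:
        shared['value'] = shared.get('value', '') + text

if __name__ == "__main__":
    with mp.Manager() as manager:
        shared = manager.dict()
        lock = manager.Lock()
        procs = [
            mp.Process(target=append_message,
                       args=(shared, lock, "msg%d\n" % i))
            for i in range(4)
        ]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        # All four messages survive the concurrent appends.
        print(len(shared['value'].splitlines()))
```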
