How to start processes in order



I have some simple code that sets up and runs processes:

from multiprocessing import Pool

with Pool(processes=4) as pool:
    pool.map(check_url, range(0, 240000))

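I need this to verify whether pages exist on a site, e.g. site.com/298 exists, site.com/17 does not. So I need to check 240,000 pages. The problem is that when I run the script, the values from range() are not processed in order, i.e. I see this in the output: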

Page found: 26545
Page not found: 1523
Page found: 45
Page found: 9
Page found: 4568
Page not found: 256
....

I tried using a prepared list instead of the range:

urls = [i for i in range(0, 240000)]

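When I print it, I see a list of numbers in order, but the processes still start out of order. How can I make the processes run in order?

UPD: Can my solution check the same page twice or more?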

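The whole point of Pool.map is to split up the tasks and let them execute independently (https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.Pool.map). If you want the data to be fed in order, you need to submit it in order, i.e.: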

import multiprocessing as mp
import random
from time import sleep


def f(x):
    worker_name = mp.current_process().name
    print(f"[{x}] by {worker_name}")  # task started
    # sleep for a random 0.0-0.9 s to simulate work of varying length
    sleep(random.randrange(10) / 10)
    print(f"-[{x}] by {worker_name}")  # task done
    return x


if __name__ == '__main__':
    print("Init")
    with mp.Pool(processes=16) as p:
        # submit the tasks one by one, in order
        for i in range(10):
            p.apply_async(f, (i,))
        # no more tasks will be submitted; wait for the workers to finish
        p.close()
        p.join()
    print("Done")

This gives the output:

Init
[0] by SpawnPoolWorker-4
[1] by SpawnPoolWorker-2
[2] by SpawnPoolWorker-1
[3] by SpawnPoolWorker-3
[4] by SpawnPoolWorker-5
[5] by SpawnPoolWorker-6
[6] by SpawnPoolWorker-7
[7] by SpawnPoolWorker-8
[8] by SpawnPoolWorker-10
-[7] by SpawnPoolWorker-8
[9] by SpawnPoolWorker-8
-[5] by SpawnPoolWorker-6
-[2] by SpawnPoolWorker-1
-[0] by SpawnPoolWorker-4
-[9] by SpawnPoolWorker-8
-[4] by SpawnPoolWorker-5
-[8] by SpawnPoolWorker-10
-[6] by SpawnPoolWorker-7
-[3] by SpawnPoolWorker-3
-[1] by SpawnPoolWorker-2
Done

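As you can see, the processes are started in order, but each one takes a different amount of time to finish. If you need them to finish in order as well, this is not an option, because that cannot be guaranteed.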
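If what matters is only that the output appears in order, one option is to have the worker return its result instead of printing it, and to print in the parent process. The sketch below relies on the fact that Pool.imap yields results in the order of the input, whatever order the workers finish in. The check_url here is a hypothetical stand-in (it makes no network request and just pretends every third page exists), and describe is a helper name invented for this example.

```python
from multiprocessing import Pool


def check_url(page):
    # Hypothetical stand-in for the real check: no network access,
    # it simply pretends that every third page exists.
    return page, page % 3 == 0


def describe(result):
    # Turn a (page, found) tuple into the message from the question.
    page, found = result
    return f"Page found: {page}" if found else f"Page not found: {page}"


if __name__ == "__main__":
    with Pool(processes=4) as pool:
        # imap yields results in input order, even though the workers
        # may finish the individual checks in any order.
        for result in pool.imap(check_url, range(10)):
            print(describe(result))
```

The checks still run in parallel; only the reporting is serialized, so a slow page delays the printing of the pages after it but not their checking.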

From the Python documentation on Pool, you can see the signature of 'map':

 map(func, iterable[, chunksize])
    A parallel equivalent of the map() built-in function (it supports only one iterable argument though). It blocks until the result is ready.
    This method chops the iterable into a number of chunks which it submits to the process pool as separate tasks. The (approximate) size of these chunks can be specified by setting chunksize to a positive integer.
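To make the chunking described above visible, and as a rough answer to the UPD question, here is a small sketch that records which worker handled which page. The check_url below is a hypothetical stand-in that returns the page number together with the worker's name, and group_by_worker is a helper invented for this example. As far as I know, each item of the iterable ends up in exactly one chunk, so a page should not be checked twice; the final check in the script verifies that for this run.

```python
from collections import defaultdict
from multiprocessing import Pool, current_process


def check_url(page):
    # Hypothetical stand-in: report which worker handled the page.
    return page, current_process().name


def group_by_worker(results):
    # Collect the pages handled by each worker, keeping result order.
    groups = defaultdict(list)
    for page, worker in results:
        groups[worker].append(page)
    return dict(groups)


if __name__ == "__main__":
    with Pool(processes=4) as pool:
        # 20 pages in chunks of 5: each chunk goes to one worker as one task.
        results = pool.map(check_url, range(20), chunksize=5)
    for worker, pages in sorted(group_by_worker(results).items()):
        print(worker, pages)
    # Every page appears exactly once in the results.
    assert sorted(page for page, _ in results) == list(range(20))
```

Which worker receives which chunk can differ from run to run, so the grouping printed here is not fixed.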

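Your jobs are submitted in parallel, which means they are not guaranteed to execute in order. If you need to poll the site in order, parallelization may not be the best fit, and you could consider using a for loop to guarantee sequential behavior.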
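A minimal sketch of the sequential variant, for comparison. Again check_url is a hypothetical stand-in that pretends every third page exists, and check_in_order is a name made up for this example; the real function would make the HTTP request.

```python
def check_url(page):
    # Hypothetical stand-in for the real check: no network access,
    # it simply pretends that every third page exists.
    return page % 3 == 0


def check_in_order(pages):
    # Check the pages one after another; page N+1 is not started
    # until page N has finished, so the order is guaranteed.
    lines = []
    for page in pages:
        if check_url(page):
            lines.append(f"Page found: {page}")
        else:
            lines.append(f"Page not found: {page}")
    return lines


if __name__ == "__main__":
    for line in check_in_order(range(10)):
        print(line)
```

The trade-off is speed: with 240,000 pages checked one at a time, the total run time is the sum of all the individual request times.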
