Handling interdependent files when parallelizing with a task graph



I am trying to parallelize the following code (MCVE) by building a task graph with dask.delayed (or by implementing the computation graph myself):

import os

os.chdir('./kitchen1')
write_dough()   # writes file ./dough
write_topping() # writes file ./topping
write_pizza()   # requires ./dough and ./topping; writes ./pizza

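I see two difficulties:

  1. write_dough does not return anything. A returned value passed on as an argument makes the dependency between variables explicit; a file written as a side effect does not, and Dask advises against relying on side effects. Is there an idiomatic solution?
  2. os.chdir: how do I incorporate it into the computation graph?

(I don't care about parallelizing the file IO, performance, etc.)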
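One common way to address difficulty 1 is to have each writer return the path of the file it created, so a downstream task receives its upstream results as arguments and the dependency becomes ordinary data flow. The sketch below assumes hypothetical bodies for write_dough, write_topping and write_pizza (the question does not show them) and uses a temporary directory in place of ./kitchen1. It uses concurrent.futures rather than dask so it runs with the standard library alone; it also passes the directory explicitly instead of calling os.chdir.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Hypothetical writers: each takes a directory, writes one file, and returns its path.
def write_dough(kitchen):
    path = os.path.join(kitchen, "dough")
    with open(path, "w") as f:
        f.write("flour, water, yeast\n")
    return path

def write_topping(kitchen):
    path = os.path.join(kitchen, "topping")
    with open(path, "w") as f:
        f.write("tomato, mozzarella\n")
    return path

def write_pizza(dough_path, topping_path, kitchen):
    # The inputs arrive as arguments, so this step cannot run before both exist.
    path = os.path.join(kitchen, "pizza")
    with open(path, "w") as out:
        for src in (dough_path, topping_path):
            with open(src) as f:
                out.write(f.read())
    return path

def make_pizza(kitchen):
    with ThreadPoolExecutor() as pool:
        # dough and topping do not depend on each other, so they may run concurrently.
        dough = pool.submit(write_dough, kitchen)
        topping = pool.submit(write_topping, kitchen)
        # .result() blocks until each upstream task has finished.
        return write_pizza(dough.result(), topping.result(), kitchen)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as kitchen:
        pizza = make_pizza(kitchen)
        with open(pizza) as f:
            print(f.read(), end="")
```

Because each writer returns something, the same functions could be wrapped with dask.delayed unchanged, and the graph would be built from the returned values rather than from hidden file side effects.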

Here is my current solution. It adds complexity, and './kitchen1' appears everywhere, which is ugly. Is there a more elegant solution?

import dask

write_dough, write_topping, write_pizza = map(dask.delayed, (write_dough, write_topping, write_pizza))
dough = write_dough('./kitchen1')
topping = write_topping('./kitchen1')
pizza = write_pizza(dough, topping, './kitchen1')
pizza.compute()  # nothing runs until the graph is computed

I recommend your current approach of passing the dependencies explicitly.
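To keep that explicit style without repeating './kitchen1' everywhere, one option is to bind the directory once with functools.partial and pass file paths around instead of calling os.chdir (the working directory is shared by the whole process, so it is a poor fit for tasks that may run on different threads). The sketch below assumes hypothetical writer bodies and a hypothetical helper, run_graph, which evaluates a small dictionary graph of (callable, *args) tuples, loosely in the style of dask's low-level graphs. It is a sequential stand-in, not dask's scheduler.

```python
import os
import tempfile
from functools import partial

# Hypothetical writers (the question does not show their bodies). Each takes the
# directory explicitly instead of relying on os.chdir, and returns the path it wrote.
def write_dough(kitchen):
    path = os.path.join(kitchen, "dough")
    with open(path, "w") as f:
        f.write("dough\n")
    return path

def write_topping(kitchen):
    path = os.path.join(kitchen, "topping")
    with open(path, "w") as f:
        f.write("topping\n")
    return path

def write_pizza(dough_path, topping_path, kitchen):
    path = os.path.join(kitchen, "pizza")
    with open(path, "w") as out:
        for src in (dough_path, topping_path):
            with open(src) as f:
                out.write(f.read())
    return path

def pizza_graph(kitchen):
    # The directory is bound once with partial; each task still names its inputs.
    return {
        "dough": (partial(write_dough, kitchen),),
        "topping": (partial(write_topping, kitchen),),
        "pizza": (partial(write_pizza, kitchen=kitchen), "dough", "topping"),
    }

def run_graph(graph, key, cache=None):
    # Hypothetical helper: evaluates {key: (func, *args)}, replacing any string
    # argument that names another key with that key's result. Tasks run
    # sequentially, and each key is computed at most once.
    if cache is None:
        cache = {}
    if key not in cache:
        func, *args = graph[key]
        inputs = [run_graph(graph, a, cache) if isinstance(a, str) and a in graph else a
                  for a in args]
        cache[key] = func(*inputs)
    return cache[key]

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as kitchen:
        pizza = run_graph(pizza_graph(kitchen), "pizza")
        with open(pizza) as f:
            print(f.read(), end="")
```

Since the dictionary uses the same (callable, *args) tuple shape as dask's custom-graph format, it may also be possible to hand it to a dask scheduler, for example dask.threaded.get(graph, "pizza"), so that dough and topping run in parallel. Treat that as an assumption to check against the dask version in use.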
