Chaining certain rules (creating dependencies between them) per wildcard

I have a specific use case for which I haven't found a solution in the Snakemake documentation.

Suppose that a given pipeline has a section with 3 rules, a, b and c, that run on N samples.

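These rules handle large amounts of data, and because of local storage limits I don't want them to run simultaneously. For example, rule a generates a large amount of data, and rule c then compresses and exports the result.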

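So I'm looking for a way to chain these 3 rules for one sample (one wildcard value), and only then run the 3 rules for the next sample. The goal is to make sure enough local disk space is always available.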

Thanks

I agree that this is a problem Snakemake still has no built-in solution for. However, there is a workaround you could use.

    rule all:
        input: expand("a{sample}", sample=[1, 2, 3])

    rule a:
        input: "b{sample}"
        output: "a{sample}"

    rule b:
        input: "c{sample}"
        output: "b{sample}"

    rule c:
        input:
            # wildcard values are strings, so convert before subtracting
            lambda wildcards: f"a{int(wildcards.sample) - 1}"
        output: "c{sample}"

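This means that rule c for sample 2 will not start until the output of rule a for sample 1 is ready. However, you need to add a dummy output a0 or make the lambda more complex.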
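One way to make the lambda "more complex" is to replace it with a named input function that adds no extra input for the first sample, so no dummy `a0` is needed. This is a minimal sketch, assuming the samples are consecutive integers starting at 1. The name `previous_a` is hypothetical, and `SimpleNamespace` only stands in for the `wildcards` object Snakemake passes to input functions (whose attribute values are strings), so the logic can be run as plain Python:

```python
from types import SimpleNamespace


def previous_a(wildcards):
    """Hypothetical input function for rule c: wait for rule a of the previous sample."""
    sample = int(wildcards.sample)  # wildcard values arrive as strings
    if sample <= 1:
        # First sample: nothing to wait for, so return no extra input (no dummy "a0")
        return []
    return f"a{sample - 1}"


# SimpleNamespace mimics Snakemake's wildcards object for a quick check
print(previous_a(SimpleNamespace(sample="1")))  # []
print(previous_a(SimpleNamespace(sample="3")))  # a2
```

In the Snakefile, rule c would then use `input: previous_a` in place of the lambda.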

So, building on Dmitry Kuzminov's answer, the following works (with samples given either as numbers or as strings).

The execution order is a3 > b3 > a1 > b1 > a2 > b2. I used a different sample order to show that it can differ from the order of the sample list.

    samples = [1, 2, 3]
    sample_order = [3, 1, 2]
    # Wildcard values are strings, so compare against string versions of the samples
    sample_order_str = [str(s) for s in sample_order]

    def get_previous(wildcards):
        if wildcards.sample != sample_order_str[0]:  # if different from a3 in this case
            previous_sample = sample_order_str[sample_order_str.index(wildcards.sample) - 1]
            return f"b_out_{previous_sample}"
        else:  # first sample in the order, i.e. a3
            # No predecessor: add no extra input (alternatively, return a dummy file
            # that is always present, e.g. the Snakefile containing these rules)
            return []

    rule all:
        input:
            expand("b_out_{S}", S=samples)

    rule a:
        input:
            "a_in_{sample}",
            get_previous
        output:
            "a_out_{sample}"

    rule b:
        input:
            "a_out_{sample}"
        output:
            "b_out_{sample}"
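To check the claimed execution order without running Snakemake, the dependency graph created by `get_previous` can be modelled in plain Python. This is a sketch, not Snakemake's scheduler: job names such as `a3` are shorthand for "rule a, sample 3", `graph` is a hypothetical helper structure, `SimpleNamespace` stands in for the `wildcards` object, and the `a_in_{sample}` source files are left out because they already exist. Since every job depends on the one before it, the graph is a single chain and its topological order is unique:

```python
from graphlib import TopologicalSorter  # Python 3.9+
from types import SimpleNamespace

samples = [1, 2, 3]
sample_order = [3, 1, 2]
order_str = [str(s) for s in sample_order]  # wildcard values are strings


def get_previous(wildcards):
    # Mirrors the Snakefile's input function (returns a list here for convenience)
    if wildcards.sample != order_str[0]:
        previous_sample = order_str[order_str.index(wildcards.sample) - 1]
        return [f"b_out_{previous_sample}"]
    return []


# Map each job to the set of jobs it waits for (a model of the DAG only)
graph = {}
for s in map(str, samples):
    wc = SimpleNamespace(sample=s)
    # rule a waits for rule b of the previous sample ("b_out_3" -> job "b3")
    graph[f"a{s}"] = {"b" + f[len("b_out_"):] for f in get_previous(wc)}
    # rule b waits for rule a of the same sample
    graph[f"b{s}"] = {f"a{s}"}

order = list(TopologicalSorter(graph).static_order())
print(" > ".join(order))  # a3 > b3 > a1 > b1 > a2 > b2
```

Because the graph admits only one order, Snakemake cannot start the next sample's rule a before the previous sample's rule b has finished, however many cores it is given. The trade-off is that all of these jobs run strictly one after another.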
