Snakemake rule for an analysis that produces one result file per parameter, where the parameters come from the output of another rule

I have the following basic Snakemake setup:

rule step1:
    """
    The output will contain a list of GENEs in a txt file.
    """
    input: "input1.txt"
    output: "output1.txt"
    shell:
        """
        analysis1.R {input} {output}
        """

rule step2:
    """
    Analysis step2.
    """
    input: "input2.txt"
    output: "output2.txt"
    shell:
        """
        analysis2.py {input} {output}
        """

rule step3:
    """
    GENE should be coming from the step1 output file, with a GENE name on each
    line.
    """
    input: rules.step2.output
    output: "output3-GENE.txt"
    shell:
        """
        analysis3.py -i {input} -o {output} -p GENE
        """

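In step 1 I generate a file containing a list of genes (the parameters) for step 3, and in step 2 I generate another file. What I want is to run step 3 as many times as there are lines in output1.txt. Each line's content is the parameter for step 3 and should also be part of the output file name, but I can't work out how to do it. Any ideas? Thanks for your help!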

You can use checkpoints.

If you know the list of files that step3 has to generate, you can define an aggregate rule:

rule aggregate:
    input:
        # List of files that step3 needs to generate

This lets rule step3 run as many times as needed.

The tricky part is defining that list of files. It has to be a function of the result of rule step1:

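def aggregate_input(wildcards):
    # Asking for the checkpoint's output makes Snakemake wait for step1
    # and re-evaluate the DAG once output1.txt exists.
    with checkpoints.step1.get(**wildcards).output[0].open() as f:
        # One entry per non-empty line, without the trailing newline
        return [line.strip() for line in f if line.strip()]

rule aggregate:
    input:
        aggregate_input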

In that case, rule step1 has to become a checkpoint:

checkpoint step1:
    """
    The output will contain a list of GENEs in a txt file.
    """
    input: "input1.txt"
    output: "output1.txt"
    shell:
        """
        analysis1.R {input} {output}
        """
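
With step1 as a checkpoint, step3 still needs a way to receive each gene. One way to do it, sketched here and not tested against your scripts, is to turn GENE into a wildcard and have the input function request one file per gene. The wildcard name `gene` and the `output3-{gene}.txt` pattern are my assumptions based on the file name in the question, and step2 is assumed to be defined earlier in the Snakefile:

```snakemake
rule step3:
    input: rules.step2.output
    # {gene} is filled in from the file names requested by aggregate_input
    output: "output3-{gene}.txt"
    shell:
        """
        analysis3.py -i {input} -o {output} -p {wildcards.gene}
        """

def aggregate_input(wildcards):
    # Waits for checkpoint step1, then reads the gene list it produced
    with checkpoints.step1.get(**wildcards).output[0].open() as f:
        genes = [line.strip() for line in f if line.strip()]
    # One step3 output file per gene
    return expand("output3-{gene}.txt", gene=genes)

rule aggregate:
    input:
        aggregate_input
```

Requesting the aggregate target (for example `snakemake --cores 1 aggregate`) should then run step1 first, read output1.txt, and schedule one step3 job per line.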

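In my example I simplified the function aggregate_input so that it only returns the lines of step1's output. If you need something more elaborate, you can write your own.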
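
As one possible shape for a more elaborate function, the sketch below maps gene names to the file names step3 would have to produce. It is plain Python, so it can be tried outside Snakemake. The helper name `gene_targets` and the default pattern `output3-{gene}.txt` are assumptions taken from the output name in the question, not something Snakemake provides:

```python
def gene_targets(lines, pattern="output3-{gene}.txt"):
    """Map gene names (one per line) to the files step3 should produce."""
    targets = []
    for line in lines:
        gene = line.strip()  # drop the trailing newline and stray spaces
        if not gene:
            continue  # ignore blank lines in the gene list
        targets.append(pattern.format(gene=gene))
    return targets


if __name__ == "__main__":
    # Hypothetical gene names, only for illustration
    print(gene_targets(["BRCA1\n", "\n", "TP53\n"]))
```

Inside aggregate_input you could call it on the opened checkpoint output, since iterating over a file object yields its lines.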
