How to access each element of a nested list to name outputs in Snakemake



This is similar to the question "Snakemake: using Python nested list comprehension for conditional analysis".

I have the following:

RUN_ID = ["run1", "run2"]
SAMPLES = [["A", "B", "C"], ["D", "E", "F"]]

rule all:
    input:
        summary = expand("foo/{run}/{sample}/outs/{sample}_report.html", run=RUN_ID, sample=SAMPLES)

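Problem 1: Each run in RUN_ID should be associated only with the corresponding samples in SAMPLES (by index). So run1 is paired only with A, B, C, and run2 is paired only with D, E, F.

Problem 2: The name of each output file should reflect this index-based pairing. Currently, I am struggling to get each element of each nested list in SAMPLES to pair with each RUN_ID.

Based on the above, I want the following output: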

"foo/run1/A/outs/A_report.html"
"foo/run1/B/outs/B_report.html"
"foo/run1/C/outs/C_report.html"
"foo/run2/D/outs/D_report.html"
"foo/run2/E/outs/E_report.html"
"foo/run2/F/outs/F_report.html"

Initially, I got:

"foo/run1/["A", "B", "C"]/outs/["A", "B", "C"]_report.html"
"foo/run1/["D", "E", "F"]/outs/["D", "E", "F"]_report.html"
"foo/run2/["A", "B", "C"]/outs/["A", "B", "C"]_report.html"
"foo/run2/["D", "E", "F"]/outs/["D", "E", "F"]_report.html"

I got rid of the unwanted pairing by using zip in the expand function:

summary= expand(["foo/{run}/{sample}/outs/{sample}_report.html", "foo/{run}/{sample}/outs/{sample}_report.html"], zip, run=RUN_ID, sample=SAMPLES)

That leaves me with the desired pairing between RUN_ID and SAMPLES:

"foo/run1/["A", "B", "C"]/outs/["A", "B", "C"]_report.html"
"foo/run2/["D", "E", "F"]/outs/["D", "E", "F"]_report.html"

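However, as shown above, each nested list as a whole is passed into the output path, rather than each element of each nested list. I can get what I want by splitting SAMPLES into two separate lists, but I would like a more elegant and automated approach.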
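
The behaviour described above can be reproduced in plain Python. This is only a sketch: the built-in zip and str.format stand in for Snakemake's expand here (an assumption made for illustration), so the exact text of the rendered list may differ from what Snakemake prints. The names pairs, pattern and paths are hypothetical.

```python
RUN_ID = ["run1", "run2"]
SAMPLES = [["A", "B", "C"], ["D", "E", "F"]]

# zip pairs items by position, so each run is matched with a whole inner list.
pairs = list(zip(RUN_ID, SAMPLES))

# Formatting an inner list into the pattern inserts the list's text form,
# not its individual elements.
pattern = "foo/{run}/{sample}/outs/{sample}_report.html"
paths = [pattern.format(run=run, sample=sample) for run, sample in pairs]
```

Two paths come out, one per run, each with a whole list embedded in it. To get one path per sample, the inner lists have to be iterated as well.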

I am also not wedded to using nested lists; any insight into a solution or a better approach is appreciated. Thanks.

expand is a convenient utility, but for more complex cases it is often quicker to generate the desired list directly in Python:

RUN_ID = ["run1", "run2"]
SAMPLES = [["A", "B", "C"], ["D", "E", "F"]]

# Pair each run with its own sample list by index, then add one path per sample.
desired_files = []
for run, samples in zip(RUN_ID, SAMPLES):
    for sample in samples:
        file = f"foo/{run}/{sample}/outs/{sample}_report.html"
        desired_files.append(file)

rule all:
    input: desired_files
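
Since nested lists are not a requirement, the same pairing could also be kept in a dict keyed by run ID and flattened with a comprehension. This is a minimal sketch in plain Python, not something from the original post: SAMPLES_BY_RUN and report_paths are hypothetical names introduced for illustration.

```python
RUN_ID = ["run1", "run2"]
SAMPLES = [["A", "B", "C"], ["D", "E", "F"]]

# Hypothetical mapping: each run ID points at its own list of samples.
SAMPLES_BY_RUN = dict(zip(RUN_ID, SAMPLES))


def report_paths(samples_by_run):
    """Return one report path per (run, sample) pair, in insertion order."""
    return [
        f"foo/{run}/{sample}/outs/{sample}_report.html"
        for run, samples in samples_by_run.items()
        for sample in samples
    ]


desired_files = report_paths(SAMPLES_BY_RUN)
```

The resulting desired_files list can be passed to the input of rule all in the same way as above. A dict also makes it harder for the runs and their samples to drift out of index alignment when entries are added later.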
