Snakemake does not recognize multiple files as input

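I'm running into a problem with Snakemake. I want to run quality control with FastQC on some bulk RNA-Seq samples. I wrote the code so that every file matching the pattern {sample}_{replicate}.fastq.gz is used as input, where {sample} is the sample ID (e.g. SRR6974023) and {replicate} is 1 or 2. My small script is: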

configfile: "config.yaml"
rule all:
    input:
        expand("raw_qc/{sample}_{replicate}_fastqc.{extension}", sample=config["samples"], replicate=[1, 2], extension=["zip", "html"])

rule fastqc:
    input:
        rawread=expand("raw_data/{sample}_{replicate}.fastq.gz", sample=config["samples"], replicate=[1, 2])

    output:
        compress=expand("raw_qc/{sample}_{replicate}_fastqc.zip", sample=config["samples"], replicate=[1, 2]),
        net=expand("raw_qc/{sample}_{replicate}_fastqc.html", sample=config["samples"], replicate=[1, 2])

    threads:
        8

    params:
        path="raw_qc/"

    shell:
        "fastqc -t {threads} {input.rawread} -o {params.path}"

That's all. The config.yaml is:

samples:
  SRR6974023
  SRR6974024

The raw_data directory containing my files looks like this:

SRR6974023_1.fastq.gz  SRR6974023_2.fastq.gz  SRR6974024_1.fastq.gz  SRR6974024_2.fastq.gz
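If config["samples"] is a proper list, the expand() calls in the Snakefile should yield exactly these four input paths. The stdlib-only sketch below illustrates that. expand_sketch is a hypothetical, simplified stand-in for Snakemake's expand(), not its actual code; it assumes, as Snakemake's expand() does, that the pattern is formatted once per combination (Cartesian product) of the keyword values and that a plain string counts as a single value.

```python
from itertools import product


def expand_sketch(pattern, **wildcards):
    """Hypothetical, simplified stand-in for Snakemake's expand().

    Formats `pattern` once per combination (Cartesian product) of the
    keyword values. A plain string is wrapped in a list so that it counts
    as ONE value instead of being iterated character by character.
    """
    names = list(wildcards)
    value_lists = [[v] if isinstance(v, str) else list(v) for v in wildcards.values()]
    return [
        pattern.format(**dict(zip(names, combo)))
        for combo in product(*value_lists)
    ]


# What config["samples"] is expected to contain: a YAML list of sample IDs.
samples = ["SRR6974023", "SRR6974024"]

for path in expand_sketch("raw_data/{sample}_{replicate}.fastq.gz",
                          sample=samples, replicate=[1, 2]):
    print(path)
# raw_data/SRR6974023_1.fastq.gz
# raw_data/SRR6974023_2.fastq.gz
# raw_data/SRR6974024_1.fastq.gz
# raw_data/SRR6974024_2.fastq.gz
```

Passing the raw_qc/ pattern from rule all with extension=["zip", "html"] to the same function gives 2 × 2 × 2 = 8 target paths.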

Finally, whenever I run the script, I get the same error:

Building DAG of jobs...
MissingInputException in line 8 of /home/user/path/Snakefile:
Missing input files for rule fastqc:
raw_data/SRR6974023 SRR6974024_2.fastq.gz
raw_data/SRR6974023 SRR6974024_1.fastq.gz

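Only the last sample's files seem to be resolved correctly, in this case SRR6974024_1.fastq.gz and SRR6974024_2.fastq.gz. For some reason, the other sample is only seen as SRR6974023. How can I fix this? I appreciate your help. Thanks, everyone!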

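The YAML config is incorrect. Without dashes, the indented lines are folded into one string, "SRR6974023 SRR6974024", so Snakemake sees a single sample whose name contains a space, which is exactly the path shown in the error. Put a - in front of each entry to turn the lines into a list: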

samples:
- SRR6974023
- SRR6974024
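The effect of the two config values on the fastqc input paths can be reproduced in a few lines of stdlib Python. This is a sketch under stated assumptions: input_paths is a hypothetical helper that only mimics expand() for the raw_data pattern (a plain string counts as one value, as in Snakemake's expand()), and both sample values are hard-coded rather than loaded from config.yaml.

```python
from itertools import product

# What config["samples"] holds in each case. Without dashes, YAML folds the
# indented lines into one space-separated string; with dashes, it is a list.
without_dashes = "SRR6974023 SRR6974024"
with_dashes = ["SRR6974023", "SRR6974024"]


def input_paths(samples, replicates=(1, 2)):
    """Hypothetical helper mimicking expand() for the fastqc input pattern.

    Like Snakemake's expand(), a plain string is treated as ONE sample value.
    """
    if isinstance(samples, str):
        samples = [samples]
    return [f"raw_data/{s}_{r}.fastq.gz" for s, r in product(samples, replicates)]


print(input_paths(without_dashes))
# ['raw_data/SRR6974023 SRR6974024_1.fastq.gz', 'raw_data/SRR6974023 SRR6974024_2.fastq.gz']
print(input_paths(with_dashes))
# ['raw_data/SRR6974023_1.fastq.gz', 'raw_data/SRR6974023_2.fastq.gz', 'raw_data/SRR6974024_1.fastq.gz', 'raw_data/SRR6974024_2.fastq.gz']
```

The first list matches the two "missing" files in the MissingInputException; the second matches the four files actually present in raw_data.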
