I have a Snakemake pipeline that looks like this:
configfile: "./config.yaml"

IN_DIR = config["in_dir"]
SAMPLES = config["samples"]

rule all:
    input:
        expand("{sample}_Aligned.sortedByCoord.out.bam", sample=SAMPLES)

rule trimmomatic_pe:
    message:
        """
        Pre-processing raw reads with trimmomatic. Trimming low quality reads and adapter sequences. Running QC on trimmed reads.
        """
    input:
        r1 = expand("{in_dir}/{{sample}}_R1_001.fastq.gz", in_dir=IN_DIR),
        r2 = expand("{in_dir}/{{sample}}_R2_001.fastq.gz", in_dir=IN_DIR)
    params:
        trimmer = config["parameters"]["trim"],
        extra = ""
    output:
        r1 = "tmp/{sample}_R1_trimmed.fastq.gz",
        r2 = "tmp/{sample}_R2_trimmed.fastq.gz",
        r1_unpaired = "tmp/{sample}_R1_unpaired_trimmed.fastq.gz",
        r2_unpaired = "tmp/{sample}_R2_unpaired_trimmed.fastq.gz"
    threads:
        2
    wrapper:
        "0.74.0/bio/trimmomatic/pe"

rule map_reads:
    message:
        """
        Mapping trimmed reads to host genome
        """
    input:
        r1 = "tmp/{sample}_R1_trimmed.fastq.gz",
        r2 = "tmp/{sample}_R2_trimmed.fastq.gz"
    params:
        annotation = config["annotation_file"]
    output:
        "{sample}_Aligned.sortedByCoord.out.bam"
    shell:
        """
        STAR
        --runThreadN 16
        --sjdbGTFfile {params.annotation}
        --sjdbOverhang 149
        --outFilterType BySJout
        --outFilterMultimapNmax 10
        --alignSJoverhangMin 5
        --alignSJDBoverhangMin 1
        --outFilterMismatchNmax 999
        --outFilterMismatchNoverReadLmax 0.04
        --alignIntronMin 20
        --alignIntronMax 1000000
        --alignMatesGapMax 1000000
        --outFilterIntronMotifs RemoveNoncanonicalUnannotated
        --outFileNamePrefix {wildcards.sample}_
        --outSAMtype BAM SortedByCoordinate
        --runMode alignReads
        --genomeDir ./index
        --readFilesIn {input.r1} {input.r2}
        """
When I run snakemake -np the DAG is correct, but when I try to actually run the pipeline with snakemake --cores 2 I keep getting this error, which I don't know how to interpret:
[Thu Apr 29 15:12:21 2021]
Job 1:
Pre-processing raw reads with trimmomatic. Trimming low quality reads and adapter sequences. Running QC on trimmed reads.
Traceback (most recent call last):
File "/Users/user/Documents/postdoc_projects/invert/.snakemake/scripts/tmp7yzyzwru.wrapper.py", line 88, in <module>
input_files, output_files, snakemake.threads
File "/Users/user/Documents/postdoc_projects/invert/.snakemake/scripts/tmp7yzyzwru.wrapper.py", line 27, in distribute_threads
gzipped_input_files = sum(1 for file in input_files if file.endswith(".gz"))
File "/Users/user/Documents/postdoc_projects/invert/.snakemake/scripts/tmp7yzyzwru.wrapper.py", line 27, in <genexpr>
gzipped_input_files = sum(1 for file in input_files if file.endswith(".gz"))
AttributeError: 'Namedlist' object has no attribute 'endswith'
[Thu Apr 29 15:12:22 2021]
Error in rule trimmomatic_pe:
jobid: 1
output: tmp/4_12hr_Ciliated_4_S4_R1_trimmed.fastq.gz, tmp/4_12hr_Ciliated_4_S4_R2_trimmed.fastq.gz, tmp/4_12hr_Ciliated_4_S4_R1_unpaired_trimmed.fastq.gz, tmp/4_12hr_Ciliated_4_S4_R2_unpaired_trimmed.fastq.gz
RuleException:
CalledProcessError in line 29 of /Users/user/Documents/postdoc_projects/invert/Snakefile:
Command 'set -euo pipefail; /Users/user/opt/miniconda3/envs/invert/bin/python3.6 /Users/user/Documents/postdoc_projects/invert/.snakemake/scripts/tmp7yzyzwru.wrapper.py' returned non-zero exit status 1.
File "/Users/user/Documents/postdoc_projects/invert/Snakefile", line 29, in __rule_trimmomatic_pe
File "/Users/user/opt/miniconda3/envs/invert/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
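Is the pipeline not recognizing my samples correctly? The AttributeError seems to suggest that, but this is how my config file is structured: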
in_dir: data #Directory containing raw fastq files from RNAseq
samples: "4_12hr_Ciliated_4_S4" #sample name prefix
annotation_file: ref_files/Homo_sapiens.GRCh38.103.gtf #Directory containing the viral host genome annotation in .gtf format
parameters:
  trim: ["TRAILING:3 ILLUMINACLIP:ref_files/TruSeq3-PE-2.fa"] #trimmomatic parameters
What am I missing?
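The expand function returns a list. Because r1 and r2 are each set to a list rather than a string, the wrapper script ends up calling endswith on a list-like Namedlist object instead of on a file name, which is exactly what the AttributeError reports. Your samples are not the problem: the output paths in the error message show that the sample wildcard was resolved. To define r1 and r2, use something that returns a string. I suggest the string format() method or an f-string.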
Change:
r1 = expand("{in_dir}/{{sample}}_R1_001.fastq.gz", in_dir=IN_DIR),
to:
r1 = "{in_dir}/{{sample}}_R1_001.fastq.gz".format(in_dir=IN_DIR),
or even:
r1 = f"{IN_DIR}/{{sample}}_R1_001.fastq.gz",
and do the same for r2.
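To see why the list trips up the wrapper, here is a minimal plain-Python sketch that runs without Snakemake. It reuses the expression from line 27 of the wrapper script shown in the traceback. The helper name count_gzipped and the one-element lists standing in for what expand returns are assumptions made for illustration, and the exception raised here mentions 'list' rather than 'Namedlist' because no Snakemake objects are involved.

```python
IN_DIR = "data"  # in_dir value from the question's config.yaml


def count_gzipped(input_files):
    # Hypothetical helper wrapping the expression from line 27 of the
    # wrapper script in the traceback.
    return sum(1 for file in input_files if file.endswith(".gz"))


# One-element lists, standing in for what expand(...) gives for r1 and r2.
r1_list = ["{in_dir}/{{sample}}_R1_001.fastq.gz".format(in_dir=IN_DIR)]
r2_list = ["{in_dir}/{{sample}}_R2_001.fastq.gz".format(in_dir=IN_DIR)]

try:
    count_gzipped([r1_list, r2_list])
    list_error = None
except AttributeError as err:
    # Lists have no endswith method, so the generator expression fails.
    list_error = str(err)

# Plain strings built with an f-string; the doubled braces keep the
# {sample} wildcard intact for Snakemake to fill in later.
r1 = f"{IN_DIR}/{{sample}}_R1_001.fastq.gz"
r2 = f"{IN_DIR}/{{sample}}_R2_001.fastq.gz"
gz_count = count_gzipped([r1, r2])

print(list_error)
print(r1, r2, gz_count)
```

With plain strings, each input is a single path that still contains the {sample} wildcard, so the wrapper's endswith check works as intended.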