根据config.yaml中的变量(而非通配符)命名snakemake中的输入/输出文件



我正在尝试编辑和运行一个snakemake管道。简而言之,snakemakepipeline调用默认的基因组比对器(minimap(并生成具有此名称的输出文件。我正在尝试将变量aligner添加到config.yaml,以指定要调用的对齐器。此外(在我实际被卡住的地方(,输出文件应该具有config.yaml中指定的对齐器的名称。

我的config.yaml看起来像这样:

# this config.yaml is passed to Snakefile in pipeline-structural-variation subfolder.
# Snakemake is run from this pipeline-structural-variation folder; it is necessary to
# pass an appropriate path to the input-files (the ../ prefix is sufficient for this demo)
aligner: "ngmlr" # THIS IS THE VARIABLE I AM ADDING TO THIS FILE. VALUES COULD BE minimap or ngmlr
# FASTQ file or folder containing FASTQ files
# check if this has to be gzipped
input_fastq: "/nexusb/Gridion/20190917PGD2staal2/PD170815/PD170815_cat_all.fastq.gz" # original is ../RawData/GM24385_nf7_chr20_af.fastq.gz
# FASTA file containing the reference genome
# note that the original reference sequence contains only the sequence of chr20
reference_fasta: "/nexus/bhinckel/19/ONT_projects/PGD_breakpoint/ref_hg19_local/hg19_chr1-y.fasta" # original is ../ReferenceData/human_g1k_v37_chr20_50M.fasta
# Minimum SV length
min_sv_length: 300000 # original value was 40
# Maximum SV length
max_sv_length: 1000000 # original value was 1000000. Note that the value I used to run the pipeline for the sample PD170677 was 100000000000, which will be coerced to NA in the R script (/home/bhinckel/ont_tutorial_sv/ont_tutorial_sv.R)
# Min read length. Shorter reads will be discarded
min_read_length: 1000
# Min mapping quality. Reads will lower mapping quality will be discarded
min_read_mapping_quality: 20
# Minimum read support required to call a SV (auto for auto-detect)
min_read_support: 'auto'
# Sample name
sample_name: "PD170815" # original value was GM24385.nf7.chr20_af. Note that this can be a list

我在我的snakefile的下面发布了生成扩展名为_minimap2.bam的输出文件的部分,我想用_minimap2.bam_ngmlr.bam替换这些文件,具体取决于config.yaml上的aligner

# INPUT BAM folder
bam = None
if "bam" in config:
bam = os.path.join(CONFDIR, config["bam"])
# INPUT FASTQ folder
FQ_INPUT_DIRECTORY = []
if not bam:
if not "input_fastq" in config:
print(""input_fastq" not specified in config file. Exiting...")
FQ_INPUT_DIRECTORY = os.path.join(CONFDIR, config["input_fastq"])
if not os.path.exists(FQ_INPUT_DIRECTORY):
print("Could not find {}".format(FQ_INPUT_DIRECTORY))
MAPPED_BAM = "{sample}/alignment/{sample}_minimap2.bam" # Original
#MAPPED_BAM = "{sample}/alignment/{sample}_{alignerName}.bam" # this did not work
#MAPPED_BAM = f"{sample}/alignment/{sample}_{config['aligner']}.bam" # this did nor work either
else:
MAPPED_BAM = find_file_in_folder(bam, "*.bam", single=True)
...
if config['aligner'] == 'minimap':
rule index_minimap2:
input:
REF = FA_REF
output:
"{sample}/index/minimap2.idx"
threads: config['threads']
conda: "env.yml"
shell:
"minimap2 -t {threads} -ax map-ont --MD -Y {input.REF} -d {output}"

rule map_minimap2:
input:
FQ = FQ_INPUT_DIRECTORY,
IDX = rules.index_minimap2.output,
SETUP = "init"
output:
BAM = "{sample}/alignment/{sample}_minimap2.bam",
BAI = "{sample}/alignment/{sample}_minimap2.bam.bai"
conda: "env.yml"
threads: config["threads"]
shell:
"cat_fastq {input.FQ} | minimap2 -t {threads} -K 500M -ax map-ont --MD -Y {input.IDX} - | samtools sort -@ {threads} -O BAM -o {output.BAM} - && samtools index -@ {threads} {output.BAM}"
else:
print(f"Aligner is {config['aligner']} - skipping indexing step for minimap2")
rule map_ngmlr:
input:
REF = FA_REF,
FQ = FQ_INPUT_DIRECTORY,
SETUP = "init"
output:
BAM = "{sample}/alignment/{sample}_minimap2.bam",
BAI = "{sample}/alignment/{sample}_minimap2.bam.bai"
conda: "env.yml"
threads: config["threads"]
shell:
"cat_fastq {input.FQ} | ngmlr -r {input.REF} -t {threads} -x ont - | samtools sort -@ {threads} -O BAM -o {output.BAM} - && samtools index -@ {threads} {output.BAM}"

我最初尝试创建一个alignerName参数,类似于sample参数,如下所示:

# Parameter: sample_name
sample = "sv_sample01"
if "sample_name" in config:
sample = config['sample_name']
###############
#
# code below created by me
#
###############
# Parameter: aligner_name
alignerName = "defaultAligner"
if "aligner" in config:
alignerName = config['aligner']

然后,我尝试在输入/输出文件中有minimap2的地方输入{alignerName}(请参阅上面注释的MAPPED_BAM变量定义(,尽管这会引发错误。我想snakemake会将{alignerName}解释为通配符,尽管我想要的只是将config['aligner']中定义的变量名传递给输入/输出文件。我也尝试过f-string(MAPPED_BAM = f"{sample}/alignment/{sample}_{config['aligner']}.bam"(,不过我想它也不起作用。

你很接近!

通配符在snakemake中的工作方式是,它们被解释为"最后一个",而f字符串被解释为第一个。要不解释f字符串中的大括号,可以用另一个大括号转义,如下所示:

print(f"{{keep curly}}")
>>> {keep curly}

所以我们需要做的就是

MAPPED_BAM = f"{{sample}}/alignment/{{sample}}_{config['aligner']}.bam"

最新更新