Is Nextflow really inconsistent, or am I doing something wrong with nf-core/rnaseq?



I want to preface this by saying that I'm new to Nextflow. If I've left out key debugging information, I apologize; please let me know.

====================================

Case 1: I tried running this command:

nextflow run nf-core/rnaseq --aligner hisat2 -profile test,docker

but ended up with this error:

-[nf-core/rnaseq] Pipeline completed with errors-
WARN: To render the execution DAG in the required format it is required to install Graphviz -- See http://www.graphviz.org for more info.
Error executing process > 'NFCORE_RNASEQ:RNASEQ:MULTIQC_CUSTOM_BIOTYPE (RAP1_UNINDUCED_REP2)'
Caused by:
Process `NFCORE_RNASEQ:RNASEQ:MULTIQC_CUSTOM_BIOTYPE (RAP1_UNINDUCED_REP2)` terminated with an error exit status (1)
Command executed:
cut -f 1,7 RAP1_UNINDUCED_REP2.featureCounts.txt | tail -n +3 | cat biotypes_header.txt - >> RAP1_UNINDUCED_REP2.biotype_counts_mqc.tsv
mqc_features_stat.py RAP1_UNINDUCED_REP2.biotype_counts_mqc.tsv -s RAP1_UNINDUCED_REP2 -f rRNA -o RAP1_UNINDUCED_REP2.biotype_counts_rrna_mqc.tsv
Command exit status:
1
Command output:
(empty)
Command error:
cut: RAP1_UNINDUCED_REP2.featureCounts.txt: No such file or directory
cat: can't open 'biotypes_header.txt': No such file or directory
Work dir:
/mnt/c/Users/mkozubov/Desktop/nextflow_tutorial/work/e7/5df55125d9662b3c6ee83cdeea9ea9
Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

I went to the work directory it pointed me to and ran bash .command.run, and it worked just fine! Why did it fail in the first place?

==========================================

Case 2:

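Thinking that my problem was Docker, I also tried Singularity. I ran it as follows, which resulted in two failures and one success. Here are the commands and errors: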

  1. nextflow run nf-core/rnaseq --aligner hisat2 -profile test,singularity

Caused by:
Failed to pull singularity image
command: singularity pull  --name depot.galaxyproject.org-singularity-qualimap-2.2.2d--1.img.pulling.1631227989457 https://depot.galaxyproject.org/singularity/qualimap:2.2.2d--1 > /dev/null
status : 255
message:
INFO:    Downloading network image
INFO:    Cleaning up incomplete download: /home/mkozubov/.singularity/cache/net/tmp_601246724
FATAL:   unexpected EOF
  2. nextflow run nf-core/rnaseq --aligner hisat2 -profile test,singularity -resume

Caused by:
Failed to pull singularity image
command: singularity pull  --name depot.galaxyproject.org-singularity-bioconductor-dupradar-1.18.0--r40_1.img.pulling.1631228803940 https://depot.galaxyproject.org/singularity/bioconductor-dupradar:1.18.0--r40_1 > /dev/null
status : 255
message:
INFO:    Downloading network image
INFO:    Cleaning up incomplete download: /home/mkozubov/.singularity/cache/net/tmp_504979312
FATAL:   unexpected EOF
  3. nextflow run nf-core/rnaseq --aligner hisat2 -profile test,singularity -resume ecstatic_minsky
WARN: To render the execution DAG in the required format it is required to install Graphviz -- See http://www.graphviz.org for more info.
Completed at: 09-Sep-2021 16:40:43
Duration    : 26m 44s
CPU hours   : 1.5 (29.8% cached)
Succeeded   : 116
Cached      : 64

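I realize that my second resume probably didn't do anything, but why would resuming my first run fix anything? Why couldn't Singularity pull the images it needed the first time? I'm a newbie and I really don't know where to start debugging a problem like this, so any help would be greatly appreciated.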

===========================================================

Config file:

/*
========================================================================================
nf-core/rnaseq Nextflow config file
========================================================================================
Default config options for all compute environments
----------------------------------------------------------------------------------------
*/
// Global default params, used in configs
params {
// Input options
input                      = null
// References
genome                     = null
transcript_fasta           = null
additional_fasta           = null
splicesites                = null
gtf_extra_attributes       = 'gene_name'
gtf_group_features         = 'gene_id'
featurecounts_feature_type = 'exon'
featurecounts_group_type   = 'gene_biotype'
gencode                    = false
save_reference             = false
// UMI handling
with_umi                   = false
umitools_extract_method    = 'string'
umitools_bc_pattern        = null
save_umi_intermeds         = false
// Trimming
clip_r1                    = null
clip_r2                    = null
three_prime_clip_r1        = null
three_prime_clip_r2        = null
trim_nextseq               = null
save_trimmed               = false
skip_trimming              = false
// Ribosomal RNA removal
remove_ribo_rna            = false
save_non_ribo_reads        = false
ribo_database_manifest     = "${projectDir}/assets/rrna-db-defaults.txt"
// Alignment
aligner                    = 'star_salmon'
pseudo_aligner             = null
seq_center                 = null
bam_csi_index              = false
star_ignore_sjdbgtf        = false
salmon_quant_libtype       = null
hisat2_build_memory        = '200.GB'  // Amount of memory required to build HISAT2 index with splice sites
stringtie_ignore_gtf       = false
min_mapped_reads           = 5
save_merged_fastq          = false
save_unaligned             = false
save_align_intermeds       = false
skip_markduplicates        = false
skip_alignment             = false
// QC
skip_qc                    = false
skip_bigwig                = false
skip_stringtie             = false
skip_fastqc                = false
skip_preseq                = false
skip_dupradar              = false
skip_qualimap              = false
skip_rseqc                 = false
skip_biotype_qc            = false
skip_deseq2_qc             = false
skip_multiqc               = false
deseq2_vst                 = false
rseqc_modules              = 'bam_stat,inner_distance,infer_experiment,junction_annotation,junction_saturation,read_distribution,read_duplication'
// Boilerplate options
outdir                     = './results'
publish_dir_mode           = 'copy'
multiqc_config             = null
multiqc_title              = null
email                      = null
email_on_fail              = null
max_multiqc_email_size     = '25.MB'
plaintext_email            = false
monochrome_logs            = false
help                       = false
igenomes_base              = 's3://ngi-igenomes/igenomes'
tracedir                   = "${params.outdir}/pipeline_info"
igenomes_ignore            = false
validate_params            = true
show_hidden_params         = false
schema_ignore_params       = 'genomes,modules'
enable_conda               = false
singularity_pull_docker_container = false
// Config options
custom_config_version      = 'master'
custom_config_base         = "https://raw.githubusercontent.com/nf-core/configs/${params.custom_config_version}"
hostnames                  = [:]
config_profile_description = null
config_profile_contact     = null
config_profile_url         = null
config_profile_name        = null
// Max resource options
// Defaults only, expecting to be overwritten
max_memory                 = '128.GB'
max_cpus                   = 16
max_time                   = '240.h'
}
// Load base.config by default for all pipelines
includeConfig 'conf/base.config'
// Load modules.config for DSL2 module specific options
includeConfig 'conf/modules.config'
// Load nf-core custom profiles from different Institutions
try {
includeConfig "${params.custom_config_base}/nfcore_custom.config"
} catch (Exception e) {
System.err.println("WARNING: Could not load nf-core/config profiles: ${params.custom_config_base}/nfcore_custom.config")
}
// Load nf-core/rnaseq custom config
try {
includeConfig "${params.custom_config_base}/pipeline/rnaseq.config"
} catch (Exception e) {
System.err.println("WARNING: Could not load nf-core/config/rnaseq profiles: ${params.custom_config_base}/pipeline/rnaseq.config")
}
// Load igenomes.config if required
if (!params.igenomes_ignore) {
includeConfig 'conf/igenomes.config'
} else {
params.genomes = [:]
}
profiles {
debug { process.beforeScript = 'echo $HOSTNAME' }
conda {
params.enable_conda    = true
docker.enabled         = false
singularity.enabled    = false
podman.enabled         = false
shifter.enabled        = false
charliecloud.enabled   = false
}
docker {
docker.enabled         = true
docker.userEmulation   = true
singularity.enabled    = false
podman.enabled         = false
shifter.enabled        = false
charliecloud.enabled   = false
}
singularity {
singularity.enabled    = true
singularity.autoMounts = true
docker.enabled         = false
podman.enabled         = false
shifter.enabled        = false
charliecloud.enabled   = false
}
podman {
podman.enabled         = true
docker.enabled         = false
singularity.enabled    = false
shifter.enabled        = false
charliecloud.enabled   = false
}
shifter {
shifter.enabled        = true
docker.enabled         = false
singularity.enabled    = false
podman.enabled         = false
charliecloud.enabled   = false
}
charliecloud {
charliecloud.enabled   = true
docker.enabled         = false
singularity.enabled    = false
podman.enabled         = false
shifter.enabled        = false
}
test      { includeConfig 'conf/test.config'      }
test_full { includeConfig 'conf/test_full.config' }
}
// Export these variables to prevent local Python/R libraries from conflicting with those in the container
env {
PYTHONNOUSERSITE = 1
R_PROFILE_USER   = "/.Rprofile"
R_ENVIRON_USER   = "/.Renviron"
}
def trace_timestamp = new java.util.Date().format( 'yyyy-MM-dd_HH-mm-ss')
timeline {
enabled = true
file    = "${params.tracedir}/execution_timeline_${trace_timestamp}.html"
}
report {
enabled = true
file    = "${params.tracedir}/execution_report_${trace_timestamp}.html"
}
trace {
enabled = true
file    = "${params.tracedir}/execution_trace_${trace_timestamp}.txt"
}
dag {
enabled = true
file    = "${params.tracedir}/pipeline_dag_${trace_timestamp}.svg"
}
manifest {
name            = 'nf-core/rnaseq'
author          = 'Phil Ewels, Rickard Hammarén'
homePage        = 'https://github.com/nf-core/rnaseq'
description     = 'Nextflow RNA-Seq analysis pipeline, part of the nf-core community.'
mainScript      = 'main.nf'
nextflowVersion = '!>=21.04.0'
version         = '3.3'
}
// Function to ensure that resource requirements don't go beyond
// a maximum limit
def check_max(obj, type) {
if (type == 'memory') {
try {
if (obj.compareTo(params.max_memory as nextflow.util.MemoryUnit) == 1)
return params.max_memory as nextflow.util.MemoryUnit
else
return obj
} catch (all) {
println "   ### ERROR ###   Max memory '${params.max_memory}' is not valid! Using default value: $obj"
return obj
}
} else if (type == 'time') {
try {
if (obj.compareTo(params.max_time as nextflow.util.Duration) == 1)
return params.max_time as nextflow.util.Duration
else
return obj
} catch (all) {
println "   ### ERROR ###   Max time '${params.max_time}' is not valid! Using default value: $obj"
return obj
}
} else if (type == 'cpus') {
try {
return Math.min( obj, params.max_cpus as int )
} catch (all) {
println "   ### ERROR ###   Max cpus '${params.max_cpus}' is not valid! Using default value: $obj"
return obj
}
}
}

=================================

I built this environment with conda, and it looks like this:

I got this with conda env export:

name: nf-core
channels:
- bioconda
- conda-forge
- defaults
dependencies:
- _libgcc_mutex=0.1=conda_forge
- _openmp_mutex=4.5=1_gnu
- alsa-lib=1.2.3=h516909a_0
- appdirs=1.4.4=pyh9f0ad1d_0
- attrs=21.2.0=pyhd8ed1ab_0
- backports=1.0=py_2
- backports.functools_lru_cache=1.6.4=pyhd8ed1ab_0
- brotlipy=0.7.0=py37h5e8e339_1001
- bzip2=1.0.8=h7f98852_4
- c-ares=1.17.2=h7f98852_0
- ca-certificates=2021.5.30=ha878542_0
- cairo=1.16.0=h6cf1ce9_1008
- cattrs=1.8.0=pyhd8ed1ab_0
- certifi=2021.5.30=py37h89c1867_0
- cffi=1.14.6=py37hc58025e_0
- chardet=4.0.0=py37h89c1867_1
- charset-normalizer=2.0.0=pyhd8ed1ab_0
- click=8.0.1=py37h89c1867_0
- cni=0.8.0=hc0beb16_0
- cni-plugins=0.9.1=ha8f183a_0
- colorama=0.4.4=pyh9f0ad1d_0
- commonmark=0.9.1=py_0
- coreutils=8.25=1
- cryptography=3.4.7=py37h5d9358c_0
- curl=7.78.0=hea6ffbf_0
- expat=2.4.1=h9c3ff4c_0
- fontconfig=2.13.1=hba837de_1005
- freetype=2.10.4=h0708190_1
- future=0.18.2=py37h89c1867_3
- gettext=0.19.8.1=h0b5b191_1005
- giflib=5.2.1=h36c2ea0_2
- git=2.33.0=pl5321hc30692c_0
- gitdb=4.0.7=pyhd8ed1ab_0
- gitpython=3.1.18=pyhd8ed1ab_0
- graphite2=1.3.13=h58526e2_1001
- harfbuzz=2.9.1=h83ec7ef_0
- icu=68.1=h58526e2_0
- idna=3.1=pyhd3deb0d_0
- importlib-metadata=4.8.1=py37h89c1867_0
- importlib_metadata=4.8.1=hd8ed1ab_0
- itsdangerous=2.0.1=pyhd8ed1ab_0
- jbig=2.1=h7f98852_2003
- jinja2=3.0.1=pyhd8ed1ab_0
- jpeg=9d=h36c2ea0_0
- jq=1.6=h36c2ea0_1000
- jsonschema=3.2.0=py37hc8dfbb8_1
- krb5=1.19.2=hcc1bbae_0
- lcms2=2.12=hddcbb42_0
- ld_impl_linux-64=2.36.1=hea4e1c9_2
- lerc=2.2.1=h9c3ff4c_0
- libarchive=3.5.2=hccf745f_0
- libcurl=7.78.0=h2574ce0_0
- libdeflate=1.7=h7f98852_5
- libedit=3.1.20191231=he28a2e2_2
- libev=4.33=h516909a_1
- libffi=3.3=h58526e2_2
- libgcc=7.2.0=h69d50b8_2
- libgcc-ng=11.1.0=hc902ee8_8
- libglib=2.68.4=h3e27bee_0
- libgomp=11.1.0=hc902ee8_8
- libiconv=1.16=h516909a_0
- libnghttp2=1.43.0=h812cca2_0
- libpng=1.6.37=h21135ba_2
- libseccomp=2.4.4=h36c2ea0_0
- libssh2=1.10.0=ha56f1ee_0
- libstdcxx-ng=11.1.0=h56837e0_8
- libtiff=4.3.0=hf544144_1
- libuuid=2.32.1=h7f98852_1000
- libwebp-base=1.2.1=h7f98852_0
- libxcb=1.13=h7f98852_1003
- libxml2=2.9.12=h72842e0_0
- lz4-c=1.9.3=h9c3ff4c_1
- lzo=2.10=h516909a_1000
- markupsafe=2.0.1=py37h5e8e339_0
- ncurses=6.2=h58526e2_4
- nextflow=21.04.0=h4a94de4_0
- nf-core=2.1=pyh5e36f6f_0
- oniguruma=6.9.7.1=h7f98852_0
- openjdk=11.0.9.1=h5cc2fde_1
- openssl=1.1.1l=h7f98852_0
- packaging=21.0=pyhd8ed1ab_0
- pcre=8.45=h9c3ff4c_0
- pcre2=10.37=h032f7d1_0
- perl=5.32.1=0_h7f98852_perl5
- pip=21.2.4=pyhd8ed1ab_0
- pixman=0.40.0=h36c2ea0_0
- prompt-toolkit=3.0.20=pyha770c72_0
- prompt_toolkit=3.0.20=hd8ed1ab_0
- pthread-stubs=0.4=h36c2ea0_1001
- pycparser=2.20=pyh9f0ad1d_2
- pygments=2.10.0=pyhd8ed1ab_0
- pyopenssl=20.0.1=pyhd8ed1ab_0
- pyparsing=2.4.7=pyh9f0ad1d_0
- pyrsistent=0.17.3=py37h5e8e339_2
- pysocks=1.7.1=py37h89c1867_3
- python=3.7.10=hffdb5ce_100_cpython
- python_abi=3.7=2_cp37m
- pyyaml=5.4.1=py37h5e8e339_1
- questionary=1.10.0=pyhd8ed1ab_0
- readline=8.1=h46c0cb4_0
- requests=2.26.0=pyhd8ed1ab_0
- requests-cache=0.8.0=pyhd8ed1ab_0
- rich=10.9.0=py37h89c1867_0
- setuptools=58.0.4=py37h89c1867_0
- singularity=3.7.1=hca90b9e_0
- six=1.16.0=pyh6c4a22f_0
- smmap=3.0.5=pyh44b312d_0
- sqlite=3.36.0=h9cd32fc_1
- squashfs-tools=4.4=h6b73730_2
- tabulate=0.8.9=pyhd8ed1ab_0
- tk=8.6.11=h27826a3_1
- typing_extensions=3.10.0.0=pyha770c72_0
- url-normalize=1.4.3=pyhd8ed1ab_0
- urllib3=1.26.6=pyhd8ed1ab_0
- wcwidth=0.2.5=pyh9f0ad1d_2
- wheel=0.37.0=pyhd8ed1ab_1
- xorg-fixesproto=5.0=h7f98852_1002
- xorg-inputproto=2.3.2=h7f98852_1002
- xorg-kbproto=1.0.7=h7f98852_1002
- xorg-libice=1.0.10=h7f98852_0
- xorg-libsm=1.2.3=hd9c2040_1000
- xorg-libx11=1.7.2=h7f98852_0
- xorg-libxau=1.0.9=h7f98852_0
- xorg-libxdmcp=1.1.3=h7f98852_0
- xorg-libxext=1.3.4=h7f98852_1
- xorg-libxfixes=5.0.3=h7f98852_1004
- xorg-libxi=1.7.10=h7f98852_0
- xorg-libxrender=0.9.10=h7f98852_1003
- xorg-libxtst=1.2.3=h7f98852_1002
- xorg-recordproto=1.14.2=h7f98852_1002
- xorg-renderproto=0.11.1=h7f98852_1002
- xorg-xextproto=7.3.0=h7f98852_1002
- xorg-xproto=7.0.31=h7f98852_1007
- xz=5.2.5=h516909a_1
- yaml=0.2.5=h516909a_0
- zipp=3.5.0=pyhd8ed1ab_0
- zlib=1.2.11=h516909a_1010
- zstd=1.5.0=ha95c52a_0

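Jobs sometimes fail for a variety of reasons, and Nextflow pipelines can handle these errors in different ways, for better or worse. The nf-core/rnaseq pipeline (version 3.3) uses the following errorStrategy: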

errorStrategy = { task.exitStatus in [143,137,104,134,139] ? 'retry' : 'finish' }
maxRetries    = 1
maxErrors     = '-1'

https://github.com/nf-core/rnaseq/blob/3.3/conf/base.config L17-L19

Note that the value of maxRetries is only applied when the 'retry' error strategy is used.
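To make the consequence for Case 1 concrete: that task exited with status 1, which is not in the list above, so the strategy resolves to 'finish' (an orderly shutdown) rather than 'retry'. Below is a minimal shell sketch of the same decision. The function name is hypothetical and only mirrors the closure for illustration; Nextflow evaluates the Groovy closure itself.

```shell
# Mirrors the errorStrategy closure quoted above:
# retry only when the task exit status is one of the listed codes.
strategy_for_exit_status() {
    case "$1" in
        143|137|104|134|139) echo "retry" ;;
        *)                   echo "finish" ;;
    esac
}

strategy_for_exit_status 1     # exit status reported in Case 1
strategy_for_exit_status 137   # one of the codes in the retry list
```

The first call prints finish and the second prints retry, which is consistent with the Case 1 task not being retried automatically.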


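You get 'No such file or directory' in your 'Case 1' because the input files had not been staged by the time the script commands were run. Re-running the .command.run script (as you did) first tries to stage the input files and only then runs the script commands in .command.sh. You should be able to simply -resume the workflow without intervening manually, and the failed job will be run again automatically.

The two failures in 'Case 2' look like network errors that occurred while pulling two (different) Singularity images. This is likely the result of a weak network connection.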

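I wouldn't worry too much about errors like these. They're not uncommon. That said, I think the first one could be handled better, simply by setting errorStrategy = 'retry' in your nextflow.config to override the default behavior. In practice, I've found that a dynamic retry with backoff (shown below) also works very well. If you plan on running the pipeline over and over, it might be worth setting a cacheDir for Singularity to avoid pulling the images repeatedly.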

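process {
    // Wait with exponential backoff before retrying: 2^attempt * 150 ms
    errorStrategy = {
        sleep( Math.pow( 2, task.attempt ) * 150 as long )
        return 'retry'
    }
    maxRetries = 3
}
singularity {
    // Reuse downloaded images across runs instead of pulling them again
    cacheDir = '/path/to/containers'
}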
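As a worked example of the backoff arithmetic in the closure above, the following shell sketch prints the delay that 2^attempt * 150 yields for the first three attempts. The helper name backoff_ms is hypothetical; the sketch assumes task.attempt starts at 1 and that sleep is given milliseconds, as in the snippet.

```shell
# Delay in milliseconds for a given attempt number: 2^attempt * 150
backoff_ms() {
    echo $(( (1 << $1) * 150 ))
}

for attempt in 1 2 3; do
    echo "attempt $attempt failed: wait $(backoff_ms "$attempt") ms before retrying"
done
```

This gives 300, 600 and 1200 ms, so even with maxRetries = 3 the total added wait is only about two seconds. A larger multiplier than 150 may be worth trying if the failures come from a slow or unstable network.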

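I forgot to mention that I'm on a Windows 10 PC set up with WSL2. I had run into a weird issue where VMMEM grabbed all of my memory and wouldn't let it go. After messing around with Nextflow and searching forums for the cause of my problems and errors, I realized I had been a giant noob: I had set up my .wslconfig file to limit my subsystem to only 2 GB of memory, but the default nf-core/rnaseq pipeline requests 6 GB.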
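A quick way to check for this kind of mismatch is to look at how much memory the Linux side actually sees. The sketch below reads /proc/meminfo, which is available on Linux systems including WSL2; the function name mem_total_gib is hypothetical.

```shell
# Total memory visible inside the Linux environment, in GiB (one decimal place).
# MemTotal in /proc/meminfo is reported in kB.
mem_total_gib() {
    awk '/^MemTotal:/ { printf "%.1f\n", $2 / 1024 / 1024 }' /proc/meminfo
}

echo "Memory visible to this environment: $(mem_total_gib) GiB"
```

If the value printed is lower than what a process requests, either the memory setting in .wslconfig or the pipeline's --max_memory needs to change.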

This command fixed all of my problems:

nextflow run nf-core/rnaseq -profile test,singularity --aligner hisat2 --max_memory 1.5GB

I hope I've correctly identified the underlying problem, but either way nf-core/rnaseq is now working for me :)

Edit: they even mention here that the default resources may not be appropriate: https://nf-co.re/rnaseq/usage#resource-requests
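To avoid retyping the memory cap on every run, the same value could be kept in a small custom config file. This is a sketch under assumptions: the file name low_memory.config is arbitrary, and it relies on Nextflow's -c option for supplying an additional config file, whose params should take precedence over the pipeline defaults.

```shell
# Write a small custom config that caps memory requests,
# using the same value as the --max_memory flag above.
cat > low_memory.config <<'EOF'
params {
    max_memory = '1.5GB'
}
EOF

# Then pass it when launching the pipeline, for example:
# nextflow run nf-core/rnaseq -profile test,singularity --aligner hisat2 -c low_memory.config
```

The pipeline's check_max function compares each process request against params.max_memory, so a 6 GB request would be clamped to the configured cap.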
