How to split a BAM file into separate BAMs by the chromosome the reads are mapped to



nextflow;

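The input is a BAM file; for each chr in [1..22,X,Y], run samtools on $bam and $chr; the output is 24 BAMs.

I have created a process called splitbam. It takes an id, the BAM file and a chrid as input, runs samtools, and outputs ${id}${chr}.bam.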

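Note that reading from a file can be an I/O-bound operation. If multiple processes try to read the same BAM file at the same time, they can end up competing for the same resources, which is something we usually try to avoid. A better approach is to pass in all of the regions and have a single task split each BAM. There are many ways to do this, and the best solution depends on exactly what you want. You could even use another parameter to choose which chromosomes to split out of each BAM. Assuming your BAM files are coordinate-sorted and indexed, you could try: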

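params.indexed_bams = './path/to/bams/*.bam{,.bai}'
params.chroms = 'chr{{1..22},X,Y}'

process splitBAM {

    tag { sample }

    input:
    tuple val(sample), path(indexed_bam)

    output:
    tuple val(sample), path("*.bam")

    script:
    // indexed_bam holds the BAM and its index; first() is expected to pick
    // the BAM, since it sorts before its .bam.bai file
    def bam = indexed_bam.first()

    """
    # The brace pattern below is expanded by bash, not by Nextflow
    for chrom in ${params.chroms}
    do
        # \\ yields a shell line continuation; \$ leaves chrom for bash to resolve
        samtools view \\
            -o "${bam.baseName}.\${chrom}.bam" \\
            "${bam}" \\
            "\${chrom}"
    done
    """
}

workflow {

    indexed_bams = Channel.fromFilePairs( params.indexed_bams )

    splitBAM( indexed_bams )
    splitBAM.out.view()
}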
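
The process above assumes that each BAM is already coordinate-sorted and has an index next to it. If that is not the case, something along these lines could prepare the inputs first. This is only a sketch: the helper name `sort_and_index`, the `SAMTOOLS` variable and the `.sorted.bam` naming are assumptions made for illustration, not part of the original answer.

```shell
# Hypothetical helper: coordinate-sort a BAM and index the sorted copy.
# SAMTOOLS may be overridden (for example with a full path); default is `samtools`.
SAMTOOLS="${SAMTOOLS:-samtools}"

sort_and_index() {
    in_bam="$1"
    out_bam="${in_bam%.bam}.sorted.bam"          # A.bam -> A.sorted.bam
    "$SAMTOOLS" sort -o "$out_bam" "$in_bam"     # coordinate order is the default
    "$SAMTOOLS" index "$out_bam"                 # writes the index next to the BAM
}

# Usage (not run here): sort_and_index ./path/to/bams/A.bam
```

Region queries such as `samtools view file.bam chr1` need that index, which is why the glob in `params.indexed_bams` picks up both the `.bam` and the `.bam.bai`.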

Results of running the workflow:

$ nextflow run main.nf -ansi-log false
N E X T F L O W  ~  version 23.04.0
Launching `main.nf` [stupefied_marconi] DSL2 - revision: 1383b12928
[91/a7c661] Submitted process > splitBAM (C)
[ce/f579b6] Submitted process > splitBAM (B)
[68/2e62aa] Submitted process > splitBAM (A)
[C, [/path/to/work/91/a7c66188f92fd66331fe235bd923e7/C.chr1.bam, /path/to/work/91/a7c66188f92fd66331fe235bd923e7/C.chr10.bam, /path/to/work/91/a7c66188f92fd66331fe235bd923e7/C.chr11.bam, /path/to/work/91/a7c66188f92fd66331fe235bd923e7/C.chr12.bam, /path/to/work/91/a7c66188f92fd66331fe235bd923e7/C.chr13.bam, /path/to/work/91/a7c66188f92fd66331fe235bd923e7/C.chr14.bam, /path/to/work/91/a7c66188f92fd66331fe235bd923e7/C.chr15.bam, /path/to/work/91/a7c66188f92fd66331fe235bd923e7/C.chr16.bam, /path/to/work/91/a7c66188f92fd66331fe235bd923e7/C.chr17.bam, /path/to/work/91/a7c66188f92fd66331fe235bd923e7/C.chr18.bam, /path/to/work/91/a7c66188f92fd66331fe235bd923e7/C.chr19.bam, /path/to/work/91/a7c66188f92fd66331fe235bd923e7/C.chr2.bam, /path/to/work/91/a7c66188f92fd66331fe235bd923e7/C.chr20.bam, /path/to/work/91/a7c66188f92fd66331fe235bd923e7/C.chr21.bam, /path/to/work/91/a7c66188f92fd66331fe235bd923e7/C.chr22.bam, /path/to/work/91/a7c66188f92fd66331fe235bd923e7/C.chr3.bam, /path/to/work/91/a7c66188f92fd66331fe235bd923e7/C.chr4.bam, /path/to/work/91/a7c66188f92fd66331fe235bd923e7/C.chr5.bam, /path/to/work/91/a7c66188f92fd66331fe235bd923e7/C.chr6.bam, /path/to/work/91/a7c66188f92fd66331fe235bd923e7/C.chr7.bam, /path/to/work/91/a7c66188f92fd66331fe235bd923e7/C.chr8.bam, /path/to/work/91/a7c66188f92fd66331fe235bd923e7/C.chr9.bam, /path/to/work/91/a7c66188f92fd66331fe235bd923e7/C.chrX.bam, /path/to/work/91/a7c66188f92fd66331fe235bd923e7/C.chrY.bam]]
[A, [/path/to/work/68/2e62aaee65cfb4791fe3d3b530fc15/A.chr1.bam, /path/to/work/68/2e62aaee65cfb4791fe3d3b530fc15/A.chr10.bam, /path/to/work/68/2e62aaee65cfb4791fe3d3b530fc15/A.chr11.bam, /path/to/work/68/2e62aaee65cfb4791fe3d3b530fc15/A.chr12.bam, /path/to/work/68/2e62aaee65cfb4791fe3d3b530fc15/A.chr13.bam, /path/to/work/68/2e62aaee65cfb4791fe3d3b530fc15/A.chr14.bam, /path/to/work/68/2e62aaee65cfb4791fe3d3b530fc15/A.chr15.bam, /path/to/work/68/2e62aaee65cfb4791fe3d3b530fc15/A.chr16.bam, /path/to/work/68/2e62aaee65cfb4791fe3d3b530fc15/A.chr17.bam, /path/to/work/68/2e62aaee65cfb4791fe3d3b530fc15/A.chr18.bam, /path/to/work/68/2e62aaee65cfb4791fe3d3b530fc15/A.chr19.bam, /path/to/work/68/2e62aaee65cfb4791fe3d3b530fc15/A.chr2.bam, /path/to/work/68/2e62aaee65cfb4791fe3d3b530fc15/A.chr20.bam, /path/to/work/68/2e62aaee65cfb4791fe3d3b530fc15/A.chr21.bam, /path/to/work/68/2e62aaee65cfb4791fe3d3b530fc15/A.chr22.bam, /path/to/work/68/2e62aaee65cfb4791fe3d3b530fc15/A.chr3.bam, /path/to/work/68/2e62aaee65cfb4791fe3d3b530fc15/A.chr4.bam, /path/to/work/68/2e62aaee65cfb4791fe3d3b530fc15/A.chr5.bam, /path/to/work/68/2e62aaee65cfb4791fe3d3b530fc15/A.chr6.bam, /path/to/work/68/2e62aaee65cfb4791fe3d3b530fc15/A.chr7.bam, /path/to/work/68/2e62aaee65cfb4791fe3d3b530fc15/A.chr8.bam, /path/to/work/68/2e62aaee65cfb4791fe3d3b530fc15/A.chr9.bam, /path/to/work/68/2e62aaee65cfb4791fe3d3b530fc15/A.chrX.bam, /path/to/work/68/2e62aaee65cfb4791fe3d3b530fc15/A.chrY.bam]]
[B, [/path/to/work/ce/f579b6299acafeaf53a386bd5abd25/B.chr1.bam, /path/to/work/ce/f579b6299acafeaf53a386bd5abd25/B.chr10.bam, /path/to/work/ce/f579b6299acafeaf53a386bd5abd25/B.chr11.bam, /path/to/work/ce/f579b6299acafeaf53a386bd5abd25/B.chr12.bam, /path/to/work/ce/f579b6299acafeaf53a386bd5abd25/B.chr13.bam, /path/to/work/ce/f579b6299acafeaf53a386bd5abd25/B.chr14.bam, /path/to/work/ce/f579b6299acafeaf53a386bd5abd25/B.chr15.bam, /path/to/work/ce/f579b6299acafeaf53a386bd5abd25/B.chr16.bam, /path/to/work/ce/f579b6299acafeaf53a386bd5abd25/B.chr17.bam, /path/to/work/ce/f579b6299acafeaf53a386bd5abd25/B.chr18.bam, /path/to/work/ce/f579b6299acafeaf53a386bd5abd25/B.chr19.bam, /path/to/work/ce/f579b6299acafeaf53a386bd5abd25/B.chr2.bam, /path/to/work/ce/f579b6299acafeaf53a386bd5abd25/B.chr20.bam, /path/to/work/ce/f579b6299acafeaf53a386bd5abd25/B.chr21.bam, /path/to/work/ce/f579b6299acafeaf53a386bd5abd25/B.chr22.bam, /path/to/work/ce/f579b6299acafeaf53a386bd5abd25/B.chr3.bam, /path/to/work/ce/f579b6299acafeaf53a386bd5abd25/B.chr4.bam, /path/to/work/ce/f579b6299acafeaf53a386bd5abd25/B.chr5.bam, /path/to/work/ce/f579b6299acafeaf53a386bd5abd25/B.chr6.bam, /path/to/work/ce/f579b6299acafeaf53a386bd5abd25/B.chr7.bam, /path/to/work/ce/f579b6299acafeaf53a386bd5abd25/B.chr8.bam, /path/to/work/ce/f579b6299acafeaf53a386bd5abd25/B.chr9.bam, /path/to/work/ce/f579b6299acafeaf53a386bd5abd25/B.chrX.bam, /path/to/work/ce/f579b6299acafeaf53a386bd5abd25/B.chrY.bam]]
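
The 24 files per sample come from the pattern stored in `params.chroms`. It is passed to the shell as-is, and bash (which Nextflow uses to run script blocks by default) expands it into the individual names. A quick way to check the expansion outside of Nextflow, as a small sketch (the variable name `chroms` is just for illustration):

```shell
# Expand the same pattern that params.chroms holds. bash is called explicitly
# because brace expansion is a bash feature rather than a POSIX sh one.
chroms=$(bash -c 'echo chr{{1..22},X,Y}')

# Print one chromosome name per line (chr1 ... chr22, chrX, chrY)
printf '%s\n' $chroms
```

If your reference uses names without the `chr` prefix, the pattern would need to change accordingly, for example by overriding `--chroms` on the command line.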


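You usually would not want to run one task per region, but for completeness, here is one way to split BAM files using a list of regions or chromosome names read from a file (one per line). It simply uses the each input qualifier to repeat the execution for every region in the collection: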

params.indexed_bams = './path/to/bams/*.bam{,.bai}'
params.regions = './regions.list'

process splitBAM {

    tag { "${sample}, ${region}" }

    input:
    tuple val(sample), path(indexed_bam)
    each region

    output:
    tuple val(sample), path("${bam.baseName}.${suffix}.bam")

    script:
    // No 'def' here: bam and suffix must stay visible to the output block
    bam = indexed_bam.first()
    // Replace ':' so a region like chr1:100-200 gives a tidy file name
    suffix = region.replaceAll(':', '-')

    """
    samtools view \\
        -o "${bam.baseName}.${suffix}.bam" \\
        "${bam}" \\
        "${region}"
    """
}

workflow {

    Channel
        .fromFilePairs( params.indexed_bams )
        .set { indexed_bams }

    // Read the regions file, strip newlines, and collect into a single list
    Channel
        .fromPath( params.regions )
        .splitText() { it.trim() }
        .collect()
        .set { regions }

    splitBAM( indexed_bams, regions )
    splitBAM.out.view()
}
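
The workflow above expects `./regions.list` to contain one region per line. A file like that could be generated as follows. This is a sketch that assumes the contigs are named chr1 to chr22, chrX and chrY; check your BAM header and adjust the names if they differ.

```shell
# Write chr1..chr22, chrX and chrY to regions.list, one name per line.
# The "chr" prefix is an assumption; use the contig names from your BAM header.
{
    i=1
    while [ "$i" -le 22 ]; do
        printf 'chr%s\n' "$i"
        i=$((i + 1))
    done
    printf 'chrX\nchrY\n'
} > regions.list
```

Lines may also be coordinate ranges such as `chr1:1000-2000`. The `replaceAll(':', '-')` call in the process turns that into `chr1-1000-2000` for the output file name.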

Results of running the workflow:

$ nextflow run main.nf 
N E X T F L O W  ~  version 23.04.0
Launching `main.nf` [grave_mcnulty] DSL2 - revision: 33a7a22c3b
executor >  local (72)
[a1/2ee71d] process > splitBAM (C, chr12) [100%] 72 of 72 ✔
[A, /path/to/work/60/3bc1fdbe20235076c5af4493a62977/A.chr6.bam]
[A, /path/to/work/a0/4c16d0871aea79e57bf674a1f32949/A.chr3.bam]
[A, /path/to/work/8c/e509e5b9aeea580fbf5cdb46c0a437/A.chr8.bam]
[A, /path/to/work/db/f96b1dd7bf07b26107afa814d82734/A.chr5.bam]
[A, /path/to/work/c8/49d20ee82c6c23b1e33b6974ba4551/A.chr7.bam]
[A, /path/to/work/43/a604a70bbac2f901b5ec18ede98f91/A.chr4.bam]
[A, /path/to/work/70/5ec915a5ae34cc16438fc3406bd921/A.chr1.bam]
[A, /path/to/work/f5/68049cedf63567004985524086051d/A.chr2.bam]
[A, /path/to/work/4d/4dd4b99aae0b8b2babef02a43e72f4/A.chr13.bam]
[A, /path/to/work/ff/2d2135e427e1c6c8d512065ca55055/A.chr10.bam]
[A, /path/to/work/65/024bdd698ca9da7c4ff7a98901f294/A.chr9.bam]
[A, /path/to/work/a5/b61625c6286d5a8719b7071ccea024/A.chr11.bam]
[A, /path/to/work/81/312802de9d1e6fda74501ebbc0aaaf/A.chr14.bam]
[A, /path/to/work/96/1a3956318fa953c2f10b12faecab22/A.chr18.bam]
[A, /path/to/work/20/276ac5a68a6edea88da145ff18c616/A.chr12.bam]
[A, /path/to/work/de/6c4eb41cb84b2e41bae058dde608f5/A.chr20.bam]
[A, /path/to/work/f4/4cba375ea0ab6bea2193475e56c20b/A.chr21.bam]
[A, /path/to/work/7d/a8ccf0ad38c5bd4f32979997998f39/A.chr19.bam]
[A, /path/to/work/17/e8a8c3123d8ca6440b1255e3fab001/A.chr15.bam]
[A, /path/to/work/ab/ee4398ad8bfb947322b19f703fd036/A.chr22.bam]
[A, /path/to/work/54/66707f7e243d4c765acfb057f686fa/A.chrX.bam]
[A, /path/to/work/bf/7cf4b2b22f733b95004e09bcb17a24/A.chrY.bam]
[A, /path/to/work/3f/cd987b9656f6affbc25e1fd73383ba/A.chr16.bam]
[A, /path/to/work/6f/1b076c7fe87a2f1fd366067b2a71f6/A.chr17.bam]
[B, /path/to/work/a2/d821e1520c628ff6e7bf229986bc40/B.chr4.bam]
[B, /path/to/work/ef/a4522fba782d6ab785be1492d426c1/B.chr13.bam]
[B, /path/to/work/00/ed80767027319a5fa63c6c9954a156/B.chr6.bam]
[B, /path/to/work/5e/efc8b4a52067a93a3020a30719c164/B.chr1.bam]
[B, /path/to/work/00/3336b1853123aa3dcede250e5f4ab9/B.chr2.bam]
[B, /path/to/work/65/e19c9845313f23134caf7a96c6fd6d/B.chr3.bam]
[B, /path/to/work/33/63857490a462c3573a84d7f184ad0e/B.chr5.bam]
[B, /path/to/work/c5/7cd52e2378dc077849fb95b2d4c6a3/B.chr11.bam]
[B, /path/to/work/66/8f0b18f23e9a7b2fcf966b721eff81/B.chr14.bam]
[B, /path/to/work/6c/6425122de40c637907d3fc403d6bfb/B.chr7.bam]
[B, /path/to/work/83/96973f5b21fdba623415c2f28d6587/B.chr15.bam]
[B, /path/to/work/be/26fd9f8b591caee13687c6908c6484/B.chr18.bam]
[B, /path/to/work/8a/9d080d0565c2b2b0e8984bac3af281/B.chr10.bam]
[B, /path/to/work/6f/bef89622a976e6219218d548484a9e/B.chr9.bam]
[B, /path/to/work/ea/2e42a4188eb1c46a3c192e4e53e75b/B.chr8.bam]
[B, /path/to/work/0a/dfa6ab659a21c3fc85c86cd99a8535/B.chr12.bam]
[B, /path/to/work/79/9322435a6b125adc9edf2616d51334/B.chrY.bam]
[B, /path/to/work/bb/71d75443d17680959aebfb410aafcd/B.chr20.bam]
[B, /path/to/work/99/20374da262707f5ac49969d6a5b307/B.chr16.bam]
[B, /path/to/work/80/4aff0eba480ef85ed5e1ab0bf4a11b/B.chr19.bam]
[B, /path/to/work/a9/abdd5b73f237185c57c1a816a94d5b/B.chrX.bam]
[B, /path/to/work/38/7861f2efc4754da157336d3d967d95/B.chr22.bam]
[B, /path/to/work/7b/5d328513b7bf6d03a2dc558898ff12/B.chr17.bam]
[C, /path/to/work/b3/81ac227bafa5eb95093d10cfd6b550/C.chr1.bam]
[C, /path/to/work/52/1220d524cdf2689052cd6cd0fc85de/C.chr3.bam]
[B, /path/to/work/1d/ba60a1afde1ab4dd2c469c6ff12bf8/B.chr21.bam]
[C, /path/to/work/90/1d146f3fa4e62bafbf96000ff2e4d5/C.chr4.bam]
[C, /path/to/work/50/fe8e2b2bf7cfcc522b4b465a1e6103/C.chr5.bam]
[C, /path/to/work/03/546d76f47d359c7561a8deba0c8389/C.chr8.bam]
[C, /path/to/work/e6/19629fd4b206dfb9e5f9a6ef6da78d/C.chr6.bam]
[C, /path/to/work/b6/a1993d8ee8837ba70ed12daf21a418/C.chr2.bam]
[C, /path/to/work/d8/d1bfc3254ea2205ac2a3cc210674db/C.chr11.bam]
[C, /path/to/work/4b/ef5c546102d29442f13217392e9ef2/C.chr13.bam]
[C, /path/to/work/72/848ab4f83561983e2fa6d74969a78d/C.chr15.bam]
[C, /path/to/work/df/afc23ff7ec596c5d51691fa4b67c43/C.chr7.bam]
[C, /path/to/work/52/0f4bab67efa1b013b69a42f2c211f7/C.chr9.bam]
[C, /path/to/work/a4/ceb86ff5c6b091af4c2c963ce4fda9/C.chr19.bam]
[C, /path/to/work/e7/a8928450a870464a08c372a0a96fc3/C.chr10.bam]
[C, /path/to/work/98/f1b2ce8fd5d498e77209cbc13e2898/C.chr14.bam]
[C, /path/to/work/73/d0edcdd7d922fc5a80842779421f48/C.chr21.bam]
[C, /path/to/work/4e/27d1000ef50e2672638b43e5bf24e0/C.chr17.bam]
[C, /path/to/work/b0/183bfdb6bb2f3702d9e7208aaf268d/C.chr18.bam]
[C, /path/to/work/c3/658b0323b33e7e0d0ddd1e14277edb/C.chrY.bam]
[C, /path/to/work/d4/06cef9ebb33f887eaf487f60cb5431/C.chrX.bam]
[C, /path/to/work/67/634d6a2ad97626c2afd4bc0dfe8ee7/C.chr16.bam]
[C, /path/to/work/35/cb224aa30c1f7fa89cb0e9f725361c/C.chr22.bam]
[C, /path/to/work/93/ed9f77df92b1c697764c1eff796dba/C.chr20.bam]
[C, /path/to/work/a1/2ee71d49591ebb8e827dde9adc602c/C.chr12.bam]
