r-获取每个序列名称的最长范围



我有一个GRanges对象,每个seqnames具有不同的基因组范围(例如染色体(
如何获得仅包含每个序列名/染色体的最长范围的GRanges

例如,如果grGRanges:

library(GenomicRanges)
# Make a GRanges object
set.seed(123)
gr <- GRanges(seqnames = rep(c("chr1", "chr2", "chr3"), times=2:4),
ranges = IRanges(start=sample.int(10000, 9), 
width = c(3,5,50,20,10,500,100,500,200)))
# Add a column with the width for clarity:
mcols(gr)$width <- width(gr)
gr
#GRanges object with 9 ranges and 1 metadata column:
#      seqnames    ranges strand |     width
#         <Rle> <IRanges>  <Rle> | <integer>
#  [1]     chr1 2463-2465      * |         3
#  [2]     chr1 2511-2515      * |         5
#  [3]     chr2 8718-8767      * |        50
#  [4]     chr2 2986-3005      * |        20
#  [5]     chr2 1842-1851      * |        10
#  [6]     chr3 9334-9833      * |       500
#  [7]     chr3 3371-3470      * |       100
#  [8]     chr3 4761-5260      * |       500
#  [9]     chr3 6746-6945      * |       200
#  -------
#  seqinfo: 3 sequences from an unspecified genome; no seqlengths

然后我想获得以下GRanges:

#GRanges object with 3 ranges and 1 metadata column:
#    seqnames      ranges strand |     width
#       <Rle>   <IRanges>  <Rle> | <integer>
#  [1]     chr1 2511-2515      * |         5
#  [2]     chr2 8718-8767      * |        50
#  [3]     chr3 9334-9833      * |       500

对于我的应用程序,我可以只获得chr3的第一个最长范围,但我希望有一个也可以选择所有领带(如果有的话(的解决方案。

我找到了这个解决方案:

library(GenomicRanges)
# Make the GRanges object
set.seed(123)
gr <- GRanges(seqnames = rep(c("chr1", "chr2", "chr3"), times=2:4),
ranges = IRanges(start=sample.int(10000, 9), 
width = c(3,5,50,20,10,500,100,500,200)))
# Add a column with the width for clarity:
mcols(gr)$width <- width(gr)
# split gr by seqnames
grl <- split(gr, seqnames(gr))
# Get the width of each range organized as an IntegerList
wgr <- width(grl)
# if gr is ordered by seqnames this is equivalent to:
wgr <- splitAsList(width(gr), seqnames(gr))
wgr
#IntegerList of length 3
#[["chr1"]] 3 5
#[["chr2"]] 50 20 10
#[["chr3"]] 500 100 500 200
# Get the ranges with the largest width
# See ?'which.max,NumericList-method' for the global argument
longestRanges <- gr[which.max(wgr, global = TRUE)]
longestRanges
#GRanges object with 3 ranges and 1 metadata column:
#      seqnames      ranges strand |     width
#         <Rle>   <IRanges>  <Rle> | <integer>
#  [1]     chr1 2511-2515      * |         5
#  [2]     chr2 8718-8767      * |        50
#  [3]     chr3 9334-9833      * |       500
#  -------
#  seqinfo: 3 sequences from an unspecified genome; no seqlengths

请注意,此解决方案仅在gr中的范围最初由seqnames分组/排序时有效
如果不是,则需要先用gr <- sort(gr)gr进行排序

要获得所有的平局,可以使用:

longestRanges <- unlist(grl[which(wgr == max(wgr))])
longestRanges
#GRanges object with 4 ranges and 1 metadata column:
#       seqnames    ranges strand |     width
#          <Rle> <IRanges>  <Rle> | <integer>
#  chr1     chr1 2511-2515      * |         5
#  chr2     chr2 8718-8767      * |        50
#  chr3     chr3 9334-9833      * |       500
#  chr3     chr3 4761-5260      * |       500
#  -------
#  seqinfo: 3 sequences from an unspecified genome; no seqlengths

相关内容

  • 没有找到相关文章

最新更新