R - How to select the top N values with dplyr

I have the following data frame:

library(tidyverse)
df <- structure(list(gene_ratio = c(0.636363636363636, 0.571428571428571, 
0.5, 0.5, 0.5, 0.454545454545455, 0.454545454545455, 0.444444444444444, 
0.428571428571429, 0.357142857142857)), row.names = c(NA, -10L
), class = c("tbl_df", "tbl", "data.frame"))

It looks like this:

# A tibble: 10 × 1
gene_ratio
<dbl>
1      0.636
2      0.571
3      0.5  
4      0.5  
5      0.5  
6      0.455
7      0.455
8      0.444
9      0.429
10      0.357

What I want is to keep every row that holds one of the 5 largest distinct values (ties included), which should give:

gene_ratio
0.636
0.571
0.5  
0.5  
0.5  
0.455
0.455
0.444

How can I do that? I tried:

df %>%
dplyr::top_n(n = 5, wt = gene_ratio)

but it doesn't give the result above.
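A short sketch of why the top_n() attempt falls short: top_n() keeps rows where min_rank(desc(wt)) <= n. With min_rank(), tied values share the lowest rank but the ranks that follow skip ahead, so n = 5 effectively counts rows rather than distinct values. dense_rank() gives tied values the same rank without gaps, which is what "top 5 distinct values" needs. The vector x below is just the question's data copied out of df.

```r
library(dplyr)

# The question's values, copied into a plain vector
x <- c(0.636363636363636, 0.571428571428571, 0.5, 0.5, 0.5,
       0.454545454545455, 0.454545454545455, 0.444444444444444,
       0.428571428571429, 0.357142857142857)

# min_rank(): ties share the lowest rank and leave gaps: 1 2 3 3 3 6 6 8 9 10
min_rank(desc(x))

# dense_rank(): ties share a rank with no gaps: 1 2 3 3 3 4 4 5 6 7
dense_rank(desc(x))

# So "rank <= 5" keeps 5 rows with min_rank() but 8 rows with dense_rank()
sum(min_rank(desc(x)) <= 5)
sum(dense_rank(desc(x)) <= 5)
```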

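# Keep rows whose dense rank (ties share a rank, no gaps) is in the top 5
df %>% filter(dense_rank(-gene_ratio) %in% 1:5)

# Same idea with an ordered factor: each distinct value becomes one integer level,
# and the largest value gets level 1 because the values are negated
df %>% filter(as.integer(ordered(-gene_ratio)) %in% 1:5)

# Same idea with data.table's frank() and a dense ties method
df %>% filter(data.table::frank(-gene_ratio, ties.method = "dense") %in% 1:5)

# If the data are already sorted in decreasing order (as here),
# number the distinct values as they appear and keep the first n
n <- 5

df %>%
  mutate(Snum = cumsum(!duplicated(gene_ratio))) %>%
  filter(Snum <= n) %>%
  select(gene_ratio)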

Output
gene_ratio
<dbl>
1      0.636
2      0.571
3      0.5  
4      0.5  
5      0.5  
6      0.455
7      0.455
8      0.444

An option with arrange:

library(dplyr)
# Sort in decreasing order, then keep rows whose value is among the first 5 distinct values
df %>% 
  arrange(desc(gene_ratio)) %>% 
  filter(gene_ratio %in% head(unique(gene_ratio), 5))

with output

# A tibble: 8 x 1
gene_ratio
<dbl>
1      0.636
2      0.571
3      0.5  
4      0.5  
5      0.5  
6      0.455
7      0.455
8      0.444
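If this selection is needed repeatedly, the dense-rank idea can be wrapped in a small function. top_n_distinct() below is a hypothetical helper, not part of dplyr; it is a sketch that keeps every row whose value in col is among the n largest distinct values. Because filter() respects grouping, calling it on a grouped data frame should apply the cutoff within each group.

```r
library(dplyr)

# Hypothetical helper (not a dplyr function): keep all rows whose value in `col`
# is among the `n` largest distinct values, ties included
top_n_distinct <- function(data, col, n = 5) {
  data %>%
    # .env$n refers to the function argument, not a column that happens to be named n
    filter(dense_rank(desc({{ col }})) <= .env$n)
}

df <- tibble(gene_ratio = c(0.636363636363636, 0.571428571428571, 0.5, 0.5, 0.5,
                            0.454545454545455, 0.454545454545455, 0.444444444444444,
                            0.428571428571429, 0.357142857142857))

top_n_distinct(df, gene_ratio, n = 5)  # 8 rows
top_n_distinct(df, gene_ratio, n = 2)  # 2 rows: 0.636 and 0.571
```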

top_n() has been superseded by the slice_*() functions. One option with slice_max() is:

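# table() lists distinct values in increasing order, so tail(..., 5) gives the
# counts of the 5 largest values; their sum is the number of rows to keep
df %>% 
  slice_max(order_by = gene_ratio, 
            n = sum(tail(table(df$gene_ratio), 5)))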
# A tibble: 8 x 1
gene_ratio
<dbl>
1      0.636
2      0.571
3      0.5  
4      0.5  
5      0.5  
6      0.455
7      0.455
8      0.444
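The n passed to slice_max() depends on how the table of counts is ordered. A small check, assuming the question's values in a plain vector x: table() orders distinct values increasingly, so tail(counts, 5) holds the counts of the five largest values, whereas sort(counts) reorders by frequency and only coincidentally gives the same total for this particular data.

```r
library(dplyr)

x <- c(0.636363636363636, 0.571428571428571, 0.5, 0.5, 0.5,
       0.454545454545455, 0.454545454545455, 0.444444444444444,
       0.428571428571429, 0.357142857142857)

counts <- table(x)        # one count per distinct value, in increasing value order
tail(counts, 5)           # counts for 0.444, 0.455, 0.5, 0.571, 0.636: 1 2 3 1 1
k <- sum(tail(counts, 5)) # 8 rows to keep

sort(counts)              # reorders by frequency, not by value

# slice_max() keeps ties by default (with_ties = TRUE)
res <- slice_max(tibble(gene_ratio = x), order_by = gene_ratio, n = k)
res
```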
