I have the following data frame:
library(tidyverse)
df <- structure(list(gene_ratio = c(0.636363636363636, 0.571428571428571,
0.5, 0.5, 0.5, 0.454545454545455, 0.454545454545455, 0.444444444444444,
0.428571428571429, 0.357142857142857)), row.names = c(NA, -10L
), class = c("tbl_df", "tbl", "data.frame"))
It looks like this:
# A tibble: 10 × 1
gene_ratio
<dbl>
1 0.636
2 0.571
3 0.5
4 0.5
5 0.5
6 0.455
7 0.455
8 0.444
9 0.429
10 0.357
What I want to do is select the rows holding the 5 largest distinct values, ties included, which yields:
gene_ratio
0.636
0.571
0.5
0.5
0.5
0.455
0.455
0.444
How can I achieve that? I tried:
df %>%
dplyr::top_n(n = 5, wt = gene_ratio)
But that did not give the result I want.
df %>% filter(dense_rank(-gene_ratio) %in% 1:5)
or
df %>% filter(as.integer(ordered(-gene_ratio)) %in% 1:5)
or
df %>% filter(data.table::frank(-gene_ratio, ties.method = "dense") %in% 1:5)
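To see why these filters keep tied rows, it helps to look at the ranks themselves. A minimal sketch using only the gene_ratio values from the question; the ranks variable is just an illustrative name:

```r
library(dplyr)

gene_ratio <- c(0.636363636363636, 0.571428571428571, 0.5, 0.5, 0.5,
                0.454545454545455, 0.454545454545455, 0.444444444444444,
                0.428571428571429, 0.357142857142857)

# dense_rank() gives tied values the same rank and leaves no gaps;
# negating the column makes rank 1 the largest value.
ranks <- dense_rank(-gene_ratio)
print(ranks)
#> [1] 1 2 3 3 3 4 4 5 6 7

# Ranks 1 to 5 cover the five largest distinct values, ties included.
print(sum(ranks %in% 1:5))
#> [1] 8
```

By contrast, min_rank() leaves gaps after ties (1 2 3 3 3 6 6 8 ...), so a cut at rank 5 stops after the 0.5 rows.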
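n <- 5
df %>%
  arrange(desc(gene_ratio)) %>%  # the running count below assumes descending order
  mutate(Snum = cumsum(!duplicated(gene_ratio))) %>%  # distinct values seen so far
  filter(Snum <= n) %>%
  select(gene_ratio)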
Output:
gene_ratio
<dbl>
1 0.636
2 0.571
3 0.5
4 0.5
5 0.5
6 0.455
7 0.455
8 0.444
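The counter in the cumsum approach can be inspected on its own. The sketch below uses the values from the question; note that the counter only matches the rank when the values are in descending order, which the second call illustrates with a small made-up vector.

```r
gene_ratio <- c(0.636363636363636, 0.571428571428571, 0.5, 0.5, 0.5,
                0.454545454545455, 0.454545454545455, 0.444444444444444,
                0.428571428571429, 0.357142857142857)

# !duplicated() is TRUE at the first occurrence of each value, so the
# running sum counts how many distinct values have been seen so far.
snum <- cumsum(!duplicated(gene_ratio))
print(snum)
#> [1] 1 2 3 3 3 4 4 5 6 7

# On unsorted input the counter follows order of appearance, not size:
# here 0.5 gets counter 1 even though it is not the largest value.
unsorted <- cumsum(!duplicated(c(0.5, 0.636363636363636, 0.5)))
print(unsorted)
#> [1] 1 2 2
```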
An option with arrange:
library(dplyr)
df %>%
arrange(desc(gene_ratio)) %>%
filter(gene_ratio %in% head(unique(gene_ratio), 5))
Output:
# A tibble: 8 x 1
gene_ratio
<dbl>
1 0.636
2 0.571
3 0.5
4 0.5
5 0.5
6 0.455
7 0.455
8 0.444
top_n has been superseded by the slice functions. An option with slice_max could be:
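# table() orders its counts by value, so tail(..., 5) picks the 5 largest
# distinct values; their counts sum to the number of rows to keep.
df %>%
  slice_max(order_by = gene_ratio,
            n = sum(tail(table(df$gene_ratio), 5)))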
# A tibble: 8 x 1
gene_ratio
<dbl>
1 0.636
2 0.571
3 0.5
4 0.5
5 0.5
6 0.455
7 0.455
8 0.444
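The approaches above all reduce to the same idea: find the n-th largest distinct value and keep every row at or above it. A minimal base R sketch of that idea follows. top_distinct is a hypothetical helper name introduced here for illustration, not a dplyr or base R function, and the plain data.frame stands in for the tibble from the question.

```r
# top_distinct() is a hypothetical helper, not part of dplyr or base R.
top_distinct <- function(data, column, n) {
  # Distinct values, largest first (sort() drops NA by default).
  values <- sort(unique(data[[column]]), decreasing = TRUE)
  # The n-th largest distinct value, or the smallest one if there are fewer than n.
  cutoff <- min(head(values, n))
  # which() skips NA comparisons; rows keep their original order.
  data[which(data[[column]] >= cutoff), , drop = FALSE]
}

df <- data.frame(gene_ratio = c(0.636363636363636, 0.571428571428571, 0.5,
                                0.5, 0.5, 0.454545454545455, 0.454545454545455,
                                0.444444444444444, 0.428571428571429,
                                0.357142857142857))

top5 <- top_distinct(df, "gene_ratio", 5)
print(nrow(top5))
#> [1] 8
```

Unlike the arrange and slice_max versions, this sketch does not sort the result; it only filters.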