R - Arrange a long-format data frame by multiple columns, with the sort direction depending on the group



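I want to rearrange my data based on a score variable value and an additional group variable. However, the sort direction should be descending or ascending depending on group. The groups consist of test scores (higher is better) and processing time (lower is better).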

library(dplyr)

df <- data.frame(id = rep(1:4, 4),
                 value = rnorm(16, 5),
                 group = c(paste0("test", 1:3), "time0"))
# Overwrite the time0 rows (every 4th row) with the values 1 to 4
df$value[seq(4, 16, 4)] <- 1:4
df %>% group_by(group) %>% arrange(group, desc(value))
# A tibble: 16 x 3
# Groups:   group [4]
id value group
<int> <dbl> <fct>
1     3  6.06 test1
2     4  4.69 test1
3     1  4.32 test1
4     2  3.56 test1
5     4  5.96 test2
6     1  5.96 test2
7     3  4.43 test2
8     2  3.86 test2
9     3  6.28 test3
10     4  5.55 test3
11     2  4.59 test3
12     1  3.53 test3
13     4  4    time0
14     3  3    time0
15     2  2    time0
16     1  1    time0

The desired output looks like this:

id value group
<int> <dbl> <fct>
1     3  6.06 test1
2     4  4.69 test1
3     1  4.32 test1
4     2  3.56 test1
5     4  5.96 test2
6     1  5.96 test2
7     3  4.43 test2
8     2  3.86 test2
9     3  6.28 test3
10     4  5.55 test3
11     2  4.59 test3
12     1  3.53 test3
13     4  1    time0
14     3  2    time0
15     2  3    time0
16     1  4    time0

I tried using arrange_if but could not figure it out. Any help is greatly appreciated.

Thanks for the answers so far; they are all equally helpful!


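Edit for clarification: this differs from this question because the sorting is not only based on multiple columns but also depends on a characteristic within a column.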

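This sorts the rows in the test groups in descending order and the rows in the time groups in ascending order. If you want the opposite, just swap the -1 and the 1.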

df %>% 
arrange(group, value*ifelse(grepl('time', group), 1, -1))
#    id    value group
# 1   1 6.358680 test1
# 2   1 6.100025 test1
# 3   1 4.844204 test1
# 4   1 3.622940 test1
# 5   2 5.763176 test2
# 6   2 4.897212 test2
# 7   2 4.585005 test2
# 8   2 3.529248 test2
# 9   3 5.387672 test3
# 10  3 4.835476 test3
# 11  3 4.605710 test3
# 12  3 4.521850 test3
# 13  4 1.000000 time0
# 14  4 2.000000 time0
# 15  4 3.000000 time0
# 16  4 4.000000 time0
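
For readers without dplyr, the same sign trick can be sketched in base R with order(). The data below are made-up illustration values, not the poster's, and the "^time" pattern is an assumption about how the ascending groups are named.

```r
# Made-up illustration data (not the values from the question)
df <- data.frame(
  id    = 1:8,
  value = c(4.3, 2, 6.1, 4, 3.6, 1, 4.7, 3),
  group = c("test1", "time0", "test1", "time0",
            "test1", "time0", "test1", "time0"),
  stringsAsFactors = FALSE
)

# +1 keeps the ascending order (time groups),
# -1 negates the values so that ascending order on the key
# is descending order on the original values (test groups)
direction <- ifelse(grepl("^time", df$group), 1, -1)

# Sort by group first, then by the signed value
sorted <- df[order(df$group, df$value * direction), ]
rownames(sorted) <- NULL
sorted
```

Negating a numeric key reverses its order, which is why a single ascending sort can serve both directions.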

Here is another option, which also works when value is a character column:

df <- data.frame(id = rep(1:4, 4),
value = rnorm(16, 5), 
group = c(paste0("test", 1:3), "time0"))
set.seed(2019)
df$value <- sample(letters, nrow(df), replace = TRUE)
df %>% 
arrange(group, rank(value)*ifelse(grepl('time', group), 1, -1))
#    id value group
# 1   1     u test1
# 2   1     f test1
# 3   1     c test1
# 4   1     b test1
# 5   2     s test2
# 6   2     p test2
# 7   2     f test2
# 8   2     b test2
# 9   3     v test3
# 10  3     u test3
# 11  3     s test3
# 12  3     h test3
# 13  4     a time0
# 14  4     q time0
# 15  4     q time0
# 16  4     r time0
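
The reason rank() is needed here can be shown in isolation: a character vector cannot be multiplied by -1, but its ranks are ordinary numbers and can be. A minimal base R sketch with made-up letters:

```r
value <- c("b", "u", "f", "c")

# rank() maps each string to its position in sorted order
r <- rank(value)

# Negating the ranks reverses the order; leaving them as-is keeps it
descending <- value[order(-r)]
ascending  <- value[order(r)]

descending
ascending
```

Tied values receive the same (average) rank by default, so they stay adjacent in either direction.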

We can use filter to exclude the 'time0' group, arrange the rest of the dataset, and then bind_rows it with the 'time0' group sorted in ascending order.

library(dplyr)
df %>% 
filter(group != 'time0') %>%
arrange(group, desc(value)) %>%
bind_rows(., df %>% 
filter(group == 'time0') %>% 
arrange(value))
#   id value group
#1   3  6.06 test1
#2   4  4.69 test1
#3   1  4.32 test1
#4   2  3.56 test1
#5   4  5.96 test2
#6   1  5.96 test2
#7   3  4.43 test2
#8   2  3.86 test2
#9   3  6.28 test3
#10  4  5.55 test3
#11  2  4.59 test3
#12  1  3.53 test3
#13  1  1.00 time0
#14  2  2.00 time0
#15  3  3.00 time0
#16  4  4.00 time0
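
The filter and bind_rows approach places the ascending group last, which matches the desired output here because 'time0' sorts after the test groups. A variant that sorts each group on its own and keeps the groups in alphabetical order can be sketched in base R. The helper name sort_group, the "^time" pattern, and the data are all assumptions made for illustration.

```r
# Made-up illustration data (not the values from the question)
df <- data.frame(
  id    = 1:6,
  value = c(2, 5, 3, 1, 4, 6),
  group = c("time0", "test1", "time0", "test1", "time0", "test1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: sort one group's rows, ascending for
# "time" groups and descending for everything else
sort_group <- function(d) {
  ascending <- grepl("^time", d$group[1])
  d[order(d$value, decreasing = !ascending), ]
}

# split() returns the pieces in sorted group order,
# so rbind() stacks them back together in that order
sorted <- do.call(rbind, lapply(split(df, df$group), sort_group))
rownames(sorted) <- NULL
sorted
```

Because order() is called on the raw column, this sketch would also work when value is a character column.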

Also, if value can be non-numeric:

df %>%
arrange(group, desc(as.numeric(value)), is.na(as.numeric(value)))

Data

df <- structure(list(id = c(3L, 4L, 1L, 2L, 4L, 1L, 3L, 2L, 3L, 4L, 
2L, 1L, 4L, 3L, 2L, 1L), value = c(6.06, 4.69, 4.32, 3.56, 5.96, 
5.96, 4.43, 3.86, 6.28, 5.55, 4.59, 3.53, 4, 3, 2, 1), group = c("test1", 
"test1", "test1", "test1", "test2", "test2", "test2", "test2", 
"test3", "test3", "test3", "test3", "time0", "time0", "time0", 
"time0")), class = "data.frame", row.names = c("1", "2", "3", 
"4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", 
"16"))
