So I have something like this:
data.frame(content = c("a","a","b","b","c","c"),
eje = c("politics","sports","education","sports","health","politics"),
value = c(3,2,1,2,1,1))
I want to group by content and, within each group, keep the eje with the highest value, keeping both rows when they are tied.
In this example, I would keep:
data.frame(content = c("a","b","c","c"),
eje = c("politics","sports","health","politics"),
value = c(3,2,1,1))
In SQL, I would do something like RANK() OVER (PARTITION BY content ORDER BY value DESC) and then filter for the rows where the rank is 1.
d = data.frame(content = c("a","a","b","b","c","c"),
eje = c("politics","sports","education","sports","health","politics"),
value = c(3,2,1,2,1,1))
library(dplyr)
d %>%
group_by(content) %>%
slice_max(value)  # with_ties = TRUE by default, so tied rows are kept
# # A tibble: 4 × 3
# # Groups: content [3]
# content eje value
# <chr> <chr> <dbl>
# 1 a politics 3
# 2 b sports 2
# 3 c health 1
# 4 c politics 1
A data.table option:
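library(data.table)
dt <- data.table(content = c("a","a","b","b","c","c"),
eje = c("politics","sports","education","sports","health","politics"),
value = c(3,2,1,2,1,1))
# .I holds row indices; keep those where value equals the group maximum
dt[dt[, .I[value == max(value)], by=content]$V1]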
Output:
content eje value
1: a politics 3
2: b sports 2
3: c health 1
4: c politics 1
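A base R option, as a rough sketch that mirrors the SQL RANK idea from the question: rank the rows within each content group with the highest value first, then keep the rows ranked 1. The names d, rnk and res are only illustrative, and ties.method = "min" is chosen here so that tied rows share rank 1.

```r
# Same example data as in the question
d <- data.frame(content = c("a","a","b","b","c","c"),
                eje = c("politics","sports","education","sports","health","politics"),
                value = c(3,2,1,2,1,1))

# Rank within each content group, highest value first;
# negating value makes the largest value get rank 1, and ties share a rank
rnk <- ave(-d$value, d$content, FUN = function(x) rank(x, ties.method = "min"))

# Keep only the top-ranked rows of each group
res <- d[rnk == 1, ]
res
```

This needs no extra packages, at the cost of being a little less readable than slice_max().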