r - How to combine aggregation, filtering, and by in data.table



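I need to aggregate all columns, but only over the rows where their value is 1, grouped by another column.

Here is how I do it step by step, i.e. column by column, going from foo to out, which is the expected output.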

library(data.table)

set.seed(123)
foo <- data.table(id = rep(LETTERS[1:10], each = 10),
                  code = sample(c(111, 222, 333, 444), 100, replace = TRUE),
                  x1 = sample(0:1, 100, replace = TRUE),
                  x2 = sample(0:1, 100, replace = TRUE),
                  x3 = sample(0:1, 100, replace = TRUE),
                  x4 = sample(0:1, 100, replace = TRUE),
                  x5 = sample(0:1, 100, replace = TRUE),
                  x6 = sample(0:1, 100, replace = TRUE))

# One aggregation per column: keep rows where the column is 1,
# then count the distinct codes within each id
f1 <- foo[x1 == 1, .(x1 = uniqueN(code)), by = id]
f2 <- foo[x2 == 1, .(x2 = uniqueN(code)), by = id]
f3 <- foo[x3 == 1, .(x3 = uniqueN(code)), by = id]

# Join the per-column results on id
setkey(f1, id)
setkey(f2, id)
setkey(f3, id)
out <- f1[f2, ][f3, ]

I'm sure there is an elegant and fast way to do this in data.table, perhaps using a function and the .SDcols argument, right?

But I can't find it…

Thanks a lot!

With .SD and .SDcols:

foo[,lapply(.SD, function(x) uniqueN(code[x==1])), .SDcols = x1:x3, by = id]
    id x1 x2 x3
 1:  A  3  3  2
 2:  B  4  4  4
 3:  C  2  2  3
 4:  D  4  3  3
 5:  E  4  4  3
 6:  F  1  3  3
 7:  G  3  2  2
 8:  H  4  2  3
 9:  I  3  3  3
10:  J  3  2  2
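To see what the `code[x == 1]` subsetting does inside each group, here is a minimal sketch on a small hand-made table. The name `toy` and its values are made up for illustration and are not part of the question's data.

```r
library(data.table)

# Hypothetical toy data: two groups, two indicator columns
toy <- data.table(
  id   = c("A", "A", "A", "B", "B"),
  code = c(111, 111, 222, 333, 444),
  x1   = c(1, 1, 0, 1, 1),
  x2   = c(0, 1, 1, 0, 1)
)

# Within each id, and for each indicator column x in .SD,
# keep only the codes where x == 1 and count the distinct ones
res <- toy[, lapply(.SD, function(x) uniqueN(code[x == 1])),
           by = id, .SDcols = c("x1", "x2")]
print(res)
```

In group A, x1 is 1 only on rows whose code is 111, so x1 counts one distinct code, while x2 is 1 on a 111 row and a 222 row, so x2 counts two. The `code` column stays visible inside `j` even though `.SDcols` restricts `.SD` to the indicator columns.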

Using the tidyverse:

library(dplyr)
foo %>%
  group_by(id) %>%
  summarise(across(x1:x3, ~ n_distinct(code[. == 1])))

Output:

# A tibble: 10 x 4
#   id       x1    x2    x3
# * <chr> <int> <int> <int>
# 1 A         3     3     2
# 2 B         4     4     4
# 3 C         2     2     3
# 4 D         4     3     3
# 5 E         4     4     3
# 6 F         1     3     3
# 7 G         3     2     2
# 8 H         4     2     3
# 9 I         3     3     3
#10 J         3     2     2

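Or you could use collapse. As written, this call appears to count the distinct values of x1:x3 themselves within each id rather than the distinct code values where each column equals 1, so its result may differ from the outputs above: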

library(collapse)
collap(slt(foo, id, x1:x3), ~ id, fNdistinct)

Another data.table option using duplicated (though I think @Waldi's uniqueN approach is simpler):

> foo[, lapply(.SD, function(x) sum(!duplicated(code[x == 1]))), id, .SDcols = x1:x3]
    id x1 x2 x3
 1:  A  3  3  2
 2:  B  4  4  4
 3:  C  2  2  3
 4:  D  4  3  3
 5:  E  4  4  3
 6:  F  1  3  3
 7:  G  3  2  2
 8:  H  4  2  3
 9:  I  3  3  3
10:  J  3  2  2
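As a quick check of why the two data.table variants agree, the sketch below compares `sum(!duplicated(v))` with `uniqueN(v)` on a small vector. The vector `v` is made up for illustration.

```r
library(data.table)

# Hypothetical vector of codes with repeats
v <- c(111, 222, 111, 333, 222)

# duplicated() flags every repeat of an earlier value, so
# negating and summing counts the first occurrences only
n_dup <- sum(!duplicated(v))

# uniqueN() counts the distinct values directly
n_uniq <- uniqueN(v)

print(c(duplicated = n_dup, uniqueN = n_uniq))
```

Both counts are 3 here, since the distinct values are 111, 222 and 333.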
