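I need to aggregate all columns, but only over the rows where their value is 1, grouped by another column.
Here is how I currently do it step by step, one column at a time, going from foo to out, which is the expected output.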
library(data.table)
set.seed(123)
foo <- data.table(id = rep(LETTERS[1:10], each = 10),
                  code = sample(c(111, 222, 333, 444), 100, replace = TRUE),
                  x1 = sample(0:1, 100, replace = TRUE),
                  x2 = sample(0:1, 100, replace = TRUE),
                  x3 = sample(0:1, 100, replace = TRUE),
                  x4 = sample(0:1, 100, replace = TRUE),
                  x5 = sample(0:1, 100, replace = TRUE),
                  x6 = sample(0:1, 100, replace = TRUE))
f1 <- foo[x1 == 1, .(x1 = uniqueN(code)), by = id]
f2 <- foo[x2 == 1, .(x2 = uniqueN(code)), by = id]
f3 <- foo[x3 == 1, .(x3 = uniqueN(code)), by = id]
setkey(f1, id)
setkey(f2, id)
setkey(f3, id)
out <- f1[f2,][f3,]
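I'm sure there is an elegant and fast way to do this in data.table, perhaps using a function together with the .SDcols argument, right?
But I can't find it...
Thanks a lot!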
With .SD and .SDcols:
foo[,lapply(.SD, function(x) uniqueN(code[x==1])), .SDcols = x1:x3, by = id]
id x1 x2 x3
1: A 3 3 2
2: B 4 4 4
3: C 2 2 3
4: D 4 3 3
5: E 4 4 3
6: F 1 3 3
7: G 3 2 2
8: H 4 2 3
9: I 3 3 3
10: J 3 2 2
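To see that the one-pass version counts the same thing as the step-by-step joins in the question, here is a minimal sketch on a small hand-made table. The names toy, res, f1, f2 and out are only illustrative, and .SDcols = patterns("^x") is an alternative to x1:x3 that assumes every flag column starts with "x".

```r
library(data.table)

# Small hypothetical table: two groups, two 0/1 flag columns
toy <- data.table(id   = c("A", "A", "A", "B", "B", "B"),
                  code = c(111, 111, 222, 333, 444, 444),
                  x1   = c(1L, 1L, 0L, 1L, 1L, 1L),
                  x2   = c(0L, 1L, 1L, 1L, 0L, 0L))

# One pass: for each flag column, count distinct codes among rows where it is 1
res <- toy[, lapply(.SD, function(x) uniqueN(code[x == 1])),
           by = id, .SDcols = patterns("^x")]

# Step-by-step version from the question, one filtered aggregate per column
f1 <- toy[x1 == 1, .(x1 = uniqueN(code)), by = id]
f2 <- toy[x2 == 1, .(x2 = uniqueN(code)), by = id]
out <- merge(f1, f2, by = "id")

print(res)
# Both approaches agree on this table
stopifnot(identical(res$x1, out$x1), identical(res$x2, out$x2))
```

One difference to keep in mind: filtering with x1 == 1 in i drops any id that has no row with a 1, whereas the .SD version keeps every id and counts over an empty subset for that column, so the two can diverge on data where some group has no 1 at all.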
Using the tidyverse (dplyr):
library(dplyr)
foo %>%
  group_by(id) %>%
  summarise(across(x1:x3, ~ n_distinct(code[. == 1])))
Output:
# A tibble: 10 x 4
# id x1 x2 x3
# * <chr> <int> <int> <int>
# 1 A 3 3 2
# 2 B 4 4 4
# 3 C 2 2 3
# 4 D 4 3 3
# 5 E 4 4 3
# 6 F 1 3 3
# 7 G 3 2 2
# 8 H 4 2 3
# 9 I 3 3 3
#10 J 3 2 2
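Or with collapse. Note that, as written, this call counts the distinct values of x1:x3 themselves within each id, not the distinct code values in rows where the column equals 1, so it will not reproduce the output above: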
library(collapse)
collap(slt(foo, id, x1:x3), ~ id, fNdistinct)
Another data.table option using duplicated (though I think @Waldi's uniqueN is simpler):
foo[, lapply(.SD, function(x) sum(!duplicated(code[x == 1]))), by = id, .SDcols = x1:x3]
id x1 x2 x3
1: A 3 3 2
2: B 4 4 4
3: C 2 2 3
4: D 4 3 3
5: E 4 4 3
6: F 1 3 3
7: G 3 2 2
8: H 4 2 3
9: I 3 3 3
10: J 3 2 2
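Why sum(!duplicated(...)) gives a distinct count can be seen on a single vector. The sketch below uses base R only; codes, flag and kept are made-up illustrative values, not taken from foo.

```r
# Hypothetical values for one group
codes <- c(111, 222, 111, 333, 222)
flag  <- c(1, 1, 1, 0, 1)

# Keep only the codes on rows where the flag is 1
kept <- codes[flag == 1]          # 111 222 111 222

# duplicated() marks every repeat of an earlier value
dups <- duplicated(kept)          # FALSE FALSE TRUE TRUE

# Counting the non-duplicates is the same as counting distinct values
n_a <- sum(!dups)                 # 2
n_b <- length(unique(kept))       # 2
stopifnot(n_a == n_b)
```

The first occurrence of each value is the only one not flagged by duplicated, so summing the negation counts each distinct value exactly once.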