How to chain group_by, filter, distinct, count in data.table?

I'm fairly new to data.table and am trying to replicate my dplyr code in data.table, but I can't get the same result.

Libraries:

library(data.table)
library(lubridate)
library(tidyverse)

(This dummy data contains no NAs, but NAs need to be filtered out.)

test_df <- data.frame(id = c(1234, 1234, 5678, 5678),
                      date = c("2021-10-10", "2021-10-10", "2021-8-10", "2021-8-15")) %>%
  mutate(date = ymd(date))

dplyr code:

Find the ids that have more than one distinct date.

test_df %>%
  group_by(id) %>%
  filter(!is.na(date)) %>%
  distinct(date) %>%
  count(id) %>%
  filter(n > 1)

id      n
5678    2

data.table attempt:

test_dt <- setDT(test_df)
test_dt[!is.na(date), by = id][
,keyby = .(date)][
,.N, by = id][
N > 1
]
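
A likely reason this attempt returns both ids: data.table only evaluates `by`/`keyby` together with a `j` expression, so the first links of the chain neither group nor de-duplicate, and `.N` ends up counting every row per id. A minimal sketch on the same dummy data (the names `filtered` and `counts` are illustrative, and the warning behaviour is assumed from current data.table releases):

```r
library(data.table)

# Same dummy data as above, built directly as a data.table
test_dt <- data.table(
  id   = c(1234, 1234, 5678, 5678),
  date = as.Date(c("2021-10-10", "2021-10-10", "2021-08-10", "2021-08-15"))
)

# With no j, `by` has nothing to apply to, so only the row filter takes effect
# (data.table is expected to warn that `by` is ignored; suppressed here)
filtered <- suppressWarnings(test_dt[!is.na(date), by = id])
nrow(filtered)  # all four rows remain, duplicate dates included

# Counting per id therefore includes the duplicated 2021-10-10 row
counts <- filtered[, .N, by = id]
counts[N > 1]   # both ids pass the N > 1 filter
```

So the chain needs an explicit de-duplication step before counting.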

dplyr's distinct can be expressed in data.table as unique with the by argument:

unique(setDT(test_df)[!is.na(date)], by = c("id", "date"))[, .N, by = id][N > 1]
id N
1: 5678 2

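The steps are as follows: convert the data frame to a data.table with setDT, drop the rows where date is NA, keep the unique id/date combinations, count the rows per id, and keep the ids where N > 1.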
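
To inspect each stage, the one-liner can be split into intermediate objects. This is only a sketch: the names `no_na`, `distinct_dates`, `counts`, `result` and `alt` are illustrative and not part of the original answer, and the `uniqueN()` variant at the end is an alternative idiom rather than the answer's method.

```r
library(data.table)

# Same dummy data as in the question, built directly as a data.table
test_dt <- data.table(
  id   = c(1234, 1234, 5678, 5678),
  date = as.Date(c("2021-10-10", "2021-10-10", "2021-08-10", "2021-08-15"))
)

# 1. Drop rows whose date is NA
no_na <- test_dt[!is.na(date)]

# 2. Keep one row per (id, date) pair, like dplyr::distinct()
distinct_dates <- unique(no_na, by = c("id", "date"))

# 3. Count the remaining rows per id
counts <- distinct_dates[, .N, by = id]

# 4. Keep only ids with more than one distinct date
result <- counts[N > 1]
result

# Alternative: count distinct dates per id directly with uniqueN()
alt <- test_dt[!is.na(date), .(N = uniqueN(date)), by = id][N > 1]
alt
```

Both `result` and `alt` should contain only id 5678 with N equal to 2, matching the dplyr output.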
