我正试图通过新主题重置Consecutive_days列。我相信这很简单。我是data.table的新手。这在data.table中可能吗?还是我需要转换回data.frame?
旧数据表:
DT = data.table(
Subject = rep(c("A", "B"), 4:3),
Date = as.Date(
sprintf("10-%02d-%02d", c(22:25, 25:27), rep(1:2, 4:3)),
'%m-%d-%y'
)
)
DT[]
# Subject Date
# 1: A 2001-10-22
# 2: A 2001-10-23
# 3: A 2001-10-24
# 4: A 2001-10-25
# 5: B 2002-10-25
# 6: B 2002-10-26
# 7: B 2002-10-27
我尝试过的:
DT[, Consecutive_days := c(0,diff(Date)), by =.(Subject)]
发生了什么:
# Subject Date Consecutive_days
# 1: A 2001-10-22 0
# 2: A 2001-10-23 1
# 3: A 2001-10-24 1
# 4: A 2001-10-25 1
# 5: B 2002-10-25 0
# 6: B 2002-10-26 1
# 7: B 2002-10-27 1
我正在努力做到这一点;每次Subject
更改时重置
# Subject Date Consecutive_days
# 1: A 2001-10-22 0
# 2: A 2001-10-23 1
# 3: A 2001-10-24 2
# 4: A 2001-10-25 3
# 5: B 2002-10-25 0
# 6: B 2002-10-26 1
# 7: B 2002-10-27 2
为什么不简单地添加一个分组语句by = Subject
来计算它group_wise,而只添加cumsum
来计算您已经计算出的差异字段呢。
library(data.table)
DT = data.table(
Subject = rep(c("A", "B"), 4:3),
Date = as.Date(
sprintf("10-%02d-%02d", c(22:25, 25:27), rep(1:2, 4:3)),
'%m-%d-%y'
)
)
DT[, cons_days := cumsum(c(0, diff(Date))), by = Subject]
DT
#> Subject Date cons_days
#> 1: A 2001-10-22 0
#> 2: A 2001-10-23 1
#> 3: A 2001-10-24 2
#> 4: A 2001-10-25 3
#> 5: B 2002-10-25 0
#> 6: B 2002-10-26 1
#> 7: B 2002-10-27 2
创建于2021-05-18由reprex包(v2.0.0(
这是一个tidyverse
解决方案,尽管您正在寻找data.table
解决方案,以防您也感兴趣:
library(dplyr)
DT %>%
group_by(Subject) %>%
mutate(Consecutive_days = cumsum(as.numeric(difftime(Date, lag(Date, default = first(Date)), units = "days"))))
# A tibble: 7 x 3
# Groups: Subject [2]
Subject Date Consecutive_days
<chr> <date> <dbl>
1 A 2001-10-22 0
2 A 2001-10-23 1
3 A 2001-10-24 2
4 A 2001-10-25 3
5 B 2002-10-25 0
6 B 2002-10-26 1
7 B 2002-10-27 2
DT[, group := cumsum(c(2, diff(Date)) != 1), by = Subject]
DT[, Consecutive_days := seq_len(.N) - 1, by = .(Subject, group)]
DT[, group := NULL]
Subject Date Consecutive_days
1: A 2001-10-22 0
2: A 2001-10-23 1
3: A 2001-10-24 2
4: A 2001-10-25 3
5: B 2002-10-25 0
6: B 2002-10-26 1
7: B 2002-10-27 2
如果你想在同一组中有相同的日期,那么换成cumsum(c(2, diff(Date)) > 1)