r-数据表中连续几天按主题重置



我正试图通过新主题重置Consecutive_days列。我相信这很简单。我是data.table的新手。这在data.table中可能吗?还是我需要转换回data.frame?

旧数据表:

DT = data.table(
Subject = rep(c("A", "B"), 4:3),
Date = as.Date(
sprintf("10-%02d-%02d", c(22:25, 25:27), rep(1:2, 4:3)),
'%m-%d-%y'
)
)
DT[]
#    Subject       Date
# 1:       A 2001-10-22
# 2:       A 2001-10-23
# 3:       A 2001-10-24
# 4:       A 2001-10-25
# 5:       B 2002-10-25
# 6:       B 2002-10-26
# 7:       B 2002-10-27

我尝试过的:

DT[, Consecutive_days := c(0,diff(Date)), by =.(Subject)]

发生了什么:

#    Subject       Date Consecutive_days
# 1:       A 2001-10-22                0
# 2:       A 2001-10-23                1
# 3:       A 2001-10-24                1
# 4:       A 2001-10-25                1
# 5:       B 2002-10-25                0
# 6:       B 2002-10-26                1
# 7:       B 2002-10-27                1

我正在努力做到这一点;每次Subject更改时重置

#    Subject       Date Consecutive_days
# 1:       A 2001-10-22                0
# 2:       A 2001-10-23                1
# 3:       A 2001-10-24                2
# 4:       A 2001-10-25                3
# 5:       B 2002-10-25                0
# 6:       B 2002-10-26                1
# 7:       B 2002-10-27                2

为什么不简单地添加一个分组语句by = Subject来计算它group_wise,而只添加cumsum来计算您已经计算出的差异字段呢。

library(data.table)
DT = data.table(
Subject = rep(c("A", "B"), 4:3),
Date = as.Date(
sprintf("10-%02d-%02d", c(22:25, 25:27), rep(1:2, 4:3)),
'%m-%d-%y'
)
)
DT[, cons_days := cumsum(c(0, diff(Date))), by = Subject]
DT
#>    Subject       Date cons_days
#> 1:       A 2001-10-22         0
#> 2:       A 2001-10-23         1
#> 3:       A 2001-10-24         2
#> 4:       A 2001-10-25         3
#> 5:       B 2002-10-25         0
#> 6:       B 2002-10-26         1
#> 7:       B 2002-10-27         2

创建于2021-05-18由reprex包(v2.0.0(

这是一个tidyverse解决方案,尽管您正在寻找data.table解决方案,以防您也感兴趣:

library(dplyr)
DT %>%
group_by(Subject) %>% 
mutate(Consecutive_days = cumsum(as.numeric(difftime(Date, lag(Date, default = first(Date)), units = "days"))))

# A tibble: 7 x 3
# Groups:   Subject [2]
Subject Date       Consecutive_days
<chr>   <date>                <dbl>
1 A       2001-10-22                0
2 A       2001-10-23                1
3 A       2001-10-24                2
4 A       2001-10-25                3
5 B       2002-10-25                0
6 B       2002-10-26                1
7 B       2002-10-27                2
DT[, group := cumsum(c(2, diff(Date)) != 1), by = Subject]
DT[, Consecutive_days := seq_len(.N) - 1, by = .(Subject, group)]
DT[, group := NULL]
Subject       Date Consecutive_days
1:       A 2001-10-22                0
2:       A 2001-10-23                1
3:       A 2001-10-24                2
4:       A 2001-10-25                3
5:       B 2002-10-25                0
6:       B 2002-10-26                1
7:       B 2002-10-27                2

如果你想在同一组中有相同的日期,那么换成cumsum(c(2, diff(Date)) > 1)

最新更新