r-设置数据表每组的前三年



我有一个数据表dt,它跨越了多年。数据是分组的,因此每组的年数不同。我只想保留每个小组的前三年。如何使用data.table进行此操作?以下是一些需要测试的数据:

dates <- c(seq(as.Date('2010-01-03'),as.Date('2019-12-31'),by = 1),
seq(as.Date('2013-01-02'),as.Date('2018-12-31'),by = 1),
seq(as.Date('2015-01-02'),as.Date('2020-07-31'),by = 1))
set.seed(1995)
value <- rnorm(length(dates), mean = 100, sd = 50)
IDs <- c(rep(c("ACG"),length.out = length(seq(as.Date('2010-01-03'),as.Date('2019-12-31'),by = 1))),
rep(c("MKD"),length.out = length(seq(as.Date('2013-01-02'),as.Date('2018-12-31'),by = 1))),
rep(c("ZED"),length.out = length(seq(as.Date('2015-01-02'),as.Date('2020-07-31'),by = 1)))
)
dt <- data.table(Date = dates,
Value = value,
ID = IDs
)
dt
Date     Value  ID
1: 2010-01-03 153.03816 ACG
2: 2010-01-04  83.22491 ACG
3: 2010-01-05 107.26521 ACG
4: 2010-01-06 119.70395 ACG
5: 2010-01-07 183.24604 ACG
---                         
7874: 2020-07-27 184.45801 ZED
7875: 2020-07-28  91.53373 ZED
7876: 2020-07-29  67.42443 ZED
7877: 2020-07-30 125.62496 ZED
7878: 2020-07-31  89.02373 ZED

最后的数据表应该只有前三年的数据,如下所示:

finalDT <- dt[c(1:1094,3651:4744,5841:6935),]
finalDT
Date     Value  ID
1: 2010-01-03 153.03816 ACG
2: 2010-01-04  83.22491 ACG
3: 2010-01-05 107.26521 ACG
4: 2010-01-06 119.70395 ACG
5: 2010-01-07 183.24604 ACG
---                         
3279: 2017-12-27 102.10622 ZED
3280: 2017-12-28  94.97718 ZED
3281: 2017-12-29 131.47358 ZED
3282: 2017-12-30 112.83836 ZED
3283: 2017-12-31 184.54966 ZED

我使用的方法在一个较小的数据集上运行良好,但我有一百多个ID,其中一些有20年的数据。我需要一个数据表的程序化方法

dt[, .SD[year(Date) %in% unique(year(Date))[1:3]], by = ID]

dt[, .SD[year(Date) <= unique(year(Date))[3]], by = ID]

确保先按日期订购。

最新更新