r - Converting daily data to weekly data and handling holidays



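I have a data.table containing daily data. From it, I want to extract weekly data points taken every Wednesday. If a Wednesday is a holiday, i.e. not available in the data.table, the next available data point should be taken instead. Here is an MWE: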

library(data.table)
df <- data.table(date=as.Date(c("2012-06-25","2012-06-26","2012-06-27","2012-06-28","2012-06-29","2012-07-02","2012-07-03","2012-07-05","2012-07-06","2012-07-09","2012-07-10","2012-07-11","2012-07-12","2012-07-13","2012-07-16","2012-07-17","2012-07-18","2012-07-19","2012-07-20")))
df[,weekday:=strftime(date,'%u')]

With output:

date  weekday
1: 2012-06-25       1
2: 2012-06-26       2
3: 2012-06-27       3
4: 2012-06-28       4
5: 2012-06-29       5
6: 2012-07-02       1
7: 2012-07-03       2
8: 2012-07-05       4 #here the 4th of July was skipped
9: 2012-07-06       5
10: 2012-07-09       1
11: 2012-07-10       2
12: 2012-07-11       3
13: 2012-07-12       4
14: 2012-07-13       5
15: 2012-07-16       1
16: 2012-07-17       2
17: 2012-07-18       3
18: 2012-07-19       4
19: 2012-07-20       5

In this case, my desired result would be:

date  weekday
2012-06-27       3
2012-07-05       4
2012-07-11       3
2012-07-18       3

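Is there a more efficient way than going through the data week by week in a for loop and checking whether the Wednesday data point is present? I feel there must be a better way, so any advice would be appreciated!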

Working solution (based on Imo's suggestion):

df[,weekday:=wday(date)] #faster way to get weekdays, careful: numbers increased by 1 vs strftime
df[,numweek:=floor(as.numeric(date-date[1])/7+1)] #get continuous week numbers extending over end of years
df[df[,.I[which.min(abs(weekday-4.25))],by=.(numweek)]$V1] #gets result

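Here is a method that subsets the data.table by row position: within each week, it finds the position (using .I) of the weekday closest to 3, with ties resolved in favor of 4 rather than 2 (using which.min(abs(as.integer(weekday)-3.25))). Subtracting 3.25 instead of 3 puts Thursday at a distance of 0.75 and Tuesday at 1.25, so Thursday wins whenever Wednesday is missing.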

df[df[, .I[which.min(abs(as.integer(weekday)-3.25))], by=week(date)]$V1]
date weekday
1: 2012-06-27       3
2: 2012-07-05       4
3: 2012-07-11       3
4: 2012-07-18       3
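To make the role of the 3.25 offset concrete, here is a small sketch on plain vectors. The vectors `missing_wed` and `missing_wed_thu` are made up for illustration and are not part of the question's data. It also shows a limit of the trick: if both Wednesday and Thursday are missing, Tuesday is closer to 3.25 than Friday, so an earlier day is picked.

```r
# Weekday numbers as returned by strftime(date, '%u') (Monday = 1)
missing_wed <- c(1, 2, 4, 5)          # hypothetical week without Wednesday

# With 3 as the target, Tuesday and Thursday tie and which.min() takes the first
tie_pick <- missing_wed[which.min(abs(missing_wed - 3))]       # 2 (Tuesday)

# With 3.25 as the target, Thursday is strictly closer
offset_pick <- missing_wed[which.min(abs(missing_wed - 3.25))] # 4 (Thursday)

# Limitation: with Wednesday AND Thursday missing, Tuesday beats Friday
missing_wed_thu <- c(1, 2, 5)
limit_pick <- missing_wed_thu[which.min(abs(missing_wed_thu - 3.25))] # 2 (Tuesday)
```

If two consecutive holidays can occur in your data, a selection rule based on "first day on or after Wednesday" may be safer than a distance-based one.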

Note that if your real data spans several years, you will need to use by=.(week(date), year(date)), because the same week number recurs every year.
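A minimal sketch of why the year matters, using two made-up Wednesdays one year apart (the table `dt` and the result names are illustrative only). Both dates get the same week(date) value, so grouping by week alone collapses them into one group.

```r
library(data.table)

# Two Wednesdays exactly 52 weeks apart (hypothetical data)
dt <- data.table(date = as.Date(c("2012-07-11", "2013-07-10")))
dt[, weekday := wday(date)]

# Grouping by week number only: both rows land in the same group
by_week <- dt[dt[, .I[which.min(abs(weekday - 4.25))], by = week(date)]$V1]

# Grouping by year and week keeps the two years apart
by_year_week <- dt[dt[, .I[which.min(abs(weekday - 4.25))],
                      by = .(year(date), week(date))]$V1]

nrow(by_week)       # 1
nrow(by_year_week)  # 2
```

One caveat, stated as my understanding rather than a guarantee: data.table's week() appears to count seven-day blocks from the start of the year rather than Monday-based calendar weeks, so in some years a block can straddle a weekend. It is worth checking the grouping on your own date range; the numweek column in the working solution above avoids this when the data starts on a Monday.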


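Also note that data.table has a wday function that returns the day of the week directly as an integer. For Monday through Saturday its values are 1 greater than the character values returned by strftime with '%u' (wday counts Sunday as 1, whereas '%u' counts Monday as 1), so you need to adjust the target if you use it directly.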
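A quick side-by-side comparison of the two numbering schemes, on three dates chosen for illustration (the names `d`, `from_strftime` and `from_wday` are not from the original post):

```r
library(data.table)

d <- as.Date(c("2012-06-24", "2012-06-25", "2012-06-27"))  # Sunday, Monday, Wednesday

from_strftime <- strftime(d, "%u")  # character: "7" "1" "3" (Monday = 1)
from_wday     <- wday(d)            # integer:    1   2   4  (Sunday = 1)
```

This is why the target shifts from 3.25 to 4.25 when wday is used.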

Starting from the data.table with the single date variable, you can do

df[, weekday := wday(date)]
df[df[, .I[which.min(abs(weekday-4.25))], by=week(date)]$V1]
date weekday
1: 2012-06-27       4
2: 2012-07-05       5
3: 2012-07-11       4
4: 2012-07-18       4

Note that the dates match those above; only the weekday numbering differs.
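As an alternative sketch, not the approach used in the answer above, a rolling join can express "the Wednesday, or else the next available day" directly. It assumes you know or can compute the first Wednesday of the sample; `first_wed`, `wednesdays`, `obs_date` and `weekly` are names introduced here for illustration. roll = -Inf asks data.table to carry the next observation backward when a Wednesday has no exact match.

```r
library(data.table)

df <- data.table(date = as.Date(c(
  "2012-06-25", "2012-06-26", "2012-06-27", "2012-06-28", "2012-06-29",
  "2012-07-02", "2012-07-03", "2012-07-05", "2012-07-06",
  "2012-07-09", "2012-07-10", "2012-07-11", "2012-07-12", "2012-07-13",
  "2012-07-16", "2012-07-17", "2012-07-18", "2012-07-19", "2012-07-20")))

# Keep a copy of the real observation date; after the join, `date`
# holds the Wednesday that was looked up
df[, obs_date := date]

# Every Wednesday covered by the data (first Wednesday assumed known)
first_wed  <- as.Date("2012-06-27")
wednesdays <- data.table(date = seq(first_wed, max(df$date), by = "week"))

# Exact match where the Wednesday exists, otherwise the next available row
weekly <- df[wednesdays, on = "date", roll = -Inf]
weekly$obs_date
# "2012-06-27" "2012-07-05" "2012-07-11" "2012-07-18"
```

Unlike the distance-based selection, this never falls back to a day before Wednesday, and it does not depend on how weeks are numbered.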
