r语言 - 以开始和结束日期为条件的周期时间总和



我尝试为每一天、每周和每月构造列x的总和。如果特定的日期、周或月在开始日期和结束日期之间,我想包含数字x并将它们相加。

我构造了这个示例数据框架:

library(data.table)
library(lubridate)
df <- data.frame(x=c(13,32,37,21,9,43,12,28),
start=c('2018-06-12','2019-02-12','2018-12-30','2020-02-05','2019-09-29','2017-05-19','2019-06-13','2020-04-12'), 
end=c('2018-09-13','2019-03-19','2020-01-10','2020-03-17','2020-10-10','2020-01-02','2019-07-19','2021-06-01'))
#convert columns as date
df$start <- as.Date(df$start,"%Y-%m-%d")
df$end <- as.Date(df$end,"%Y-%m-%d")

我试着每天做一个for循环,总结每个特定时期的x列,但我做不到。

#for loop over days
days <- seq(from=as.Date("2017-01-01"), to=as.Date("2021-07-31"), by="days")
for (i in seq_along(days)){
print(sum(df$x))}

非常感谢你的帮助:)

您可以展开每一行的startend日期,并用它创建一个新行。对于每个日期,您可以sumx值。我们使用complete来填补缺失的日期,如果它们存在。

library(tidyverse)
df %>%
mutate(dates = map2(start, end, seq, by = 'days')) %>%
unnest(dates) %>%
group_by(dates) %>%
summarise(x = sum(x)) %>%
complete(dates = seq(min(dates), max(dates), by = 'days'), fill = list(x = 0)) 
#   dates          x
#   <date>     <dbl>
# 1 2017-05-19    43
# 2 2017-05-20    43
# 3 2017-05-21    43
# 4 2017-05-22    43
# 5 2017-05-23    43
# 6 2017-05-24    43
# 7 2017-05-25    43
# 8 2017-05-26    43
# 9 2017-05-27    43
#10 2017-05-28    43
# … with 1,465 more rows

天试试这个:

library(data.table)
library(lubridate)
library(dplyr)
df <- df %>% 
mutate(start = as.Date(start),
end = as.Date(end)) %>% ## convert columns as date
as.data.table() ## convert frame to table

days <- seq(from=as.Date("2017-01-01"), to=as.Date("2021-07-31"), by="days")
total <- 0
for (day in days) {
total <- total + df[start <= day & end >= day, sum(x)]
}

out:
> print(total)
[1] 72784

将每天的结果存储在表中:

days <- seq(from=as.Date("2017-01-01"), to=as.Date("2021-07-31"), by="days")
tab_results <- data.table(Date = as.Date(character()), 
x = as.integer() )
for (day in days) {
tab_results <- tab_results %>% add_row(Date = as.Date(day, origin = "1970-01-01"), 
x = df[start <= day & end >= day, sum(x)])
}
数据:

df <- data.frame(x=c(13,32,37,21,9,43,12,28),
start=c('2018-06-12','2019-02-12','2018-12-30','2020-02-05','2019-09-29','2017-05-19','2019-06-13','2020-04-12'), 
end=c('2018-09-13','2018-03-19','2020-01-10','2020-03-17','2020-10-10','2020-01-02','2019-07-19','2021-06-01'))

最新更新