从R中多个日期范围之间的列表中快速计算天数的方法



我搜索了一段时间,但似乎大多数天数计数方法只针对两个日期(即单个日期范围(。我不知道是否有比使用for循环计算多个日期范围之间的天数更快的方法。

这是我的数据DF:

ItemName  StartDate    EndDate
Apple     2014-01-02   2018-01-01
Orange    2014-08-06   2017-02-03
Banana    2015-03-01   2019-12-31
...

现在我有一个报告日期范围:2015-01-01至2019-12-31。我想按季度计算报告日期所涵盖项目的天数。

例如,第一季度的报告日期在2015-01-01&2015-03-31(含两端(

ItemName  StartDate    EndDate      CountDays  SampleQuarter
Apple     2014-01-02   2018-01-02   90         2015-Q1
Orange    2014-08-06   2017-02-03   90         2015-Q1
Banana    2015-03-01   2019-12-31   31         2015-Q1
...
Apple     2014-01-02   2018-01-02    2         2018-Q1
Orange    2014-08-06   2017-02-03    0         2018-Q1
Banana    2015-03-01   2019-12-31   31         2018-Q1
....

我的目标是使用上面的结果来得到这个表:

SampleQuarter  TotalDays
2015-Q1        211
2015-Q2        273
...
2018-Q1         33
2018-Q2         91
... 

我使用for循环来处理它。首先我创建了一个参考日期列表:

Date = c("2015-01-01","2015-04-01","2015-06-01",....)
TotalCount = NULL
for(i in 1:length(Date)){
START = as.Date(Date[i])
END = START %m+% months(3) -1 ## the end of each quarter 

## find items that start and end date range is covered by the selected quarter
## than calculated the days covered by the selected quarter only  
df = DF %>% filter( StartDate <=END & EndDate >=END )   %>%
mutate(StartDateNEW = ifelse(StartDate <= START,START, StartDate ))%>%
mutate(EndDateNEW = ifelse(EndDate <= END,EndDate , END)) %>% 
mutate(CountDays= EndDateNEW -StartDateNEW +1) %>% 
select(-StartDateNEW, EndDateNEW ) %>% 
mutate(SampleQuarter =paste0( format(START , "%Y") ,"-Q",quarter(START) )  %>%
group_by(SampleQuarter) %>% summarise(TotalDays = sum(CountDays)) 

TotalCount = rbind(TotalCount ,df)
START = START %m+% months(3) ## the beginning of next quarter 
}

对于小样本数据,For循环运行良好。当样本量很大时,代码将需要一段时间。我想知道是否有一种方法可以绕过for循环,使整个过程更快?

提前谢谢。

也许我们可以使用

library(dplyr)
library(purrr)
library(tidyr)
library(zoo)
DF %>% 
mutate(across(ends_with("Date"), as.Date),
new = map2(StartDate, EndDate,
~ tibble(days = seq(.x, .y, by = 'day'), 
Quarter = cut(days, breaks = seq(.x, .y, by = 'quarter'))))) %>% 
unnest(c(new)) %>% 
group_by(ItemName, StartDate, EndDate, Quarter = as.yearqtr(Quarter)) %>%
summarise(n = n(), .groups = 'drop')

最新更新