R collapse timelines by ID separated by 31 days



This question is similar to the one here: r collapse by ID by year

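However, I would like to collapse the timelines by ID and State, provided the gap between consecutive rows is at most 31 days. If the gap is more than 31 days, the rows are not collapsed and a new row starts instead. For example, if this is my dataset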

ID     From           To           State
1      2004-04-05     2005-02-05   MD
1      2005-03-05     2005-03-05   MD
1      2005-04-05     2005-10-05   DC
1      2006-03-05     2006-10-05   DC
1      2006-11-05     2007-03-05   DC
1      2007-04-05     2007-06-05   MD
1      2008-03-05     2008-11-05   MD
1      2008-12-05     2010-08-05   MD
1      2010-09-05     2012-11-05   MD
2      2003-05-05     2004-08-05   OR
2      2004-09-05     2009-03-05   OR
2      2010-06-05     2010-08-05   AZ
2      2013-06-05     2015-06-05   AZ
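To make the example reproducible, the table above can be read into R with `read.table()`. The sketch below then computes, for each row, the gap in days from the previous row's To to its own From within the same ID. It assumes the rows are already sorted by ID and From, as they are in the table; the `gap` column is an illustrative addition, not part of the original data.

```r
# Sample data from the question (test case 1)
df <- read.table(text = "
ID From       To         State
1  2004-04-05 2005-02-05 MD
1  2005-03-05 2005-03-05 MD
1  2005-04-05 2005-10-05 DC
1  2006-03-05 2006-10-05 DC
1  2006-11-05 2007-03-05 DC
1  2007-04-05 2007-06-05 MD
1  2008-03-05 2008-11-05 MD
1  2008-12-05 2010-08-05 MD
1  2010-09-05 2012-11-05 MD
2  2003-05-05 2004-08-05 OR
2  2004-09-05 2009-03-05 OR
2  2010-06-05 2010-08-05 AZ
2  2013-06-05 2015-06-05 AZ
", header = TRUE, stringsAsFactors = FALSE)

df$From <- as.Date(df$From)
df$To   <- as.Date(df$To)

# Gap in days from the previous row's To to this row's From;
# NA on the first row of each ID, where there is no previous row to join
prev_to <- c(as.Date(NA), head(df$To, -1))
same_id <- c(FALSE, head(df$ID, -1) == df$ID[-1])
df$gap  <- ifelse(same_id, as.numeric(df$From - prev_to), NA)

# Rows whose gap exceeds 31 days must start a new row in the result
which(df$gap > 31)
```

Only rows 4, 7, 12 and 13 have gaps above 31 days; rows 3 and 6 also start new rows in the expected result because the State changes there.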

After collapsing, the final dataset would look like this:

ID     From           To           State
1      2004-04-05     2005-03-05   MD
1      2005-04-05     2005-10-05   DC
1      2006-03-05     2007-03-05   DC
1      2007-04-05     2007-06-05   MD
1      2008-03-05     2012-11-05   MD
2      2003-05-05     2009-03-05   OR
2      2010-06-05     2010-08-05   AZ
2      2013-06-05     2015-06-05   AZ

Any suggestions would be greatly appreciated.

Test case 2:

ID     From           To           State
1      2003-09-05     2003-11-05   MD
1      2004-09-05     2007-05-05   TX
1      2007-06-05     2007-07-05   DC
1      2007-08-05     2009-07-05   DC
1      2011-11-05     2014-03-05   MD
1      2014-05-05     2017-06-05   MD

Expected result:

ID     From           To           State
1      2003-09-05     2003-11-05   MD
1      2004-09-05     2007-05-05   TX
1      2007-06-05     2009-07-05   DC 
1      2011-11-05     2017-06-05   MD
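Measuring the gaps in test case 2 the same way shows how the 31-day rule applies to it. In the sketch below `df2` is a hypothetical name for this second dataset. The two DC rows are exactly 31 days apart and collapse, while the last two MD rows are 61 days apart, so merging them as in the expected result appears to need more than the 31-day rule as stated.

```r
# Test case 2 from the question (df2 is a hypothetical name)
df2 <- data.frame(
  ID    = 1,
  From  = as.Date(c("2003-09-05", "2004-09-05", "2007-06-05",
                    "2007-08-05", "2011-11-05", "2014-05-05")),
  To    = as.Date(c("2003-11-05", "2007-05-05", "2007-07-05",
                    "2009-07-05", "2014-03-05", "2017-06-05")),
  State = c("MD", "TX", "DC", "DC", "MD", "MD")
)

# Days from the previous row's To to this row's From (single ID, so no ID check)
gap <- c(NA, as.numeric(df2$From[-1] - df2$To[-nrow(df2)]))

# TX -> DC and DC -> DC are 31 days apart; the last two MD rows are 61 days apart
gap
```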

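Create a grouping column that increments whenever the current From date minus the previous To date exceeds 31 days, then keep the first From value and the last To value in each group.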

```r
library(dplyr)

df %>%
  # convert the From and To columns from character to Date
  mutate(across(c(From, To), as.Date)) %>%
  # increment the group counter whenever the gap between this row's From
  # and the previous row's To is larger than 31 days
  group_by(ID, State,
           group = cumsum(From - dplyr::lag(To, default = as.Date('1970-01-01')) > 31)) %>%
  # keep the first From and the last To of every group
  summarise(From = first(From),
            To = last(To), .groups = 'drop') %>%
  select(-group) %>%
  arrange(ID, From)
#     ID State From       To        
#  <int> <chr> <date>     <date>    
#1     1 MD    2004-04-05 2005-03-05
#2     1 DC    2005-04-05 2005-10-05
#3     1 DC    2006-03-05 2007-03-05
#4     1 MD    2007-04-05 2007-06-05
#5     1 MD    2008-03-05 2012-11-05
#6     2 OR    2003-05-05 2009-03-05
#7     2 AZ    2010-06-05 2010-08-05
#8     2 AZ    2013-06-05 2015-06-05
```
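One caveat with this grouping: the counter only advances on gaps larger than 31 days, and State is added to `group_by()` separately, so two runs of the same State separated by a short run of another State (for example MD, then DC, then MD with small gaps) share a group and get merged. The sketch below is not part of the original answer; it also starts a new spell whenever ID or State changes. `collapse_spells()` and `max_gap` are hypothetical names, and it assumes rows within an ID do not overlap.

```r
library(dplyr)

# Hypothetical helper: collapse consecutive rows of the same ID and State
# whose gap (From minus the previous To) is at most `max_gap` days.
collapse_spells <- function(data, max_gap = 31) {
  data %>%
    mutate(across(c(From, To), as.Date)) %>%
    arrange(ID, From) %>%
    mutate(
      # a new spell starts on the first row, on a change of ID or State,
      # or when the gap to the previous row exceeds max_gap days
      new_spell = is.na(dplyr::lag(ID)) |
        ID != dplyr::lag(ID) |
        State != dplyr::lag(State) |
        as.numeric(From - dplyr::lag(To)) > max_gap,
      spell = cumsum(new_spell)
    ) %>%
    group_by(ID, State, spell) %>%
    summarise(From = first(From), To = last(To), .groups = "drop") %>%
    select(-spell) %>%
    arrange(ID, From)
}

# MD, then DC, then MD with 1-day gaps: the two MD rows stay separate
x <- data.frame(
  ID    = 1,
  From  = c("2000-01-01", "2000-02-01", "2000-03-01"),
  To    = c("2000-01-31", "2000-02-29", "2000-03-31"),
  State = c("MD", "DC", "MD")
)
collapse_spells(x)
```

On test case 1 this returns the same eight rows as the output above; the difference only appears when a State run is interrupted and then resumes within 31 days.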

Latest update