r - How to get the ending date from the first observation and use it as the starting date for the second observation for the same ID?

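My df has some ids that appear once and some that appear several times, with columns giving the start and end date of each observation. The date ranges must not overlap within the same id.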


df <- data.frame(id = c(22,22,102,102,102),
                 start_date = as.Date(c("2013-10-29","2014-01-09",
                                        "2016-09-14",
                                        "2016-09-14","2016-09-14")),
                 end_date = as.Date(c("2017-08-15","2018-10-05",
                                      "2016-10-09",
                                      "2017-12-12","2018-10-17")))
head(df)
   id start_date   end_date
1  22 2013-10-29 2017-08-15
2  22 2014-01-09 2018-10-05
3 102 2016-09-14 2016-10-09
4 102 2016-09-14 2017-12-12
5 102 2016-09-14 2018-10-17

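The date intervals overlap for both ids 22 and 102, but the rows for 22 have different start_date values while the rows for 102 all share the same start_date.

The result I need is:

When the dates overlap, use the end date of the previous observation as the start date.
When the dates do not overlap, keep the actual values.

Any ideas or suggestions?

My expected result is: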

head(fixed_df)
   id start_date   end_date
1  22 2013-10-29 2017-08-15
2  22 2017-08-15 2018-10-05
3 102 2016-09-14 2016-10-09
4 102 2016-10-09 2017-12-12
5 102 2017-12-12 2018-10-17

In R you can compare Date objects directly with the ordinary ==, > and < operators, so with a loop and a couple of tests, here is a working solution:

# Loop over every row except the last one
for (line in seq_len(max(nrow(df) - 1, 0)))
{
  # Only act if the next row has the same ID
  if (df$id[line] == df$id[line + 1])
  {
    # Check whether this row's end date is after the next row's start date
    if (df$end_date[line] > df$start_date[line + 1])
    {
      # If so, set the next row's start date to this row's end date
      df$start_date[line + 1] <- df$end_date[line]
    }
  }
}
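The loop only compares each row with the row directly below it, so it relies on the rows of one id being adjacent and in date order. That holds for the example df, but it is an assumption about the input in general. A minimal sketch, using a deliberately shuffled copy of the question's data (the name shuffled_df is made up for illustration), that sorts first and then applies the same loop:

```r
# Same rows as the question's df, but in a shuffled order (illustrative only)
shuffled_df <- data.frame(
  id = c(102, 22, 102, 22, 102),
  start_date = as.Date(c("2016-09-14", "2014-01-09", "2016-09-14",
                         "2013-10-29", "2016-09-14")),
  end_date = as.Date(c("2018-10-17", "2018-10-05", "2016-10-09",
                       "2017-08-15", "2017-12-12"))
)

# Sort by id, then start date, then end date, so that the rows of each id
# are adjacent and chronological before the row-by-row comparison
sorted_df <- shuffled_df[order(shuffled_df$id,
                               shuffled_df$start_date,
                               shuffled_df$end_date), ]
rownames(sorted_df) <- NULL

# Same logic as the loop above, applied to the sorted copy
for (i in seq_len(max(nrow(sorted_df) - 1, 0))) {
  same_id <- sorted_df$id[i] == sorted_df$id[i + 1]
  if (same_id && sorted_df$end_date[i] > sorted_df$start_date[i + 1]) {
    sorted_df$start_date[i + 1] <- sorted_df$end_date[i]
  }
}

print(sorted_df)
```

Sorting on end_date as a tie-breaker matters for id 102, where all three rows share the same start_date.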

With dplyr, I would do this:

library(dplyr)
df %>% group_by(id) %>%
  arrange(start_date) %>%
  mutate(
    lag(end_date),
    overlap = start_date < lag(end_date, default=as.Date('2000-01-01')),
    new_start_date = if_else(overlap, lag(end_date), start_date)
  )
     id start_date end_date   `lag(end_date)` overlap new_start_date
  <dbl> <date>     <date>     <date>          <lgl>   <date>
1    22 2013-10-29 2017-08-15 NA              FALSE   2013-10-29    
2    22 2014-01-09 2018-10-05 2017-08-15      TRUE    2017-08-15    
3   102 2016-09-14 2016-10-09 NA              FALSE   2016-09-14    
4   102 2016-09-14 2017-12-12 2016-10-09      TRUE    2016-10-09    
5   102 2016-09-14 2018-10-17 2017-12-12      TRUE    2017-12-12

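This is verbose, but only to show what is going on.

Some key points:

Use group_by to keep the comparison within each id.
Next, sort the rows.
lag compares against the previous value. Give it a sensible default, which must be of the same type (here, a Date).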
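Once the intermediate columns are no longer needed, the same three steps can be collapsed so that start_date is overwritten directly. This is only a sketch of one way to do it, not the answerer's own code: the helper column prev_end and the result name fixed_df are chosen for illustration, and an explicit is.na() check stands in for the default date.

```r
library(dplyr)

df <- data.frame(
  id = c(22, 22, 102, 102, 102),
  start_date = as.Date(c("2013-10-29", "2014-01-09", "2016-09-14",
                         "2016-09-14", "2016-09-14")),
  end_date = as.Date(c("2017-08-15", "2018-10-05", "2016-10-09",
                       "2017-12-12", "2018-10-17"))
)

fixed_df <- df %>%
  group_by(id) %>%
  # .by_group = TRUE sorts within each id; end_date breaks ties on start_date
  arrange(start_date, end_date, .by_group = TRUE) %>%
  mutate(
    # End date of the previous row of the same id (NA for the first row)
    prev_end = lag(end_date),
    # Replace the start date only when it falls before the previous end date
    start_date = if_else(!is.na(prev_end) & start_date < prev_end,
                         prev_end, start_date)
  ) %>%
  ungroup() %>%
  select(-prev_end)

print(fixed_df)
```

Note that arrange(..., .by_group = TRUE) also orders the result by id, which happens to match the row order of the example.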

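If you want strictly no overlap, consider using lag(end_date) + days(1). Note that days() comes from the lubridate package; for Date objects, lag(end_date) + 1 also adds one day without an extra package.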
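A minimal sketch of that strict variant, using plain + 1 on the Date column so that lubridate is not required. Two things here are assumptions rather than part of the answer: the comparison uses <= so that a start date equal to the previous end date also counts as an overlap, and the names prev_end and strict_df are made up for illustration.

```r
library(dplyr)

df <- data.frame(
  id = c(22, 22, 102, 102, 102),
  start_date = as.Date(c("2013-10-29", "2014-01-09", "2016-09-14",
                         "2016-09-14", "2016-09-14")),
  end_date = as.Date(c("2017-08-15", "2018-10-05", "2016-10-09",
                       "2017-12-12", "2018-10-17"))
)

strict_df <- df %>%
  group_by(id) %>%
  arrange(start_date, end_date, .by_group = TRUE) %>%
  mutate(
    prev_end = lag(end_date),
    # Assumption: sharing a boundary day counts as overlapping, hence <=.
    # Adding 1 to a Date moves it forward by one day.
    start_date = if_else(!is.na(prev_end) & start_date <= prev_end,
                         prev_end + 1, start_date)
  ) %>%
  ungroup() %>%
  select(-prev_end)

print(strict_df)
```

With this variant each adjusted interval starts the day after the previous one ends. It does not check whether the shifted start date now falls after the row's own end date, which could happen with other data.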
