r语言 - 如何按组确定跨两列和两行的两个日期之间的天数差异?



我希望确定跨两列和两行组的天数差异。实际上,从结束日减去随后的开始日,并将差值记录为数据框架中的新列,并在确定新组(ID)时重新开始。

Start_Date   End_Date     ID   

2014-05-09   2015-05-08   01
2015-05-09   2016-05-08   01 
2016-05-11   2017-05-10   01
2017-05-11   2018-05-10   01
2016-08-29   2017-08-28   02
2017-08-29   2018-08-28   02

结果应该如下表所示。

Start_Date   End_Date     ID   Days_Difference 

2014-05-09   2015-05-08   01         NA
2015-05-09   2016-05-08   01         01
2016-05-11   2017-05-10   01         03
2017-05-11   2018-05-10   01         01
2016-08-29   2017-08-28   02         NA
2017-08-29   2018-08-28   02         01

本质上,我想取各组(ID)的结束日期与其左侧对角线开始日期之差。我现在真的很难受。我不认为我的代码会有帮助。任何使用数据的解决方案。表,或基础R将非常感激!

分组后,我们可以得到'Start_Date'和'End_Date'的lead(下一个元素)的差值

library(dplyr)
df1 <- df1 %>%
mutate(across(ends_with("Date"), as.Date)) %>%
group_by(ID) %>% 
mutate(Days_Difference = as.numeric(lag(lead(Start_Date) - End_Date))) %>% 
ungroup

与产出

df1
# A tibble: 6 × 4
Start_Date End_Date      ID Days_Difference
<date>     <date>     <int>           <dbl>
1 2014-05-09 2015-05-08     1              NA
2 2015-05-09 2016-05-08     1               1
3 2016-05-11 2017-05-10     1               3
4 2017-05-11 2018-05-10     1               1
5 2016-08-29 2017-08-28     2              NA
6 2017-08-29 2018-08-28     2               1

data.table

的类似逻辑
library(data.table)
setDT(df1)[, Days_Difference := 
as.numeric(shift(shift(as.IDate(Start_Date), type = "lead") - 
as.IDate(End_Date))), ID]

与产出

> df1
Start_Date   End_Date    ID Days_Difference
<char>     <char> <int>           <num>
1: 2014-05-09 2015-05-08     1              NA
2: 2015-05-09 2016-05-08     1               1
3: 2016-05-11 2017-05-10     1               3
4: 2017-05-11 2018-05-10     1               1
5: 2016-08-29 2017-08-28     2              NA
6: 2017-08-29 2018-08-28     2               1

数据
df1 <- structure(list(Start_Date = c("2014-05-09", "2015-05-09", "2016-05-11", 
"2017-05-11", "2016-08-29", "2017-08-29"), End_Date = c("2015-05-08", 
"2016-05-08", "2017-05-10", "2018-05-10", "2017-08-28", "2018-08-28"
), ID = c(1L, 1L, 1L, 1L, 2L, 2L)), class = "data.frame", 
row.names = c(NA, 
-6L))

另一个data.table选项

setDT(df)[
,
c(lapply(.SD, as.IDate), .(ID = ID)),
.SDcols = patterns("Date$")
][
,
DayspDiff := Start_Date - shift(End_Date),
ID
][]

收益率

Start_Date   End_Date ID DayspDiff
1: 2014-05-09 2015-05-08  1        NA
2: 2015-05-09 2016-05-08  1         1
3: 2016-05-11 2017-05-10  1         3
4: 2017-05-11 2018-05-10  1         1
5: 2016-08-29 2017-08-28  2        NA
6: 2017-08-29 2018-08-28  2         1

最新更新