In R, how to calculate the difference between two date columns by group (id) while keeping the first available date as the reference



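How can I compute the time between two date columns while keeping the first (earliest) date within each group as the reference? For example, for id N04 the reference date_1 should stay 2009-07-10 until the next id begins. I think I am close, but I have not managed to find the right solution.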

Please find a minimal working example below:

id <- c("N02", "N02", "N03", "N03", "N04", "N04", "N04", "N04", "N04", "N04")
date_1 <- c("2008-03-15", "2008-04-15", "2008-06-15", "2008-07-15", "2009-07-10", "2009-07-13", "2009-07-15", "2009-07-16", "2009-07-17", "2009-07-20")
date_2 <- c("2008-03-15", "2008-04-15", "2008-06-15", "2008-07-15", "2009-07-10", "2009-07-13", "2009-07-15", "2009-07-16", "2009-07-17", "2009-07-20")
df1 <- data.frame(id, date_1, date_2)
> df1
id     date_1     date_2
1  N02 2008-03-15 2008-03-15
2  N02 2008-04-15 2008-04-15
3  N03 2008-06-15 2008-06-15
4  N03 2008-07-15 2008-07-15
5  N04 2009-07-10 2009-07-10
6  N04 2009-07-13 2009-07-13
7  N04 2009-07-15 2009-07-15
8  N04 2009-07-16 2009-07-16
9  N04 2009-07-17 2009-07-17
10 N04 2009-07-20 2009-07-20

My attempt, which does not give what I want:

library(dplyr)
df2 <- df1 %>% group_by(id) %>% mutate(diff = difftime(date_2, lag(date_1, default = date_1[1]), units = "days"))
> df2
# A tibble: 10 × 4
# Groups:   id [3]
id    date_1     date_2     diff         
<chr> <chr>      <chr>      <drtn>       
1 N02   2008-03-15 2008-03-15  0.00000 days
2 N02   2008-04-15 2008-04-15 30.95833 days
3 N03   2008-06-15 2008-06-15  0.00000 days
4 N03   2008-07-15 2008-07-15 30.00000 days
5 N04   2009-07-10 2009-07-10  0.00000 days
6 N04   2009-07-13 2009-07-13  3.00000 days
7 N04   2009-07-15 2009-07-15  2.00000 days
8 N04   2009-07-16 2009-07-16  1.00000 days
9 N04   2009-07-17 2009-07-17  1.00000 days
10 N04   2009-07-20 2009-07-20  3.00000 days

But I would like something like this:

id <- c("N02", "N02", "N03", "N03", "N04", "N04", "N04", "N04", "N04", "N04")
date_1 <- c ("2008-03-15", "2008-04-15", "2008-06-15", "2008-07-15", "2009-07-10", "2009-07-13", "2009-07-15", "2009-07-16", "2009-07-17", "2009-07-20")
date_2 <- c ("2008-03-15", "2008-04-15", "2008-06-15", "2008-07-15", "2009-07-10", "2009-07-13", "2009-07-15", "2009-07-16", "2009-07-17", "2009-07-20")
diff <- c("0.00000 days", "30.95833 days", "0.00000 days", "30.00000 days", "0.00000 days", "3.00000 days", "5.00000 days", "6.00000 days", "7.00000 days", "10.0000 days")
df2 <- data.frame (id, date_1, date_2, diff)
> df2
id     date_1     date_2          diff
1  N02 2008-03-15 2008-03-15  0.00000 days
2  N02 2008-04-15 2008-04-15 30.95833 days
3  N03 2008-06-15 2008-06-15  0.00000 days
4  N03 2008-07-15 2008-07-15 30.00000 days
5  N04 2009-07-10 2009-07-10  0.00000 days
6  N04 2009-07-13 2009-07-13  3.00000 days
7  N04 2009-07-15 2009-07-15  5.00000 days
8  N04 2009-07-16 2009-07-16  6.00000 days
9  N04 2009-07-17 2009-07-17  7.00000 days
10 N04 2009-07-20 2009-07-20  10.0000 days

Thank you in advance for your help. Charles.

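You were almost there: just use [[1]] (or dplyr::first()) instead of lag(). lag() compares each row with the previous row, so you get differences between consecutive dates, whereas date_1[[1]] is always the first date_1 of the current group: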

library(dplyr)
df1 %>%
  group_by(id) %>%
  mutate(diff = difftime(date_2, date_1[[1]], units = "days")) %>%
  ungroup()
# A tibble: 10 × 4
id    date_1     date_2     diff   
<chr> <chr>      <chr>      <drtn> 
1 N02   2008-03-15 2008-03-15  0 days
2 N02   2008-04-15 2008-04-15 31 days
3 N03   2008-06-15 2008-06-15  0 days
4 N03   2008-07-15 2008-07-15 30 days
5 N04   2009-07-10 2009-07-10  0 days
6 N04   2009-07-13 2009-07-13  3 days
7 N04   2009-07-15 2009-07-15  5 days
8 N04   2009-07-16 2009-07-16  6 days
9 N04   2009-07-17 2009-07-17  7 days
10 N04   2009-07-20 2009-07-20 10 days
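
One caveat about the outputs above: date_1 and date_2 are character columns, so difftime() has to convert them to date-times in the local time zone. That is most likely why the question shows 30.95833 days (31 days minus one hour across a daylight-saving change) where this answer shows 31 days. Below is a minimal sketch, assuming it is acceptable to convert both columns to Date first. Using min() is also an assumption, made in case the rows are not already sorted by date within each id; on this data date_1[[1]] or first(date_1) gives the same result. The name df3 is only illustrative.

```r
library(dplyr)

# Same example data as in the question
df1 <- data.frame(
  id = c("N02", "N02", "N03", "N03", "N04", "N04", "N04", "N04", "N04", "N04"),
  date_1 = c("2008-03-15", "2008-04-15", "2008-06-15", "2008-07-15", "2009-07-10",
             "2009-07-13", "2009-07-15", "2009-07-16", "2009-07-17", "2009-07-20"),
  date_2 = c("2008-03-15", "2008-04-15", "2008-06-15", "2008-07-15", "2009-07-10",
             "2009-07-13", "2009-07-15", "2009-07-16", "2009-07-17", "2009-07-20")
)

df3 <- df1 %>%
  # Convert the character columns to Date so no local time zone is involved
  mutate(date_1 = as.Date(date_1), date_2 = as.Date(date_2)) %>%
  group_by(id) %>%
  # Reference is the earliest date_1 within each id
  mutate(diff = difftime(date_2, min(date_1), units = "days")) %>%
  ungroup()

df3
```

With Date columns the differences are whole days regardless of the machine's time zone, and as.numeric(df3$diff) gives plain numbers if the "days" label is not needed.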
