Date difference between the second and last row



I am trying to calculate the date difference between the second and last row for each group id. The data looks like this:

data <- data.frame(
  pid  = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3),
  day  = c("25/07/2018", "19/10/2018", "17/01/2019", "19/03/2019",
           "10/09/2018", "29/11/2018", "26/03/2019",
           "17/06/2016", "25/04/2018", "17/07/2018", "05/04/2019", "09/02/2021"),
  catt = c(1, 1, 2, 1, 1, 1, 2, 2, 2, 1, 1, 2)
)

Data

    pid  day
 1    1  25/07/2018
 2    1  19/10/2018
 3    1  17/01/2019
 4    1  19/03/2019
 5    2  10/09/2018
 6    2  29/11/2018
 7    2  26/03/2019
 8    3  17/06/2016
 9    3  25/04/2018
10    3  17/07/2018
11    3  05/04/2019
12    3  09/02/2021

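Convert day to a Date object and, for each pid, calculate the difference between the last and the second date. Dividing the number of days by 30 expresses the result in approximate months.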

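library(dplyr)
library(lubridate)

data %>%
  mutate(day = dmy(day)) %>%   # parse "dd/mm/yyyy" strings into Date objects
  arrange(pid, day) %>%        # sort chronologically within each pid
  group_by(pid) %>%
  # days between the last and the second date, divided by 30 (approximate months)
  summarise(difference = as.numeric(last(day) - day[2], units = "days") / 30)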
#   pid difference
#  <dbl>      <dbl>
#1     1       5.03
#2     2       3.9 
#3     3      34.0 
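
For readers who would rather not load dplyr and lubridate, the same calculation can be sketched in base R. This is only an illustration on the sample data from the question: the name diff_days is made up for this example, and the result is left in whole days rather than divided by 30. It assumes every pid has at least two rows; a group with a single row would give NA.

```r
# Sample data from the question (the catt column is not needed here)
data <- data.frame(
  pid = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3),
  day = c("25/07/2018", "19/10/2018", "17/01/2019", "19/03/2019",
          "10/09/2018", "29/11/2018", "26/03/2019",
          "17/06/2016", "25/04/2018", "17/07/2018", "05/04/2019", "09/02/2021")
)

# Parse the day/month/year strings with an explicit format, then sort
data$day <- as.Date(data$day, format = "%d/%m/%Y")
data <- data[order(data$pid, data$day), ]

# For each pid, the number of days between the second and the last date
diff_days <- sapply(split(data$day, data$pid), function(d) {
  as.numeric(difftime(d[length(d)], d[2], units = "days"))
})

diff_days        # named vector, one entry per pid
diff_days / 30   # approximate months, as in the dplyr version
```

Dividing by 30 only approximates months, so keeping the value in days avoids baking that approximation into the stored result.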

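If you want to keep the number of rows in the data frame, use mutate instead and fill in difference only on the last row of each pid, leaving the other rows as NA.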

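data %>%
  mutate(day = dmy(day)) %>%
  arrange(pid, day) %>%
  group_by(pid) %>%
  # only the last row of each group gets a value; all other rows are NA
  mutate(difference = ifelse(row_number() == n(),
                             as.numeric(last(day) - day[2], units = "days") / 30,
                             NA_real_))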

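Note that the difftime output in the question is incorrect, because the dates were passed as plain character strings rather than parsed Date objects.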

# Wrong output: the strings are not interpreted as day/month/year
difftime("19/10/2018","19/03/2019 ", units = "days")
#Time difference of 214 days
# Correct output: parse the strings with dmy() first
difftime(dmy("19/03/2019"), dmy("19/10/2018"), units = "days")
#Time difference of 151 days
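
The same correction can be sketched without lubridate by giving base R an explicit format. This is a minimal illustration rather than the only way to do it; the variable names first, second, d1, d2 and gap are made up for the example. When no format is given, the strings are presumably matched against year-first patterns, which would explain the 214-day result above.

```r
# The two dates from the example, stored as day/month/year strings
first  <- "19/10/2018"
second <- "19/03/2019"

# State the format explicitly so that day, month and year are not guessed
d1 <- as.Date(first,  format = "%d/%m/%Y")
d2 <- as.Date(second, format = "%d/%m/%Y")

# Later date first, so the difference is positive
gap <- difftime(d2, d1, units = "days")
as.numeric(gap)
```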
