I have a long list of visits and need the number of days between the dates.
ClientID <- c("00932", "00932", "00932")
Visit <- c("2018-11-10", "2018-11-20", "2018-11-25")
ClientID Visit
00932 2018-11-10
00932 2018-11-20
00932 2018-11-25
I need a new column that reads:
ClientID Visit Days
00932 2018-11-10 0
00932 2018-11-20 10
00932 2018-11-25 15
Convert Visit to Date class and, for each ClientID, subtract the minimum Visit date from Visit.
library(dplyr)
df %>%
  mutate(Visit = as.Date(Visit, '%m-%d-%Y')) %>%
  group_by(ClientID) %>%
  mutate(Days = as.integer(Visit - min(Visit))) %>%
  ungroup()
# ClientID Visit Days
# <chr> <date> <int>
#1 00932 2018-11-10 0
#2 00932 2018-11-20 10
#3 00932 2018-11-25 15
Data:
ClientID <- c("00932", "00932", "00932")
Visit <- c("11-10-2018", "11-20-2018", "11-25-2018")
df <- data.frame(ClientID, Visit)
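If you would rather not load dplyr, the same per-client subtraction can be sketched in base R with ave(). This is only an illustration under the assumption that Visit is stored as month-day-year text, as in the data above; the second client is borrowed from the example further down the thread.

```r
# Sample data in month-day-year text format (assumed, as in the data above)
ClientID <- c("00932", "00932", "00932", "00935", "00935")
Visit <- c("11-10-2018", "11-20-2018", "11-25-2018", "11-20-2019", "11-25-2019")
df <- data.frame(ClientID, Visit)

# Parse the text into Date class
df$Visit <- as.Date(df$Visit, format = "%m-%d-%Y")

# Within each ClientID, subtract the earliest visit from every visit
df$Days <- ave(as.numeric(df$Visit), df$ClientID,
               FUN = function(x) x - min(x))

df$Days
# 0 10 15 0 5
```

ave() returns a vector in the original row order, so the result can be assigned straight back as a column without sorting or merging.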
Assuming you have multiple ClientIDs and want to compute Days at that level:
library(lubridate)
library(tidyverse)
ClientID <- c("00932", "00932", "00932")
Visit <- c("11-10-2018", "11-20-2018", "11-25-2018")
df <- data.frame(ClientID, Visit)
df %>%
group_by(ClientID) %>%
mutate(Visit= mdy(Visit),
Days= as.numeric(Visit-lag(Visit)))%>%
ungroup()%>%
mutate_if(is.numeric, ~replace_na(., 0))
# A tibble: 3 x 3
ClientID Visit Days
<chr> <date> <dbl>
1 00932 2018-11-10 0
2 00932 2018-11-20 10
3 00932 2018-11-25 5
Adding another ClientID with two observations to demonstrate it better:
ClientID <- c("00932", "00932", "00932", "00935", "00935")
Visit <- c("11-10-2018", "11-20-2018", "11-25-2018", "11-20-2019", "11-25-2019")
df <- data.frame(ClientID, Visit)
df %>%
group_by(ClientID) %>%
mutate(Visit= mdy(Visit),
Days= as.numeric(Visit-lag(Visit)))%>%
ungroup()%>%
mutate_if(is.numeric, ~replace_na(., 0))
# A tibble: 5 x 3
ClientID Visit Days
<chr> <date> <dbl>
1 00932 2018-11-10 0
2 00932 2018-11-20 10
3 00932 2018-11-25 5
4 00935 2019-11-20 0
5 00935 2019-11-25 5
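Note that this output is the gap since the previous visit (0, 10, 5), not the days since the first visit (0, 10, 15). As a possible variation, the NA produced by lag() for each client's first row can be avoided by giving lag() a default, which makes the replace_na() step unnecessary. The sketch below assumes dplyr is installed and uses as.Date() instead of mdy() purely to keep the dependencies down.

```r
library(dplyr)

ClientID <- c("00932", "00932", "00932", "00935", "00935")
Visit <- c("11-10-2018", "11-20-2018", "11-25-2018", "11-20-2019", "11-25-2019")
df <- data.frame(ClientID, Visit)

gaps <- df %>%
  # Parse month-day-year text into Date class
  mutate(Visit = as.Date(Visit, format = "%m-%d-%Y")) %>%
  group_by(ClientID) %>%
  # For the first row of each client, lag() falls back to that client's
  # first visit, so the difference is 0 rather than NA
  mutate(Days = as.numeric(Visit - lag(Visit, default = first(Visit)))) %>%
  ungroup()

gaps$Days
# 0 10 5 0 5
```

Because the default is set inside the grouped mutate(), a genuine NA in some other numeric column is left untouched, which mutate_if(is.numeric, ...) would not guarantee.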
Adding a function, as requested in the comments:
days_func <- function(df){
df %>%
group_by(ClientID) %>%
mutate(Visit= mdy(Visit),
Days= as.numeric(Visit-lag(Visit)))%>%
ungroup()%>%
mutate_if(is.numeric, ~replace_na(., 0))->df
return(df)
}
df1 <- days_func(df)
df1
# A tibble: 3 x 3
ClientID Visit Days
<chr> <date> <dbl>
1 00932 2018-11-10 0
2 00932 2018-11-20 10
3 00932 2018-11-25 5
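I assume you need the span in days between consecutive dates, rather than between each date and the most recent date. In that case I suggest this: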
dn <- as.numeric(as.Date(Visit))
The date text is converted first to Date class and then to a number (days since 1970-01-01).
dn2 <- c(dn[1], dn[-length(dn)])
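We prepare a second, shifted vector for the subtraction, because a vectorized subtraction is fast. Its elements line up with the first vector as follows:
dn     dn2
1st    1st
2nd    1st
3rd    2nd
nth    (n-1)th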
Days <- dn - dn2
This gives the span in days between each visit and the one before it.
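As a quick sanity check, the shifted-vector subtraction can be compared with base R's diff(), which computes the same consecutive differences. This sketch assumes Visit holds ISO-formatted (year-month-day) text, as in the question; the names Days_shift and Days_diff are only illustrative.

```r
Visit <- c("2018-11-10", "2018-11-20", "2018-11-25")

# Text -> Date -> number of days since 1970-01-01
dn <- as.numeric(as.Date(Visit))

# Shifted copy: the first element is repeated, the last one is dropped
dn2 <- c(dn[1], dn[-length(dn)])
Days_shift <- dn - dn2

# Same result via diff(), padding the first visit with 0
Days_diff <- c(0, diff(dn))

stopifnot(isTRUE(all.equal(Days_shift, Days_diff)))
Days_diff
# 0 10 5
```

Both versions work on a single vector, so with several ClientIDs they would have to be applied per client, for example through ave() or a grouped mutate().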