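I have a df that gives the start and end date of each observation. An observation often lasts more than one day, so it has a value greater than 0 in the "duration" column. I would like to add the days that lie between "start" and "end" (the "duration") as new rows in my df. How can I do this?
Example df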
df <- data.frame(start_date = as.Date(c("1/1/2020", "1/25/2020", "2/11/2020"), format = "%m/%d/%Y"),
                 end_date = as.Date(c("1/5/2020", "1/26/2020", "2/13/2020"), format = "%m/%d/%Y"),
                 duration = c(4, 1, 2))
Are you looking for a solution like this?
library(dplyr)
library(lubridate)
df %>%
  mutate(start_date = mdy(start_date),
         end_date = mdy(end_date)) %>%
  mutate(duration = end_date - start_date)
Data:
df <- data.frame(start_date = c("1/1/2020", "1/25/2020", "2/11/2020"),
end_date = c("1/5/2020", "1/26/2020", "2/13/2020"))
Output:
  start_date   end_date duration
1 2020-01-01 2020-01-05   4 days
2 2020-01-25 2020-01-26   1 days
3 2020-02-11 2020-02-13   2 days
You can simply subtract df$start_date from df$end_date:
df$end_date - df$start_date
#Time differences in days
#[1] 4 1 2
Or use difftime, passing units by name (the third positional argument of difftime is tz, not units):
difftime(df$end_date, df$start_date, units = "days")
#Time differences in days
#[1] 4 1 2
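Both expressions return a difftime object rather than a plain number. As a minimal sketch, assuming a numeric column like the `duration` column in the question is wanted, the difference can be converted with `as.numeric`. The vector names `start_date`, `end_date`, and `duration_days` are chosen here for illustration only.

```r
# Dates from the question, written in ISO format so no format string is needed.
start_date <- as.Date(c("2020-01-01", "2020-01-25", "2020-02-11"))
end_date   <- as.Date(c("2020-01-05", "2020-01-26", "2020-02-13"))

# `units` is passed by name because the third positional argument of difftime() is `tz`.
d <- difftime(end_date, start_date, units = "days")

# Drop the difftime class to get plain numbers of days.
duration_days <- as.numeric(d, units = "days")
duration_days
# [1] 4 1 2
```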
To get the sequence of dates, use seq:
do.call(c, Map(seq, df$start_date, df$end_date, by=1))
# [1] "2020-01-01" "2020-01-02" "2020-01-03" "2020-01-04" "2020-01-05"
# [6] "2020-01-25" "2020-01-26" "2020-02-11" "2020-02-12" "2020-02-13"
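The call above returns a single vector of dates. Since the question asks for new rows, here is one possible base R sketch that repeats each original row once per covered day and attaches the dates. The names `n_days`, `expanded`, and the new column `date` are assumptions made for this example, not part of the original answer.

```r
# Example data from the question, parsed with a four-digit-year format.
df <- data.frame(
  start_date = as.Date(c("1/1/2020", "1/25/2020", "2/11/2020"), format = "%m/%d/%Y"),
  end_date   = as.Date(c("1/5/2020", "1/26/2020", "2/13/2020"), format = "%m/%d/%Y")
)

# Number of calendar days each observation covers, counting both endpoints.
n_days <- as.integer(df$end_date - df$start_date) + 1L

# Repeat each original row once per covered day.
expanded <- df[rep(seq_len(nrow(df)), times = n_days), ]

# Attach the individual dates, built the same way as in the seq() call above.
expanded$date <- do.call(c, Map(seq, df$start_date, df$end_date, by = 1))
rownames(expanded) <- NULL
expanded
```

The result has one row per day (10 rows for this data), with the start and end dates of the observation repeated on each row.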
Data:
df <- data.frame(start_date = as.Date(c("1/1/2020", "1/25/2020", "2/11/2020"), "%m/%d/%Y"),
                 end_date = as.Date(c("1/5/2020", "1/26/2020", "2/13/2020"), "%m/%d/%Y"),
                 duration = c(4, 1, 2))
Are you looking for this solution?
library(tidyverse)
df %>%
  mutate(date = map2(start_date, end_date, seq, by = '1 day')) %>%
  unnest(date) -> result
result
# start_date end_date duration date
# <date> <date> <dbl> <date>
# 1 2020-01-01 2020-01-05 4 2020-01-01
# 2 2020-01-01 2020-01-05 4 2020-01-02
# 3 2020-01-01 2020-01-05 4 2020-01-03
# 4 2020-01-01 2020-01-05 4 2020-01-04
# 5 2020-01-01 2020-01-05 4 2020-01-05
# 6 2020-01-25 2020-01-26 1 2020-01-25
# 7 2020-01-25 2020-01-26 1 2020-01-26
# 8 2020-02-11 2020-02-13 2 2020-02-11
# 9 2020-02-11 2020-02-13 2 2020-02-12
#10 2020-02-11 2020-02-13 2 2020-02-13
You can use select to remove the columns you don't need.
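As an illustration of that last step, the sketch below drops the original start and end columns after unnesting. Which columns count as "not needed" is an assumption here; adjust the `select` call to your own case.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Same data as in this answer, built from date strings for readability.
df <- data.frame(
  start_date = as.Date(c("2020-01-01", "2020-01-25", "2020-02-11")),
  end_date   = as.Date(c("2020-01-05", "2020-01-26", "2020-02-13")),
  duration   = c(4, 1, 2)
)

result <- df %>%
  # One list element per row, holding every date from start to end.
  mutate(date = map2(start_date, end_date, seq, by = "1 day")) %>%
  # Expand the list column so each date gets its own row.
  unnest(date) %>%
  # Drop the columns that are no longer needed (an illustrative choice).
  select(-start_date, -end_date)

result
```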
Data:
df <- structure(list(start_date = structure(c(18262, 18286, 18303),class = "Date"),
end_date = structure(c(18266, 18287, 18305), class = "Date"),
duration = c(4, 1, 2)), class = "data.frame", row.names = c(NA, -3L))