My goal is to create a column based on a condition involving two Date columns. The dataset looks like this:
df <- data.frame(PatientID = c("3454","345","5","345","567","79"),
date_of_covid_test = c(2020-04-02, 2000-03-01, 2000-01-01, 2020-11-03, 2020-04-02, 2020-12-05),
date_of_hospitalization = c(2020-03-27, 2000-03-25, 2000-03-01, 2020-03-10, NA, NA), stringsAsFactors = F)
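The new column I want to create is named "hospitalized_due_to_covid". It should be TRUE if the hospitalization date ("date_of_hospitalization") falls between 1 week before and 1 month after the COVID test date ("date_of_covid_test"). If there is an NA, the result should be FALSE.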
The expected result for the example I posted here is:
hospitalized_due_to_covid = c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE)
How can I write this?
Thank you very much in advance!! :)
You can try:
library(lubridate)
library(dplyr)
df %>%
mutate(across(c(date_of_covid_test, date_of_hospitalization), as.Date),
hospitalized_due_to_covid = date_of_hospitalization >= (date_of_covid_test - 7) &
date_of_hospitalization <= (date_of_covid_test %m+% months(1)),
hospitalized_due_to_covid = replace(hospitalized_due_to_covid, is.na(hospitalized_due_to_covid), FALSE))
# PatientID date_of_covid_test date_of_hospitalization hospitalized_due_to_covid
#1 3454 2020-04-02 2020-03-27 TRUE
#2 345 2000-03-01 2000-03-25 TRUE
#3 5 2000-01-01 2000-03-01 FALSE
#4 345 2020-11-03 2020-03-10 FALSE
#5 567 2020-04-02 <NA> FALSE
#6 79 2020-12-05 <NA> FALSE
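A note on why `%m+%` is used instead of plain `+ months(1)`: with lubridate, adding a month to a date whose day does not exist in the target month yields NA, whereas `%m+%` rolls back to the last valid day of that month. A minimal sketch (the date `d` is just an illustrative value, not taken from the data):

```r
library(lubridate)

d <- as.Date("2020-01-31")  # illustrative month-end date

# Plain addition would land on the non-existent 2020-02-31, so the result is NA
plain <- d + months(1)

# %m+% rolls back to the last valid day of the target month (2020-02-29)
rolled <- d %m+% months(1)

print(plain)
print(rolled)
```

For the sample data this makes no difference, but it avoids unexpected NA (and therefore FALSE) results for tests done at the end of a month.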
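Your data, with the dates as quoted strings (unquoted, 2020-04-02 is evaluated as the subtraction 2020 - 4 - 2):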
df <- data.frame(PatientID = c("3454","345","5","345","567","79"),
date_of_covid_test = c("2020-04-02", "2000-03-01", "2000-01-01", "2020-11-03", "2020-04-02", "2020-12-05"),
date_of_hospitalization = c("2020-03-27", "2000-03-25", "2000-03-01", "2020-03-10", NA, NA), stringsAsFactors = F)
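If you would rather not use dplyr, the same check can be sketched as a small function. This is only one possible variant: `within_window()` and its arguments `days_before` and `months_after` are hypothetical names introduced here, with the 7-day and 1-month window as defaults.

```r
library(lubridate)

# Hypothetical helper: TRUE when `hosp` lies between `days_before` days before
# `test` and `months_after` months after it; any NA comparison becomes FALSE.
within_window <- function(hosp, test, days_before = 7, months_after = 1) {
  hosp <- as.Date(hosp)
  test <- as.Date(test)
  in_window <- hosp >= (test - days_before) &
    hosp <= (test %m+% months(months_after))
  !is.na(in_window) & in_window
}

df <- data.frame(PatientID = c("3454", "345", "5", "345", "567", "79"),
                 date_of_covid_test = c("2020-04-02", "2000-03-01", "2000-01-01",
                                        "2020-11-03", "2020-04-02", "2020-12-05"),
                 date_of_hospitalization = c("2020-03-27", "2000-03-25", "2000-03-01",
                                             "2020-03-10", NA, NA),
                 stringsAsFactors = FALSE)

df$hospitalized_due_to_covid <- within_window(df$date_of_hospitalization,
                                              df$date_of_covid_test)
print(df$hospitalized_due_to_covid)
# [1]  TRUE  TRUE FALSE FALSE FALSE FALSE
```

Keeping the window sizes as arguments makes it easy to try a different definition later without touching the comparison itself.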