Combining two data frames in R based on a conditional criterion



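I have data frame A and data frame B. Data frame A holds events at single points in time, and data frame B holds event date ranges per patient. I only want to include a row from data frame A if its event date does not fall within any of that patient's date ranges in data frame B. If a patient in data frame A does not exist in data frame B at all, the event should be added to data frame B as well.

Because the two data frames do not share the same columns, each row added from data frame A to data frame B should have start = date and end = date.

I tried to figure out how to do this with dplyr, but it seems complicated. I managed to get it working with a for-loop, but for my own education I would like to know how others would accomplish the same task.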

# Events at single points in time
dfa <- data.frame(
  date = c("2021-01-01", "2021-02-02", "2021-02-05"),
  patient = c("one", "two", "three"))
# Event date ranges per patient
dfb <- data.frame(
  start = c("2020-12-31", "2021-02-01"),
  end = c("2021-01-02", "2021-02-03"),
  patient = c("one", "one"))
dfa$date <- as.Date(dfa$date, "%Y-%m-%d")
dfb$start <- as.Date(dfb$start, "%Y-%m-%d")
dfb$end <- as.Date(dfb$end, "%Y-%m-%d")
for (i in seq_len(nrow(dfa))) {
  date <- dfa[i, "date"]
  d_patient <- dfa[i, "patient"]
  # Ranges of the same patient that contain this event date
  res <- dfb[d_patient == dfb$patient &
               date >= dfb$start &
               date <= dfb$end, ]
  # No containing range: append the event as a one-day range
  if (nrow(res) == 0) {
    tf <- data.frame("start" = date,
                     "end" = date,
                     "patient" = d_patient)
    dfb <- rbind(dfb, tf)
  }
}
print(dfb)

Result:

       start        end patient
1 2020-12-31 2021-01-02     one
2 2021-02-01 2021-02-03     one
3 2021-02-02 2021-02-02     two
4 2021-02-05 2021-02-05   three

# Same sample data as above, rebuilt so that dfb again holds only its two original rows
dfa <- data.frame(
  date = c("2021-01-01", "2021-02-02", "2021-02-05"),
  patient = c("one", "two", "three"))
dfb <- data.frame(
  start = c("2020-12-31", "2021-02-01"),
  end = c("2021-01-02", "2021-02-03"),
  patient = c("one", "one"))
dfa$date <- as.Date(dfa$date, "%Y-%m-%d")
dfb$start <- as.Date(dfb$start, "%Y-%m-%d")
dfb$end <- as.Date(dfb$end, "%Y-%m-%d")
dfa
#>         date patient
#> 1 2021-01-01     one
#> 2 2021-02-02     two
#> 3 2021-02-05   three
dfb
#>        start        end patient
#> 1 2020-12-31 2021-01-02     one
#> 2 2021-02-01 2021-02-03     one
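library(tidyverse)
library(fuzzyjoin)
# Keep the rows of dfa that have no dfb row with the same patient
# and start <= date <= end, then reshape them to dfb's columns
fuzzy_anti_join(
  x = dfa,
  y = dfb,
  by = c("patient", "date" = "start", "date" = "end"),
  match_fun = list(`==`, `>=`, `<=`)
) %>%
  transmute(patient, start = date, end = date) %>%
  bind_rows(dfb)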
#>   patient      start        end
#> 1     two 2021-02-02 2021-02-02
#> 2   three 2021-02-05 2021-02-05
#> 3     one 2020-12-31 2021-01-02
#> 4     one 2021-02-01 2021-02-03

Created on 2022-01-22 by the reprex package (v2.0.1)
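For comparison, the condition that the anti-join tests can also be written out in base R without any packages. This is only a sketch of the same logic, not part of the original answer; the names `covered`, `new_rows`, and `result_base` are made up for illustration, and it assumes R 4.0 or later so that `patient` is a character column rather than a factor.

```r
dfa <- data.frame(
  date = as.Date(c("2021-01-01", "2021-02-02", "2021-02-05")),
  patient = c("one", "two", "three")
)
dfb <- data.frame(
  start = as.Date(c("2020-12-31", "2021-02-01")),
  end = as.Date(c("2021-01-02", "2021-02-03")),
  patient = c("one", "one")
)

# For each event in dfa: is there any range in dfb for the same
# patient that contains the event date?
covered <- vapply(
  seq_len(nrow(dfa)),
  function(i) {
    any(dfb$patient == dfa$patient[i] &
          dfb$start <= dfa$date[i] &
          dfb$end >= dfa$date[i])
  },
  logical(1)
)

# Uncovered events become one-day ranges with start = end = date
new_rows <- data.frame(
  start = dfa$date[!covered],
  end = dfa$date[!covered],
  patient = dfa$patient[!covered]
)

result_base <- rbind(dfb, new_rows)
result_base
```

Unlike the for-loop in the question, this checks every event against the original `dfb` only, so a repeated event in `dfa` would be appended more than once.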

data.table

library(magrittr)
library(data.table)
setDT(dfa)
setDT(dfb)
# Non-equi anti-join: rows of dfa with no matching range in dfb,
# then turn the date column into start and end
tmp <- dfa[!dfb, on = list(patient, date >= start, date <= end)] %>%
  .[, `:=`(start = date, end = date, date = NULL)]
l <- list(tmp, dfb)
rbindlist(l = l, use.names = TRUE)
#>    patient      start        end
#> 1:     two 2021-02-02 2021-02-02
#> 2:   three 2021-02-05 2021-02-05
#> 3:     one 2020-12-31 2021-01-02
#> 4:     one 2021-02-01 2021-02-03

Created on 2022-01-22 by the reprex package (v2.0.1)
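Since the question asks about dplyr specifically, here is a minimal sketch that uses dplyr alone: join every event to all ranges of the same patient, flag whether any range contains the date, and keep the events that are never covered. It is an alternative to the answers above, not something taken from them; `unmatched` and `result_dplyr` are illustrative names, and the row order of the result can differ from the loop's output because `summarise()` sorts by the grouping columns.

```r
library(dplyr)

dfa <- data.frame(
  date = as.Date(c("2021-01-01", "2021-02-02", "2021-02-05")),
  patient = c("one", "two", "three")
)
dfb <- data.frame(
  start = as.Date(c("2020-12-31", "2021-02-01")),
  end = as.Date(c("2021-01-02", "2021-02-03")),
  patient = c("one", "one")
)

unmatched <- dfa %>%
  # One row per (event, range) pair; patients missing from dfb get NA ranges
  left_join(dfb, by = "patient") %>%
  group_by(patient, date) %>%
  # na.rm = TRUE makes events with no range at all count as not covered
  summarise(
    covered = any(date >= start & date <= end, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  filter(!covered) %>%
  # Reshape to dfb's columns: a one-day range per uncovered event
  transmute(start = date, end = date, patient)

result_dplyr <- bind_rows(dfb, unmatched)
result_dplyr
```

The intermediate join grows with the number of ranges per patient, so for large inputs the fuzzyjoin or data.table approaches above may be the better fit.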
