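I have two data frames holding different data about the same people. Data frame 1 (dfx) has an id and the dates on which each person had an appointment; data frame 2 (dfy) has an id plus a start date and an end date.
It looks like this: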
c1 <- c("1", "1", "1", "1", "1", "2", "2", "2", "2", "2")
d1 <- c("2017", "2018", "2019", "2020", "2021", "2019", "2019", "2019", "2020", "2021")
dfx <- data.frame(c1,d1)
c2 <- c("1", "1", "2")
ds <- c("2017", "2020", "2017")
de <- c("2018", "2021", "2018")
dfy <- data.frame(c2,ds,de)
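I am working on data frame 2 and want to know whether the dates in data frame 1 fall within the start and end dates in data frame 2. I am trying to get an output in dfy that flags the overlap as TRUE or FALSE.
For this example, the output should be TRUE, TRUE and FALSE.
I tried the following with dplyr, but did not get the result I wanted. I would appreciate any help. Thanks.
dplyr code: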
overlap <- dfy %>%
group_by(c2) %>%
mutate (on_hold = any(mapply(function(id, start, end) any(id == dfx$c1 & dfx$d1 > start & dfx$d1 < end), c2, ds, de))) %>%
arrange(c2, ds, de, on_hold)
Solution
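library(dplyr)

# Collect the unique dates for each id into a list-column
ranges <- dfx %>%
  group_by(c1) %>%
  summarise(range = list(unique(d1)))

# Attach each id's dates to dfy, then check row by row whether both the
# start date and the end date appear among them
left_join(dfy, ranges, by = c("c2" = "c1")) %>%
  rowwise() %>%
  mutate(in_range = ds %in% range & de %in% range)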
Output
# A tibble: 3 x 5
# Rowwise:
  c2    ds    de    range     in_range
  <fct> <fct> <fct> <list>    <lgl>
1 1     2017  2018  <fct [5]> TRUE
2 1     2020  2021  <fct [5]> TRUE
3 2     2017  2018  <fct [3]> FALSE
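If the goal is really "does any appointment date fall between the start and end date", a direct interval check might be closer to what the question describes. The following is only a sketch in base R, under two assumptions that are not in the original post: the years are converted to integers so the comparison is numeric (comparing factors or strings with > and < can misbehave), and the bounds are inclusive (>= and <=), since strict bounds would give FALSE for the first row. The column name overlap is made up for illustration.

```r
# Sample data from the question, with years stored as integers so that
# the comparisons below are numeric rather than factor/string comparisons.
dfx <- data.frame(
  c1 = c("1", "1", "1", "1", "1", "2", "2", "2", "2", "2"),
  d1 = as.integer(c("2017", "2018", "2019", "2020", "2021",
                    "2019", "2019", "2019", "2020", "2021")),
  stringsAsFactors = FALSE
)
dfy <- data.frame(
  c2 = c("1", "1", "2"),
  ds = as.integer(c("2017", "2020", "2017")),
  de = as.integer(c("2018", "2021", "2018")),
  stringsAsFactors = FALSE
)

# For each row of dfy: is there at least one date in dfx with the same id
# that lies inside the inclusive interval [ds, de]?
# 'overlap' is a hypothetical column name chosen for this sketch.
dfy$overlap <- mapply(
  function(id, start, end) {
    any(dfx$c1 == id & dfx$d1 >= start & dfx$d1 <= end)
  },
  dfy$c2, dfy$ds, dfy$de,
  USE.NAMES = FALSE
)

print(dfy)
```

For the sample data this gives TRUE, TRUE and FALSE, the same as the solution above, but it does not require the start and end dates themselves to appear among the appointment dates.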
Data provided by the OP
c1 <- c("1", "1", "1", "1", "1", "2", "2", "2", "2", "2")
d1 <- c("2017", "2018", "2019", "2020", "2021", "2019", "2019", "2019", "2020", "2021")
dfx <- data.frame(c1,d1)
c2 <- c("1", "1", "2")
ds <- c("2017", "2020", "2017")
de <- c("2018", "2021", "2018")
dfy <- data.frame(c2,ds,de)
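One caveat worth keeping in mind: with the OP's data, checking that the start and end dates appear among a person's dates happens to agree with checking whether any date falls inside the period, but the two are not the same test in general. The small example below uses hypothetical values (not from the question) to show a case where they differ; the variable names are made up for illustration.

```r
# Hypothetical data (not from the question): one person with a single
# appointment in 2018, and one period running from 2017 to 2019.
dates <- c(2018L)
start <- 2017L
end   <- 2019L

# Endpoint membership, as in the %in% approach:
# both the start and the end must appear among the dates.
endpoint_match <- start %in% dates & end %in% dates

# Interval containment: at least one date lies within [start, end].
inside_interval <- any(dates >= start & dates <= end)

print(c(endpoint_match = endpoint_match, inside_interval = inside_interval))
```

Here the endpoint check returns FALSE while the interval check returns TRUE, so which one to use depends on what "overlap" is meant to capture.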