>假设我们有 2 个数据帧,每个数据帧有 2 个 col 和 6 行,并且我们只想在左侧的日期 (lhs( 早于右侧的日期 (rhs( 时绑定这两个数据帧,同时确保每一行没有重复的日期(在 LHS 和 rhs 中(:例如。
x = cbind(data.frame(lhs_date = seq(Sys.Date()-5, Sys.Date(),2)), letter=c("A","B","C","D","E","F") )
Y = cbind(data.frame(rhs_date = seq(Sys.Date()-5, Sys.Date(),1)), letter=c("X","Y","Y","X","J","J") )
我们如何仅在 lhs 日期
我不确定我是否理解您的问题,但您是否正在寻找这样的东西?
# packages
library(fuzzyjoin)
# data
x <- data.frame(lhs_date = seq(Sys.Date() - 5, Sys.Date(), 2), letter = c("A","B","C","D","E","F"))
y <- data.frame(rhs_date = seq(Sys.Date() - 5, Sys.Date(), 1), letter = c("X","Y","Y","X","J","J"))
# left join
fuzzy_left_join(
x = x,
y = y,
by = c("lhs_date" = "rhs_date"),
match_fun = list(`<`)
)
#> lhs_date letter.x rhs_date letter.y
#> 1 2019-11-21 A 2019-11-22 Y
#> 2 2019-11-21 A 2019-11-23 Y
#> 3 2019-11-21 A 2019-11-24 X
#> 4 2019-11-21 A 2019-11-25 J
#> 5 2019-11-21 A 2019-11-26 J
#> 6 2019-11-23 B 2019-11-24 X
#> 7 2019-11-23 B 2019-11-25 J
#> 8 2019-11-23 B 2019-11-26 J
#> 9 2019-11-25 C 2019-11-26 J
#> 10 2019-11-21 D 2019-11-22 Y
#> 11 2019-11-21 D 2019-11-23 Y
#> 12 2019-11-21 D 2019-11-24 X
#> 13 2019-11-21 D 2019-11-25 J
#> 14 2019-11-21 D 2019-11-26 J
#> 15 2019-11-23 E 2019-11-24 X
#> 16 2019-11-23 E 2019-11-25 J
#> 17 2019-11-23 E 2019-11-26 J
#> 18 2019-11-25 F 2019-11-26 J
创建于 2019-11-26 由 reprex 软件包 (v0.3.0(
我找到的解决方案基于初始 agila 输入: 模糊连接后 DPLYR 管道操作员可以完成剩下的工作:
x <- data.frame(lhs_date = seq(Sys.Date() - 5, Sys.Date(), 2), letter = c("A","B","C","D","E","F"))
y <- data.frame(rhs_date = seq(Sys.Date() - 5, Sys.Date(), 1), letter = c("X","Y","Y","X","J","J"))
z= fuzzy_left_join(
x = x,
y = y,
by = c("lhs_date" = "rhs_date"),
match_fun = list(`<`)
)
z %>%
group_by(lhs_date) %>%
mutate(flag = row_number()) %>%
filter(flag ==1)
我可以轻松地在SQL中重现,但是在@Agila R上挣扎。虽然不完整,但你的回答指出了正确的方向,而且走得很远
由于 'dplyr' 现在已经在变异连接中获得了joined_by
论据,我们现在可以用dplyr
x = cbind(data.frame(lhs_date = seq(Sys.Date()-5, Sys.Date(),2)), letter=c("A","B","C","D","E","F") )
Y = cbind(data.frame(rhs_date = seq(Sys.Date()-5, Sys.Date(),1)), letter=c("X","Y","Y","X","J","J") )
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
x |>
left_join(Y, by = join_by(closest(lhs_date < rhs_date)))
#> lhs_date letter.x rhs_date letter.y
#> 1 2023-03-24 A 2023-03-25 Y
#> 2 2023-03-26 B 2023-03-27 X
#> 3 2023-03-28 C 2023-03-29 J
#> 4 2023-03-24 D 2023-03-25 Y
#> 5 2023-03-26 E 2023-03-27 X
#> 6 2023-03-28 F 2023-03-29 J
创建于 2023-03-29 与 reprex v2.0.2