r语言 - 日期上的条件 cbind()



>假设我们有 2 个数据帧,每个数据帧有 2 个 col 和 6 行,并且我们只想在左侧的日期 (lhs( 早于右侧的日期 (rhs( 时绑定这两个数据帧,同时确保每一行没有重复的日期(在 LHS 和 rhs 中(:例如。

x = cbind(data.frame(lhs_date = seq(Sys.Date()-5, Sys.Date(),2)), letter=c("A","B","C","D","E","F") )
Y = cbind(data.frame(rhs_date = seq(Sys.Date()-5, Sys.Date(),1)), letter=c("X","Y","Y","X","J","J") )

我们如何仅在 lhs 日期

我不确定我是否理解您的问题,但您是否正在寻找这样的东西?

# packages
library(fuzzyjoin)
# data
x <- data.frame(lhs_date = seq(Sys.Date() - 5, Sys.Date(), 2), letter = c("A","B","C","D","E","F"))
y <- data.frame(rhs_date = seq(Sys.Date() - 5, Sys.Date(), 1), letter = c("X","Y","Y","X","J","J"))
# left join
fuzzy_left_join(
x = x, 
y = y, 
by = c("lhs_date" = "rhs_date"),
match_fun = list(`<`)
)
#>      lhs_date letter.x   rhs_date letter.y
#> 1  2019-11-21        A 2019-11-22        Y
#> 2  2019-11-21        A 2019-11-23        Y
#> 3  2019-11-21        A 2019-11-24        X
#> 4  2019-11-21        A 2019-11-25        J
#> 5  2019-11-21        A 2019-11-26        J
#> 6  2019-11-23        B 2019-11-24        X
#> 7  2019-11-23        B 2019-11-25        J
#> 8  2019-11-23        B 2019-11-26        J
#> 9  2019-11-25        C 2019-11-26        J
#> 10 2019-11-21        D 2019-11-22        Y
#> 11 2019-11-21        D 2019-11-23        Y
#> 12 2019-11-21        D 2019-11-24        X
#> 13 2019-11-21        D 2019-11-25        J
#> 14 2019-11-21        D 2019-11-26        J
#> 15 2019-11-23        E 2019-11-24        X
#> 16 2019-11-23        E 2019-11-25        J
#> 17 2019-11-23        E 2019-11-26        J
#> 18 2019-11-25        F 2019-11-26        J

创建于 2019-11-26 由 reprex 软件包 (v0.3.0(

我找到的解决方案基于初始 agila 输入: 模糊连接后 DPLYR 管道操作员可以完成剩下的工作:

x <- data.frame(lhs_date = seq(Sys.Date() - 5, Sys.Date(), 2), letter = c("A","B","C","D","E","F"))
y <- data.frame(rhs_date = seq(Sys.Date() - 5, Sys.Date(), 1), letter = c("X","Y","Y","X","J","J"))
z= fuzzy_left_join(
x = x, 
y = y, 
by = c("lhs_date" = "rhs_date"),
match_fun = list(`<`)
)
z %>%
group_by(lhs_date) %>%
mutate(flag = row_number()) %>%
filter(flag ==1)

我可以轻松地在SQL中重现,但是在@Agila R上挣扎。虽然不完整,但你的回答指出了正确的方向,而且走得很远

由于 'dplyr' 现在已经在变异连接中获得了joined_by论据,我们现在可以用dplyr

x = cbind(data.frame(lhs_date = seq(Sys.Date()-5, Sys.Date(),2)), letter=c("A","B","C","D","E","F") )
Y = cbind(data.frame(rhs_date = seq(Sys.Date()-5, Sys.Date(),1)), letter=c("X","Y","Y","X","J","J") )
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
x |>
left_join(Y, by = join_by(closest(lhs_date < rhs_date)))
#>     lhs_date letter.x   rhs_date letter.y
#> 1 2023-03-24        A 2023-03-25        Y
#> 2 2023-03-26        B 2023-03-27        X
#> 3 2023-03-28        C 2023-03-29        J
#> 4 2023-03-24        D 2023-03-25        Y
#> 5 2023-03-26        E 2023-03-27        X
#> 6 2023-03-28        F 2023-03-29        J

创建于 2023-03-29 与 reprex v2.0.2

最新更新