Can two datasets be merged in R without a common variable?



I have two data frames:

#df1 = sales_date
product.name <- c("dap","npk","urea","npk","dap","npk")
date.of.sale <- c("2020-07-03","2020-07-15","2020-07-09","2020-07-03","2020-07-20","2020-07-13")
sales_date <- data.frame(product.name,date.of.sale)
#df2 = week_names
week.name <-  c("21A01","21A02","21A02","21A04")
start.date <- c("2020-07-03","2020-07-10","2020-07-17","2020-07-24")
end.date <-  c("2020-07-09","2020-07-16","2020-07-23","2020-07-30")
week_names <- data.frame(week.name,start.date,end.date)

Question: I want to add week.name to the first dataset (df1), on the condition that date.of.sale falls between start.date and end.date.

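I tried ifelse (assigning start.date and end.date manually), but because these are large datasets it quickly becomes a headache. Any suggestion for a simpler approach would be much appreciated.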

You can use a non-equi join with data.table, which is very efficient on large datasets:

library(data.table)
# Convert data.frame to data.table
setDT(sales_date)
setDT(week_names)
# Non-equi joins only possible on numerical values : convert character to Date
week_names[,c('start.date','end.date'):=.(as.Date(start.date),as.Date(end.date))]
sales_date[,date.of.sale:=as.Date(date.of.sale)]
# Non-equi join; both bounds are inclusive, so a sale on a week's end.date still matches that week
week_names[sales_date,.(product.name,date.of.sale=i.date.of.sale,week.name),on=.(start.date <= date.of.sale, end.date >= date.of.sale)]
#>    product.name date.of.sale week.name
#> 1:          dap   2020-07-03     21A01
#> 2:          npk   2020-07-15     21A02
#> 3:         urea   2020-07-09     21A01
#> 4:          npk   2020-07-03     21A01
#> 5:          dap   2020-07-20     21A02
#> 6:          npk   2020-07-13     21A02
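
If adding a package is not an option, the same lookup can be sketched in base R with findInterval(). This is only a sketch and not part of the answer above. It assumes the weeks do not overlap, and it treats both start.date and end.date as inclusive, matching the condition stated in the question. The helper variables idx and inside are made up for illustration.

```r
# Rebuild the example data from the question, with dates stored as Date
sales_date <- data.frame(
  product.name = c("dap", "npk", "urea", "npk", "dap", "npk"),
  date.of.sale = as.Date(c("2020-07-03", "2020-07-15", "2020-07-09",
                           "2020-07-03", "2020-07-20", "2020-07-13")),
  stringsAsFactors = FALSE
)
week_names <- data.frame(
  week.name  = c("21A01", "21A02", "21A02", "21A04"),
  start.date = as.Date(c("2020-07-03", "2020-07-10", "2020-07-17", "2020-07-24")),
  end.date   = as.Date(c("2020-07-09", "2020-07-16", "2020-07-23", "2020-07-30")),
  stringsAsFactors = FALSE
)

# Assumption: weeks do not overlap. findInterval() needs sorted break points,
# so order the weeks by start.date first.
week_names <- week_names[order(week_names$start.date), ]

# For each sale, index of the last week whose start.date is <= date.of.sale
# (0 means the sale is before the first week, which we turn into NA)
idx <- findInterval(as.numeric(sales_date$date.of.sale),
                    as.numeric(week_names$start.date))
idx[idx == 0] <- NA

# Keep the match only if the sale is also on or before that week's end.date
inside <- !is.na(idx) & sales_date$date.of.sale <= week_names$end.date[idx]
sales_date$week.name <- ifelse(inside, week_names$week.name[idx], NA_character_)
```

Because findInterval() uses a binary search over the sorted start dates, this avoids building every sale/week combination, which matters on large data. Sales that fall in a gap between weeks, or outside all weeks, get NA.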
