匹配r中从到不同数据帧的日期范围



我现在正在努力寻找一种方法来匹配R中的两个不同的表(行数不平衡)与日期的匹配范围。

第一个数据帧表示某人声明了什么,第二个数据帧表示实际是什么。问题是,假设A可能在他/她的仓库中保存了一周或更长时间的以公斤为单位的产品,然后在一周后出现。

所以我有两行不同的数据帧。这两个数字都超过500.000行。第一个有50万,第二个有52万。(所以有些NAs可能会发生,但这无关紧要)。

我想从这两个不同的数据帧中搜索和匹配:

  1. 的id是相同的,
  2. 产品代码和man代码要一致,而
  3. 检查第二个数据框中的日期是否在第一个数据框日期的一个月内(例如,他/她在2020/1/1声明重量(100kg),并在2020/1/15显示,但实际重量为105)。

,如果所有这些条件都为真,则将两个权重之和(来自数据帧1和2)

期望的输出必须是一个数据帧,具有匹配的id,日期范围,产品,人,2列权重和最后一列将是权重之和的值。

示例数据帧是下面R中的两个数据帧:


id = rep("A",10)
date = seq(as.Date("2000/1/1"), by = "day", length.out = 10)
product = c("E1","E1","E2","E3","E3","E4","E1","E1","E5","E5")
man = c("PS","PS","PS","DDR","DDR","FFR","PS","PS","BA","BA")
weight = c(1505,300,259,231,140,150,300,112,203,2200) 
data1 = data.frame(id,date,product,man,weight);data1

id2 = rep("A",10)
date2 = sample(seq(as.Date("2000/1/10"), by = "day", length.out = 10),10)
product2 = c("E3","E3","E1","E1","E1","E1","E5","E4","E5","E2")
man2 = c("DDR","DDR","PS","PS","PS","PS","BA","FFR","BA","PS")
weight2 = c(259,295,260,240,135,145,250,110,2000,210) 
data2 = data.frame(id2,date2,product2,man2,weight2);data2

如何将它们与初始日期延迟条件匹配?

有什么帮助吗?

不确定这是不是你想要的,但它看起来像一个滚动连接…

使用数据。表和润滑包,一个版本可以是:

(这里我展示了两个例子,一个是30天远期,一个是一个月远期)

set.seed(0)
id = rep("A",10)
date = seq(as.Date("2000/1/1"), by = "day", length.out = 10)
product = c("E1","E1","E2","E3","E3","E4","E1","E1","E5","E5")
man = c("PS","PS","PS","DDR","DDR","FFR","PS","PS","BA","BA")
weight = c(1505,300,259,231,140,150,300,112,203,2200) 
data1 = data.frame(id,date,product,man,weight);data1

id2 = rep("A",10)
date2 = sample(seq(as.Date("2000/1/10"), by = "day", length.out = 10),10)
product2 = c("E3","E3","E1","E1","E1","E1","E5","E4","E5","E2")
man2 = c("DDR","DDR","PS","PS","PS","PS","BA","FFR","BA","PS")
weight2 = c(259,295,260,240,135,145,250,110,2000,210) 
data2 = data.frame(id2,date2,product2,man2,weight2);data2
require(data.table)
require(lubridate)
setDT(data1)
setkey(data1)
setDT(data2)
setkey(data2)
data1[, weight_p50 := weight + 50]
data1[, weight_m50 := weight - 50]

data1[, date := as.Date(date)]
data2[, date2 := as.Date(date2)]
data1[, date_plus_one_month := as.Date(ymd(date %m+% months(1)))]
data1[, date_plus_30_days := as.Date(date + days(30))]
data2[, date2_ := date2]
data1[, date_ := date]
res <-
data1[data2, list(id, product, man, date = date_, date2, weight, weight2, total_weight = weight + weight2),
on = .(id = id2,
product = product2,
man = man2, 
date_plus_30_days >= date2, 
date <= date2_,
weight_p50 >= weight2,
weight_m50 <= weight2),
allow.cartesian = TRUE, nomatch = 0
][][]
setkey(res, id, product, man, date, date2)
res <- unique(res, by = c("id", "product", "man"))
res[]

给了:

id product man       date      date2 weight weight2 total_weight
1:  A      E1  PS 2000-01-02 2000-01-16    300     260          560
2:  A      E2  PS 2000-01-03 2000-01-17    259     210          469
3:  A      E3 DDR 2000-01-04 2000-01-18    231     259          490
4:  A      E4 FFR 2000-01-06 2000-01-19    150     110          260
5:  A      E5  BA 2000-01-09 2000-01-12    203     250          453

编辑:在OP评论之后,我添加了id x man x product的唯一性,这样它只保留满足约束的第一次约会。

相关内容

  • 没有找到相关文章

最新更新