R:合并2个数据帧,对前一行和过去的行进行索引



假设我有一个这样的数据集:

origin=data.frame(Date=as.Date(c("2016-08-05","2016-08-04","2016-08-03")),
                  L=c(1,2,3),
                  Type=c("H","L","H"))
        Date L Type
1 2016-08-05 1    H
2 2016-08-04 2    L
3 2016-08-03 3    H
end=data.frame(Date=as.Date(c("2016-08-05","2016-08-04","2016-08-03","2016-08-02","2016-08-01")),
               N=c(50,40,30,20,10),
               Name=c("CA","CB","CC","CD","CE"),
               Vol=c(2,1,2,2,3),
               Act=c(0.1,0.2,0.3,0.2,0.2))
        Date  N Name Vol Act
1 2016-08-05 50   CA   2 0.1
2 2016-08-04 40   CB   1 0.2
3 2016-08-03 30   CC   2 0.3
4 2016-08-02 20   CD   2 0.2
5 2016-08-01 10   CE   3 0.2

我想要这样的东西:

        Date L Type  N Name Vol Act
3 2016-08-05 1    H 50   CA   2 0.1
3 2016-08-05 1    H 40   CB   1 0.2
3 2016-08-05 1    H 30   CC   2 0.3
2 2016-08-04 2    L 40   CB   1 0.2
2 2016-08-04 2    L 30   CC   2 0.3
2 2016-08-04 2    L 20   CD   2 0.2
1 2016-08-03 3    H 30   CC   2 0.3
1 2016-08-03 3    H 20   CD   2 0.2
1 2016-08-03 3    H 10   CE   3 0.2

我想保持"origin"的原始列日期,并且在合并中,我想将其与"end"的当前和以前的日期值(前面的2个值)合并,就像一种合并循环。在其他帖子中,只匹配公共值,这给出了3行结果:

merge(x = origin, y = end, by = "Date")
        Date L Type  N Name Vol Act
1 2016-08-03 3    H 30   CC   2 0.3
2 2016-08-04 2    L 40   CB   1 0.2
3 2016-08-05 1    H 50   CA   2 0.1

这是非常不同的,并没有合并当前和以前的行值的两个数据帧,所以我无法弄清楚如何去这个

看起来data.table中的foverlaps很适合这个工作:

# prepare data and add extra columns for foverlaps join which relies on columns instead of one
library(data.table)
setDT(origin)[, DateStart := Date - 2]
setDT(end)[, DateStart := Date]
setkey(origin, DateStart, Date)
# join two tables with foverlaps and remove subsidiary columns
foverlaps(end, origin, type = "within")[, `:=` (DateStart = NULL, i.Date = NULL, i.DateStart = NULL)][order(Date)]
#          Date L Type  N Name Vol Act
# 1: 2016-08-03 3    H 30   CC   2 0.3
# 2: 2016-08-03 3    H 20   CD   2 0.2
# 3: 2016-08-03 3    H 10   CE   3 0.2
# 4: 2016-08-04 2    L 40   CB   1 0.2
# 5: 2016-08-04 2    L 30   CC   2 0.3
# 6: 2016-08-04 2    L 20   CD   2 0.2
# 7: 2016-08-05 1    H 50   CA   2 0.1
# 8: 2016-08-05 1    H 40   CB   1 0.2
# 9: 2016-08-05 1    H 30   CC   2 0.3

或使用data.table版本1.9.7non-equi连接特性:

setDT(origin)[, `:=` (DateEnd = Date, StartDate = Date - 2)]
             [setDT(end), on = .(DateEnd >= Date, StartDate <= Date), allow = T]
#          Date L Type    DateEnd  StartDate  N Name Vol Act
# 1: 2016-08-05 1    H 2016-08-05 2016-08-05 50   CA   2 0.1
# 2: 2016-08-04 2    L 2016-08-04 2016-08-04 40   CB   1 0.2
# 3: 2016-08-05 1    H 2016-08-04 2016-08-04 40   CB   1 0.2
# 4: 2016-08-03 3    H 2016-08-03 2016-08-03 30   CC   2 0.3
# 5: 2016-08-04 2    L 2016-08-03 2016-08-03 30   CC   2 0.3
# 6: 2016-08-05 1    H 2016-08-03 2016-08-03 30   CC   2 0.3
# 7: 2016-08-03 3    H 2016-08-02 2016-08-02 20   CD   2 0.2
# 8: 2016-08-04 2    L 2016-08-02 2016-08-02 20   CD   2 0.2
# 9: 2016-08-03 3    H 2016-08-01 2016-08-01 10   CE   3 0.2

删除子列应该很简单

最新更新