r语言 - 合并两个 data.tables,其中 dt2 中的所有行都与 dt1 中的每一行合并



我有与此类似的数据,除了dt12900 万行,dt2只有 1500 万行(不是 1500 万行(。

dt1 <- data.table(ID=1:4,City=c("Charlotte","DC","Salem","Boston"))
dt2 <- data.table(Birds=c("Saker","Peregrine","Barbary","Prarie","Golden","Coopers","Canary","Finch"),BirdType=c("Falcon","Falcon","Falcon","Falcon","Eagle","Hawk","Breakfast","Breakfast"))

输出如下:

> dt1
   ID      City
1:  1 Charlotte
2:  2        DC
3:  3     Salem
4:  4    Boston
> dt2
       Birds  BirdType
1:     Saker    Falcon
2: Peregrine    Falcon
3:   Barbary    Falcon
4:    Prarie    Falcon
5:    Golden     Eagle
6:   Coopers      Hawk
7:    Canary Breakfast
8:     Finch Breakfast

我想合并两个 data.tables,其中 dt1 的每一行都与 dt2 的所有行组合在一起,最终给出一个包含 32 行的 data.table,输出如下:

> dtMerged
   ID      City  Birds     BirdType
1:  1  Charlotte Saker      Falcon
2:  1  Charlotte Peregrine  Falcon
3:  1  Charlotte Barbary    Falcon
4:  1  Charlotte Prarie     Falcon
5:  1  Charlotte Golden     Eagle   
6:  1  Charlotte Coopers    Hawk
7:  1  Charlotte Canary   Breakfast
8:  1  Charlotte Finch    Breakfast
9:   2        DC Saker      Falcon
10:  2        DC Peregrine  Falcon
11:  2        DC Barbary    Falcon
12:  2        DC Prarie     Falcon
13:  2        DC Golden     Eagle   
14:  2        DC Coopers    Hawk
15:  2        DC Canary   Breakfast
16:  2        DC Finch    Breakfast
17:  3     Salem Saker      Falcon
18:  3     Salem Saker      Falcon
etc...

关于如何最好地实现这一目标的任何想法将不胜感激。我在 Windows 7 PC 上使用data.table版本 1.10.4。 谢谢。

正如@akrun评论的那样,交叉连接似乎是解决问题的方法之一。为了实现它,我通过@jangorecki CJ.dt这篇 Stack Overflow 帖子引用了一个非常简洁的函数,以获得所需的解决方案:

CJ.dt = function(X,Y) {
  stopifnot(is.data.table(X),is.data.table(Y))
  k = NULL
  X = X[, c(k=1, .SD)]
  setkey(X, k)
  Y = Y[, c(k=1, .SD)]
  setkey(Y, NULL)
  X[Y, allow.cartesian=TRUE][, k := NULL][]
}
new_df <- CJ.dt(dt1, dt2)
setorder(new_df, ID)

以下是重新排序后的完整输出的外观:

> new_df
 ID      City     Birds  BirdType
 1:  1 Charlotte     Saker    Falcon
 2:  1 Charlotte Peregrine    Falcon
 3:  1 Charlotte   Barbary    Falcon
 4:  1 Charlotte    Prarie    Falcon
 5:  1 Charlotte    Golden     Eagle
 6:  1 Charlotte   Coopers      Hawk
 7:  1 Charlotte    Canary Breakfast
 8:  1 Charlotte     Finch Breakfast
 9:  2        DC     Saker    Falcon
10:  2        DC Peregrine    Falcon
11:  2        DC   Barbary    Falcon
12:  2        DC    Prarie    Falcon
13:  2        DC    Golden     Eagle
14:  2        DC   Coopers      Hawk
15:  2        DC    Canary Breakfast
16:  2        DC     Finch Breakfast
17:  3     Salem     Saker    Falcon
18:  3     Salem Peregrine    Falcon
19:  3     Salem   Barbary    Falcon
20:  3     Salem    Prarie    Falcon
21:  3     Salem    Golden     Eagle
22:  3     Salem   Coopers      Hawk
23:  3     Salem    Canary Breakfast
24:  3     Salem     Finch Breakfast
25:  4    Boston     Saker    Falcon
26:  4    Boston Peregrine    Falcon
27:  4    Boston   Barbary    Falcon
28:  4    Boston    Prarie    Falcon
29:  4    Boston    Golden     Eagle
30:  4    Boston   Coopers      Hawk
31:  4    Boston    Canary Breakfast
32:  4    Boston     Finch Breakfast

相关内容

  • 没有找到相关文章

最新更新