我有与此类似的数据,除了dt1
有 2900 万行,dt2
只有 1500 万行(不是 1500 万行(。
dt1 <- data.table(ID=1:4,City=c("Charlotte","DC","Salem","Boston"))
dt2 <- data.table(Birds=c("Saker","Peregrine","Barbary","Prarie","Golden","Coopers","Canary","Finch"),BirdType=c("Falcon","Falcon","Falcon","Falcon","Eagle","Hawk","Breakfast","Breakfast"))
输出如下:
> dt1
ID City
1: 1 Charlotte
2: 2 DC
3: 3 Salem
4: 4 Boston
> dt2
Birds BirdType
1: Saker Falcon
2: Peregrine Falcon
3: Barbary Falcon
4: Prarie Falcon
5: Golden Eagle
6: Coopers Hawk
7: Canary Breakfast
8: Finch Breakfast
我想合并两个 data.tables,其中 dt1 的每一行都与 dt2 的所有行组合在一起,最终给出一个包含 32 行的 data.table,输出如下:
> dtMerged
ID City Birds BirdType
1: 1 Charlotte Saker Falcon
2: 1 Charlotte Peregrine Falcon
3: 1 Charlotte Barbary Falcon
4: 1 Charlotte Prarie Falcon
5: 1 Charlotte Golden Eagle
6: 1 Charlotte Coopers Hawk
7: 1 Charlotte Canary Breakfast
8: 1 Charlotte Finch Breakfast
9: 2 DC Saker Falcon
10: 2 DC Peregrine Falcon
11: 2 DC Barbary Falcon
12: 2 DC Prarie Falcon
13: 2 DC Golden Eagle
14: 2 DC Coopers Hawk
15: 2 DC Canary Breakfast
16: 2 DC Finch Breakfast
17: 3 Salem Saker Falcon
18: 3 Salem Saker Falcon
etc...
关于如何最好地实现这一目标的任何想法将不胜感激。我在 Windows 7 PC 上使用data.table
版本 1.10.4。 谢谢。
正如@akrun评论的那样,交叉连接似乎是解决问题的方法之一。为了实现它,我通过@jangorecki CJ.dt
这篇 Stack Overflow 帖子引用了一个非常简洁的函数,以获得所需的解决方案:
CJ.dt = function(X,Y) {
stopifnot(is.data.table(X),is.data.table(Y))
k = NULL
X = X[, c(k=1, .SD)]
setkey(X, k)
Y = Y[, c(k=1, .SD)]
setkey(Y, NULL)
X[Y, allow.cartesian=TRUE][, k := NULL][]
}
new_df <- CJ.dt(dt1, dt2)
setorder(new_df, ID)
以下是重新排序后的完整输出的外观:
> new_df
ID City Birds BirdType
1: 1 Charlotte Saker Falcon
2: 1 Charlotte Peregrine Falcon
3: 1 Charlotte Barbary Falcon
4: 1 Charlotte Prarie Falcon
5: 1 Charlotte Golden Eagle
6: 1 Charlotte Coopers Hawk
7: 1 Charlotte Canary Breakfast
8: 1 Charlotte Finch Breakfast
9: 2 DC Saker Falcon
10: 2 DC Peregrine Falcon
11: 2 DC Barbary Falcon
12: 2 DC Prarie Falcon
13: 2 DC Golden Eagle
14: 2 DC Coopers Hawk
15: 2 DC Canary Breakfast
16: 2 DC Finch Breakfast
17: 3 Salem Saker Falcon
18: 3 Salem Peregrine Falcon
19: 3 Salem Barbary Falcon
20: 3 Salem Prarie Falcon
21: 3 Salem Golden Eagle
22: 3 Salem Coopers Hawk
23: 3 Salem Canary Breakfast
24: 3 Salem Finch Breakfast
25: 4 Boston Saker Falcon
26: 4 Boston Peregrine Falcon
27: 4 Boston Barbary Falcon
28: 4 Boston Prarie Falcon
29: 4 Boston Golden Eagle
30: 4 Boston Coopers Hawk
31: 4 Boston Canary Breakfast
32: 4 Boston Finch Breakfast