r语言 - 将多个 data.table::merge 操作与 data.tables 链接在一起



是否可以将多个合并操作与data.tables一个接一个地链接在一起?

该功能类似于在dplyr管道中连接多个 data.frames,但将以类似于合并下面的两个data.tables然后根据需要操作data.table类似的链式方式用于data.tables。但只有您才能合并另一个data.table.我承认这里的这个SO问题可能非常相似,那是在@chinsoon12发表评论之后。

感谢您的任何帮助!

library(dplyr)
library(data.table)
# data.frame
df1 = data.frame(food = c("apples", "bananas", "carrots", "dates"),
quantity = c(1:4))
df2 = data.frame(food = c("apples", "bananas", "carrots", "dates"),
status = c("good", "bad", "rotten", "raw"))
df3 = data.frame(food = c("apples", "bananas", "carrots", "dates"),
rank = c("okay", "good", "better", "best"))
df4 = left_join(df1,
df2,
by = "food") %>% 
mutate(new_col = NA) %>%  # this is just to hold a position of mutation in the data.frame
left_join(.,
df3,
by = "food")

# data.table
dt1 = data.table(food = c("apples", "bananas", "carrots", "dates"),
quantity = c(1:4))
dt2 = data.table(food = c("apples", "bananas", "carrots", "dates"),
status = c("good", "bad", "rotten", "raw"))
dt3 = data.table(food = c("apples", "bananas", "carrots", "dates"),
rank = c("okay", "good", "better", "best"))
# this is what I am not sure how to implement
dt4 = merge(dt1,
dt2,
by = "food")[
food == "apples"](merge(dt4))

可以将多个 data.table 连接与on参数链接在一起。请注意,在 j 中如果没有更新运算符 (":="(,这将是右连接,但使用 ":="(即添加列(,这将成为左外连接。左边的一篇有用的帖子在这里连接 使用 data.table 左连接。

使用上述示例数据的示例,在连接之间有一个子集:

dt4 <- dt1[dt2, on="food", `:=`(status = i.status)][
food == "apples"][dt3, on="food", rank := i.rank]
##> dt4
## food quantity status rank
##1: apples        1   good okay

示例在联接之间添加新列

dt4 <- dt1[dt2, on="food", `:=`(status = i.status)][
, new_col := NA][dt3, on="food", rank := i.rank]
##> dt4
##      food quantity status new_col   rank
##1:  apples        1   good      NA   okay
##2: bananas        2    bad      NA   good
##3: carrots        3 rotten      NA better
##4:   dates        4    raw      NA   best

使用merge和 magrittr 管道的示例:

dt4 <-  merge(dt1, dt2, by = "food") %>%
set( , "new_col", NA) %>% 
merge(dt3, by = "food")
##> dt4
##      food quantity status new_col   rank
##1:  apples        1   good      NA   okay
##2: bananas        2    bad      NA   good
##3: carrots        3 rotten      NA better
##4:   dates        4    raw      NA   best
See no other way than this (unfortunately). You need to define vectors with column names and then You may chain joining by reference like this: 
cols_dt1 <- colnames(dt_dt1)[!colnames(dt_dt1) %in% 'join_column1']
cols_dt2 <- colnames(dt_dt2)[!colnames(dt_dt2) %in% ' join_column2']
cols_dt3 <- colnames(dt_dt3)[!colnames(dt_dt3) %in% ' join_column3']
cols_dt4 <- colnames(dt_dt4)[!colnames(dt_dt4) %in% ' join_column4']
cols_dt5 <- colnames(dt_dt5)[!colnames(dt_dt5) %in% ' join_column5']
data_dt[dt_dt1, on=.( join_column1), (cols_dt1) := mget(cols_dt1)][
dt_dt2, on=.( join_column2), (cols_dt2) := mget(cols_dt2)][
dt_dt3, on=.( join_column3), (cols_dt3) := mget(cols_dt3)][
dt_dt4, on=.( join_column4), (cols_dt4) := mget(cols_dt4)][
dt_dt5, on=.( join_column5), (cols_dt5) := mget(cols_dt5)]

最新更新