I'm trying to use data.table keys (indexing) to perform fast lookups.
table = c("AX-11415458", "AX-11417054", "AX-11419082", "AX-11421703",
"AX-11422856", "AX-11422870")
df1 = structure(list(V1 = c(26L, 26L, 26L, 26L, 26L, 26L), V2 = c("AX-11415458",
"AX-11417054", "AX-11419082", "AX-11421703", "AX-11422856", "AX-11422870"
), V3 = c(0L, 0L, 0L, 0L, 0L, 0L), V4 = c(705L, 3973L, 2859L,
1683L, 6482L, 11930L), V5 = c("C", "G", "C", "A", "C", "G"),
V6 = c("A", "A", "T", "G", "T", "T")), row.names = c(NA,
-6L), class = "data.frame")
df2=structure(list(V1 = c("MT", "MT", "MT", "MT", "MT", "MT"), V2 = c("AX-11415458",
"AX-11417054", "AX-11419082", "AX-11421703", "AX-11422856", "AX-11422870"
), V3 = c(0L, 0L, 0L, 0L, 0L, 0L), V4 = c(705L, 3973L, 2859L,
1683L, 6482L, 11930L), V5 = c(".", ".", ".", ".", ".", "."),
V6 = c("A", "A", "T", "G", "T", "T")), row.names = c(NA,
-6L), class = "data.frame")
setDT(df1)
setDT(df2)
setkey(df1, V2)
setkey(df2, V2)
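I want to loop over `table`, look up each value in df1 and df2, and replace V5 and V6 in df2 with the corresponding V5 and V6 values from df1.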
for (i in table) {
df2[.(i), nomatch = 0L][,5:6] = df1[.(i), nomatch = 0L][,5:6]
}
But I get this error:
Error in `[<-.data.table`(`*tmp*`, .(i), nomatch = 0L, value = list(V1 = "MT",  :
  unused argument (nomatch = 0L)
Why can't I do this, and is there a correct way to do what I want?
In fact, your code can be corrected directly to:
for (i in table) {
df2[i,5:6] <- df1[i,5:6]
}
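`nomatch = 0L` only applies to joins (it turns them into inner joins); the assignment method `[<-.data.table` does not accept it, which is why you get "unused argument". Also, chaining `[,5:6]` onto a subset does not update the data in the original df.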
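To illustrate the chaining point, here is a minimal sketch using a hypothetical three-row table `dt` (not the question's data). It assumes a keyed data.table, so a character `i` is treated as a join on the key. The chained form modifies a temporary copy, while `:=` inside a single `[ ]` updates the table by reference.

```r
library(data.table)

# Hypothetical toy table, for illustration only
dt <- data.table(id = c("a", "b", "c"), val = c(1, 2, 3))
setkey(dt, id)

# Chained form: dt["b"] returns a NEW data.table (a copy of the matching row),
# so := changes that temporary copy and dt itself is left unchanged
dt["b"][, val := 20]

# Single-call form: := inside the same [ ] updates dt by reference
dt["b", val := 20]

print(dt$val)
```

Only the second call changes `dt`, so `val` ends up as `1, 20, 3`.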
Alternatively, you can try this approach:
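setDT(df1)
setDT(df2)
# keep only the rows of df1 whose V2 is in `table`
df3 <- df1[V2 %chin% table]
setkey(df2, V2)
setkey(df3, V2)
# df3[df2, V5] looks up V5 for every row of df2 (NA where there is no match);
# fcoalesce() falls back to df2's current value in that case
df2[, `:=`(
  V5 = fcoalesce(df3[df2, V5], V5),
  V6 = fcoalesce(df3[df2, V6], V6)
)]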
Result:
> df2
V1 V2 V3 V4 V5 V6
1: MT AX-11415458 0 705 C A
2: MT AX-11417054 0 3973 G A
3: MT AX-11419082 0 2859 C T
4: MT AX-11421703 0 1683 A G
5: MT AX-11422856 0 6482 C T
6: MT AX-11422870 0 11930 G T
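The same update can also be written as a single update join, a common data.table idiom: join df2 to the filtered df1 with `on =` and assign the `i.`-prefixed columns (the columns coming from the `i` table) by reference. The sketch below uses a trimmed copy of the question's data (only V1, V2, V5, V6) and is one possible formulation, not the only correct one.

```r
library(data.table)

# Trimmed version of the question's data (V3/V4 omitted for brevity)
table <- c("AX-11415458", "AX-11417054", "AX-11419082", "AX-11421703",
           "AX-11422856", "AX-11422870")
df1 <- data.table(V2 = table, V5 = c("C", "G", "C", "A", "C", "G"),
                  V6 = c("A", "A", "T", "G", "T", "T"))
df2 <- data.table(V1 = "MT", V2 = table, V5 = ".",
                  V6 = c("A", "A", "T", "G", "T", "T"))

# Update join: for each df2 row whose V2 matches a row of the filtered df1,
# overwrite V5/V6 in place; i.V5 / i.V6 refer to the columns of the i table
df2[df1[V2 %chin% table], on = "V2", `:=`(V5 = i.V5, V6 = i.V6)]

print(df2)
```

No loop and no `setkey()` call are needed here, and rows of df2 without a match in df1 keep their current values.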