我有两个数据表 x 和 y,如下所示:
x
Sample code code_conn pool_content
Sample1 A1 X1 ASD
Sample1 A1 X3 POS
Sample1 B1 Y1 QAW
Sample1 B1 Y2 QWP
Sample1 B1 Y3 SDE
Sample1 B1 Y4 LKS
Sample1 C1 Z2 JMP
Sample1 C1 Z3 MSK
Sample2 A1 X1 ASD
Sample2 A1 X2 POS
Sample2 B1 Y1 QAW
Sample2 B1 Y2 QWP
Sample2 B1 Y3 LKS
Sample2 B1 Y4 ASE
Sample2 B1 Y5 ASM
Sample2 C1 Z1 MSK
Sample2 D1 P1 EAX
Sample2 D1 P2 EBX
Sample2 D1 P3 ECX
Sample2 D1 P4 EDX
Sample2 D1 P5 MOV
Sample2 D1 P6 ALL
y
code code_conn
A1 X1
A1 X2
A1 X3
B1 Y1
B1 Y2
B1 Y3
B1 Y4
B1 Y5
C1 Z1
C1 Z2
C1 Z3
C1 Z4
D1 P1
D1 P2
D1 P3
D1 P4
D1 P5
D1 P6
我希望输出如下:
Sample code code_conn pool_content
Sample1 A1 X1 ASD
Sample1 A1 X2 …
Sample1 A1 X3 POS
Sample1 B1 Y1 QAW
Sample1 B1 Y2 QWP
Sample1 B1 Y3 SDE
Sample1 B1 Y4 LKS
Sample1 B1 Y5 …
Sample1 C1 Z2 JMP
Sample1 C1 Z3 MSK
Sample1 C1 Z1 …
Sample1 C1 Z4 …
Sample2 A1 X1 ASD
Sample2 A1 X2 POS
Sample2 A1 X3 …
Sample2 B1 Y1 QAW
Sample2 B1 Y2 QWP
Sample2 B1 Y3 LKS
Sample2 B1 Y4 ASE
Sample2 B1 Y5 ASM
Sample2 C1 Z1 MSK
Sample2 C1 Z2 …
Sample2 C1 Z3 …
Sample2 C1 Z4 …
Sample2 D1 P1 EAX
Sample2 D1 P2 EBX
Sample2 D1 P3 ECX
Sample2 D1 P4 EDX
Sample2 D1 P5 MOV
Sample2 D1 P6 ALL
所以,我基本上想要的是,对于 x 中的每个唯一示例,对于每组代码,我希望表 x 中表 y 中的所有缺失code_conn及其pool_content都设置为"..."。有没有一种内存高效和快速的方法可以做到这一点。代码和code_conn可以是任何字符串(不是字母和数字的串行组合(。我附上了代码以生成下面的数据。谢谢。
x <- data.table("Sample" = c(rep("Sample1", 8), rep("Sample2", 14)),
"code" = c("A1", "A1", "B1", "B1", "B1", "B1", "C1", "C1", "A1", "A1", rep("B1", 5), "C1", "C1", rep("D1", 5)),
"code_conn" = c("X1","X3","Y1","Y2","Y3","Y4","Z2","Z3","X1","X2","Y1","Y2","Y3","Y4","Y5","Z1","P1","P2","P3","P4","P5","P6"),
"pool_content" = c("ASD","POS","QAW","QWP","SDE","LKS","JMP","MSK","ASD","POS","QAW","QWP","LKS","ASE","ASM","MSK","EAX","EBX","ECX","EDX","MOV","ALL"))
y <- data.table("code" = c(rep("A1", 3), rep("B1", 5), rep("C1", 4), rep("D1", 6)),
"code_conn" = c("X1","X2","X3","Y1","Y2","Y3","Y4","Y5","Z1","Z2","Z3","Z4","P1","P2","P3","P4","P5","P6"))
也许你可以尝试以下基本R解决方案(使用merge()
(
z <- merge(x,cbind(Sample = rep(unique(x$Sample),each = nrow(y)),y),all = T)
这样
> z
Sample code code_conn pool_content
1 Sample1 A1 X1 ASD
2 Sample1 A1 X2 <NA>
3 Sample1 A1 X3 POS
4 Sample1 B1 Y1 QAW
5 Sample1 B1 Y2 QWP
6 Sample1 B1 Y3 SDE
7 Sample1 B1 Y4 LKS
8 Sample1 B1 Y5 <NA>
9 Sample1 C1 Z1 <NA>
10 Sample1 C1 Z2 JMP
11 Sample1 C1 Z3 MSK
12 Sample1 C1 Z4 <NA>
13 Sample1 D1 P1 <NA>
14 Sample1 D1 P2 <NA>
15 Sample1 D1 P3 <NA>
16 Sample1 D1 P4 <NA>
17 Sample1 D1 P5 <NA>
18 Sample1 D1 P6 <NA>
19 Sample2 A1 X1 ASD
20 Sample2 A1 X2 POS
21 Sample2 A1 X3 <NA>
22 Sample2 B1 Y1 QAW
23 Sample2 B1 Y2 QWP
24 Sample2 B1 Y3 LKS
25 Sample2 B1 Y4 ASE
26 Sample2 B1 Y5 ASM
27 Sample2 C1 Z1 MSK
28 Sample2 C1 Z2 <NA>
29 Sample2 C1 Z3 <NA>
30 Sample2 C1 Z4 <NA>
31 Sample2 D1 P1 EAX
32 Sample2 D1 P2 EBX
33 Sample2 D1 P3 ECX
34 Sample2 D1 P4 EDX
35 Sample2 D1 P5 MOV
36 Sample2 D1 P6 ALL
我认为 dplyr::left_join 通过示例和代码可以帮助您。它将为不存在的值提供 NULL,并将其保持行数与您拥有的行数相同。