R - How to subset a database based on the values in another database

I have two databases and need to compile information from both of them.

Suppose the first one (Db1) looks like this:

Col1    Col2    Col3  
P1      2000    Type1    
P1      2000    Type2
P1      2001    Type2
P2      2000    Type1
P2      2001    Type1
P3      2003    Type3

And the second one (Db2) is similar (same values, except that Col3 only contains Type4):

Col1    Col2    Col3  
P1      2000    Type4    
P1      2000    Type4
P1      2001    Type4
P2      2000    Type4
P2      2001    Type4
P3      2003    Type4

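I want to create a new database for each of Type1, Type2 and Type3, and add the matching Type4 rows to it, joined on Col1 and Col2. For the first step, I only need to subset Db1 by Col3 to get Type1, Type2 or Type3.

Then I want to go to Db2 and get all rows that have the same Col1 and Col2 values as the Type1 rows in Db1. So I only want the Type4 rows for the combinations P1-2000, P2-2000 and P2-2001 (that is, filtered by Type1). How can I subset Db2 that way?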

Expected output (Type1):

Col1    Col2    Col3  
P1      2000    Type1    
P2      2000    Type1
P2      2001    Type1
P1      2000    Type4    
P1      2000    Type4
P2      2000    Type4
P2      2001    Type4

Using base R only:

lines =
'Col1    Col2    Col3  
P1      2000    Type1    
P1      2000    Type2
P1      2001    Type2
P2      2000    Type1
P2      2001    Type1
P3      2003    Type3'
Db1 = read.table(textConnection(lines), header = TRUE)

lines =
'Col1    Col2    Col3  
P1      2000    Type4    
P1      2000    Type4
P1      2001    Type4
P2      2000    Type4
P2      2001    Type4
P3      2003    Type4'
Db2 = read.table(textConnection(lines), header = TRUE)

# Keep only the Type1 rows of Db1
Db1_new = Db1[Db1$Col3 == 'Type1', ]
# Unique Col1/Col2 combinations among those rows
Db1_f = Db1_new[!duplicated(Db1_new[, -3]), ]
# Collect the Db2 rows that match each combination
Db2_new = data.frame(Col1 = NULL, Col2 = NULL, Col3 = NULL)
for (i in seq_len(nrow(Db1_f))) {
  aux = Db2[Db2$Col1 == Db1_f$Col1[i] & Db2$Col2 == Db1_f$Col2[i], ]
  Db2_new = rbind(Db2_new, aux)
}

# Stack the Db1 subset on top of the matching Db2 rows
rbind(Db1_new, Db2_new)
#   Col1 Col2  Col3
#1    P1 2000 Type1
#4    P2 2000 Type1
#5    P2 2001 Type1
#11   P1 2000 Type4
#2    P1 2000 Type4
#41   P2 2000 Type4
#51   P2 2001 Type4
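
The same result can probably be reached without a loop. Below is a minimal sketch, still in base R, that builds a text key from Col1 and Col2 and filters Db2 with `%in%`. The helper name `combine_by_type` is made up for illustration and is not part of the answer above. The sketch assumes that pasting Col1 and Col2 with a space gives an unambiguous key, which holds for the example data.

```r
# Rebuild the example data from the question.
Db1 <- data.frame(
  Col1 = c("P1", "P1", "P1", "P2", "P2", "P3"),
  Col2 = c(2000, 2000, 2001, 2000, 2001, 2003),
  Col3 = c("Type1", "Type2", "Type2", "Type1", "Type1", "Type3")
)
Db2 <- data.frame(
  Col1 = c("P1", "P1", "P1", "P2", "P2", "P3"),
  Col2 = c(2000, 2000, 2001, 2000, 2001, 2003),
  Col3 = "Type4"
)

# Hypothetical helper: subset db1 by type, then append the db2 rows
# whose Col1/Col2 combination occurs in that subset.
combine_by_type <- function(type, db1 = Db1, db2 = Db2) {
  db1_sub <- db1[db1$Col3 == type, ]
  # Assumption: a space-separated Col1/Col2 string identifies a combination.
  keys1 <- paste(db1_sub$Col1, db1_sub$Col2)
  keys2 <- paste(db2$Col1, db2$Col2)
  out <- rbind(db1_sub, db2[keys2 %in% keys1, ])
  rownames(out) <- NULL  # drop the row names inherited from Db1 and Db2
  out
}

result <- combine_by_type("Type1")
print(result)

# One combined data frame per type present in Db1.
types <- as.character(unique(Db1$Col3))
by_type <- setNames(lapply(types, combine_by_type), types)
```

Unlike the loop, this keeps the matching Db2 rows in their original Db2 order rather than grouping them by key, which makes no difference for the example data. `by_type$Type2` and `by_type$Type3` then hold the equivalent data frames for the other two types.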
