Join five tables if three variables match



Cost

            Name  Class     Status   Cost
      Page, Lisa     11  Full Time  54550
      Page, Lisa     10   Contract  26795
  Taylor, Hector      7  Full Time  42540
Dawson, Jonathan     11  Full Time  35680
Dawson, Jonathan      6  Full Time  72830
Dawson, Jonathan      5   Contract  60830
     Pratt, Erik      8  Full Time  83000

Subjects

            Name  Class     Status  Subjects
      Page, Lisa     11  Full Time     Maths
      Page, Lisa     10   Contract   Science
  Taylor, Hector      7  Full Time   Science
Dawson, Jonathan     11  Full Time   English
Dawson, Jonathan      6  Full Time     Maths
Dawson, Jonathan      5   Contract     Maths
     Pratt, Erik      8  Full-Time  Hinduism

ComputerNo

            Name  Class     Status  ComputerNo
      Page, Lisa     11  Full Time      115005
      Page, Lisa     10   Contract      450005
  Taylor, Hector      7  Full Time      380025
Dawson, Jonathan     11  Full Time      152253
Dawson, Jonathan      6  Full Time      125523
Dawson, Jonathan      5   Contract      485125

LicenseNo

            Name  Class     Status  LicenseNo
      Page, Lisa     11  Full Time   HJ452632
      Page, Lisa     10   Contract   HJ452634
  Taylor, Hector      7  Full Time   HJ352236
Dawson, Jonathan     11  Full Time   HJ456236
Dawson, Jonathan      6  Full Time   HJ456230
Dawson, Jonathan      5   Contract   HJ456232
     Pratt, Erik      8  Full Time   HJ130055

Country

            Name  Class     Status    Country
      Page, Lisa     11  Full-Time  Hong Kong
      Page, Lisa     10   Contract  Hong Kong
  Taylor, Hector      7  Full-Time         UK
Dawson, Jonathan     11  Full-Time        USA
Dawson, Jonathan      6  Full-Time        USA
Dawson, Jonathan      5   Contract        USA
     Pratt, Erik      8  Full-Time      Japan

The result I expect is a combined dataset like this:

            Name  Class     Status   Cost  Subjects  ComputerNo  LicenseNo    Country
      Page, Lisa     11  Full Time  54550     Maths      115005   HJ452632  Hong Kong
      Page, Lisa     10   Contract  26795   Science      450005   HJ452634  Hong Kong
  Taylor, Hector      7  Full Time  42540   Science      380025   HJ352236         UK
Dawson, Jonathan     11  Full Time  35680   English      152253   HJ456236        USA
Dawson, Jonathan      6  Full Time  72830     Maths      125523   HJ456230        USA
Dawson, Jonathan      5   Contract  60830     Maths      485125   HJ456232        USA
     Pratt, Erik      8  Full Time  83000  Hinduism        -NA-   HJ130055      Japan

As shown above, I have five data tables, and I want to create a single dataset by joining them.

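I want to match on three variables (Name, Class, and Status) across all of the data tables and then join them. If a particular table has no row that meets these criteria, I would still like the record to appear in the final table, with the missing value shown as a blank cell or marked "-NA-".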

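Use the base R merge() function, listing the join columns in the by argument and specifying all=TRUE to keep the records from both the left and right tables: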

finaldf <- merge(cost, subject, by=c("Name", "Class", "Status"), all=TRUE)
finaldf <- merge(finaldf, computerNo, by=c("Name", "Class", "Status"), all=TRUE)
finaldf <- merge(finaldf, licenseNo, by=c("Name", "Class", "Status"), all=TRUE)
finaldf <- merge(finaldf, country, by=c("Name", "Class", "Status"), all=TRUE)
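To make the effect of all=TRUE concrete, here is a minimal sketch on cut-down copies of two of the tables from the question. The data frame names follow the code above. The final step that writes "-NA-" is an assumption about how the placeholder could be produced; merge() itself only fills the gap with NA.

```r
# Cut-down versions of the Cost and ComputerNo tables from the question.
cost <- data.frame(
  Name   = c("Page, Lisa", "Pratt, Erik"),
  Class  = c(11, 8),
  Status = c("Full Time", "Full Time"),
  Cost   = c(54550, 83000),
  stringsAsFactors = FALSE
)
computerNo <- data.frame(
  Name       = "Page, Lisa",
  Class      = 11,
  Status     = "Full Time",
  ComputerNo = 115005,
  stringsAsFactors = FALSE
)

# Full outer join on the three key columns: unmatched rows are kept.
finaldf <- merge(cost, computerNo, by = c("Name", "Class", "Status"), all = TRUE)

# Pratt, Erik has no ComputerNo row, so merge() leaves that cell as NA.
print(finaldf)

# Optional (assumption): display the gap as "-NA-".
# Note that this turns the ComputerNo column into character.
finaldf$ComputerNo <- ifelse(is.na(finaldf$ComputerNo), "-NA-",
                             as.character(finaldf$ComputerNo))
print(finaldf)
```

If the numeric type of the column matters for later calculations, it is probably better to keep the NA and only substitute the marker when printing or exporting.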

You can use Reduce to do it all in one go:

Reduce(function(x, y) merge(x, y, all = TRUE, 
    by = c("Name", "Class", "Status")), list(cost, subject, computerNo, licenseNo, country))
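One thing to watch for: merge() matches key values exactly, so spelling variants such as "Full-Time" and "Full Time" (both appear in the sample tables) are treated as different keys and would produce extra, half-empty rows. Below is a sketch of one way to normalize the key before the Reduce call. The helper name normalize_status is hypothetical, and replacing hyphens with spaces is an assumption about the intended canonical spelling.

```r
# Hypothetical helper: make the Status key consistent across tables
# by replacing hyphens with spaces ("Full-Time" -> "Full Time").
normalize_status <- function(df) {
  df$Status <- gsub("-", " ", df$Status, fixed = TRUE)
  df
}

# One-row extracts of the Cost and Country tables from the question.
cost <- data.frame(Name = "Pratt, Erik", Class = 8, Status = "Full Time",
                   Cost = 83000, stringsAsFactors = FALSE)
country <- data.frame(Name = "Pratt, Erik", Class = 8, Status = "Full-Time",
                      Country = "Japan", stringsAsFactors = FALSE)

# Normalize every table, then fold merge() over the list.
tables <- lapply(list(cost, country), normalize_status)

combined <- Reduce(function(x, y) merge(x, y, all = TRUE,
                                        by = c("Name", "Class", "Status")),
                   tables)
print(combined)  # one row, with both Cost and Country filled in
```

Without the normalization step, the same call would return two rows for Pratt, Erik, one per Status spelling.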