Cost
Name              Class  Status     Cost
Page, Lisa        11     Full Time  54550
Page, Lisa        10     Contract   26795
Taylor, Hector    7      Full Time  42540
Dawson, Jonathan  11     Full Time  35680
Dawson, Jonathan  6      Full Time  72830
Dawson, Jonathan  5      Contract   60830
Pratt, Erik       8      Full Time  83000
Subjects
Name              Class  Status     Subjects
Page, Lisa        11     Full Time  Maths
Page, Lisa        10     Contract   Science
Taylor, Hector    7      Full Time  Science
Dawson, Jonathan  11     Full Time  English
Dawson, Jonathan  6      Full Time  Maths
Dawson, Jonathan  5      Contract   Maths
Pratt, Erik       8      Full Time  Hinduism
ComputerNo
Name              Class  Status     ComputerNo
Page, Lisa        11     Full Time  115005
Page, Lisa        10     Contract   450005
Taylor, Hector    7      Full Time  380025
Dawson, Jonathan  11     Full Time  152253
Dawson, Jonathan  6      Full Time  125523
Dawson, Jonathan  5      Contract   485125
LicenseNo
Name              Class  Status     LicenseNo
Page, Lisa        11     Full Time  HJ452632
Page, Lisa        10     Contract   HJ452634
Taylor, Hector    7      Full Time  HJ352236
Dawson, Jonathan  11     Full Time  HJ456236
Dawson, Jonathan  6      Full Time  HJ456230
Dawson, Jonathan  5      Contract   HJ456232
Pratt, Erik       8      Full Time  HJ130055
Country
Name              Class  Status     Country
Page, Lisa        11     Full Time  Hong Kong
Page, Lisa        10     Contract   Hong Kong
Taylor, Hector    7      Full Time  UK
Dawson, Jonathan  11     Full Time  USA
Dawson, Jonathan  6      Full Time  USA
Dawson, Jonathan  5      Contract   USA
Pratt, Erik       8      Full Time  Japan
The result I expect is a single combined dataset like this:
Name              Class  Status     Cost   Subjects  ComputerNo  LicenseNo  Country
Page, Lisa        11     Full Time  54550  Maths     115005      HJ452632   Hong Kong
Page, Lisa        10     Contract   26795  Science   450005      HJ452634   Hong Kong
Taylor, Hector    7      Full Time  42540  Science   380025      HJ352236   UK
Dawson, Jonathan  11     Full Time  35680  English   152253      HJ456236   USA
Dawson, Jonathan  6      Full Time  72830  Maths     125523      HJ456230   USA
Dawson, Jonathan  5      Contract   60830  Maths     485125      HJ456232   USA
Pratt, Erik       8      Full Time  83000  Hinduism  -NA-        HJ130055   Japan
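As shown above, I have five data tables and I want to join them into one dataset.
I want to match on the same three variables (Name, Class and Status) in every table and then join. If a particular table has no row matching these criteria, I still want the record to appear in the final table, with the missing value shown as a blank cell or marked "-NA-".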
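To make the answer below reproducible, here is a sketch that enters the five tables as R data frames. The object names (cost, subject, computerNo, licenseNo, country) are the ones the answer code uses; storing Class as a number and spelling Status as "Full Time" everywhere are assumptions that match the expected result, since merge() only pairs rows whose key values are identical.

```r
# Key columns shared by every table, in the row order used in the question.
# Assumption: Status is spelled "Full Time" throughout (merge() needs exact matches).
keys <- data.frame(
  Name   = c("Page, Lisa", "Page, Lisa", "Taylor, Hector", "Dawson, Jonathan",
             "Dawson, Jonathan", "Dawson, Jonathan", "Pratt, Erik"),
  Class  = c(11, 10, 7, 11, 6, 5, 8),
  Status = c("Full Time", "Contract", "Full Time", "Full Time",
             "Full Time", "Contract", "Full Time"),
  stringsAsFactors = FALSE
)

cost <- data.frame(keys, Cost = c(54550, 26795, 42540, 35680, 72830, 60830, 83000),
                   stringsAsFactors = FALSE)
subject <- data.frame(keys, Subjects = c("Maths", "Science", "Science", "English",
                                         "Maths", "Maths", "Hinduism"),
                      stringsAsFactors = FALSE)
# The ComputerNo table has no row for "Pratt, Erik" (the 7th key), so it is dropped here.
computerNo <- data.frame(keys[-7, ], ComputerNo = c(115005, 450005, 380025,
                                                    152253, 125523, 485125),
                         stringsAsFactors = FALSE)
licenseNo <- data.frame(keys, LicenseNo = c("HJ452632", "HJ452634", "HJ352236", "HJ456236",
                                            "HJ456230", "HJ456232", "HJ130055"),
                        stringsAsFactors = FALSE)
country <- data.frame(keys, Country = c("Hong Kong", "Hong Kong", "UK", "USA",
                                        "USA", "USA", "Japan"),
                      stringsAsFactors = FALSE)
```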
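Use base R's merge() function: list all the join columns in its by argument, and set all=TRUE
so that records are kept from both the left and the right table, even when they have no match: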
finaldf <- merge(cost, subject, by=c("Name", "Class", "Status"), all=TRUE)
finaldf <- merge(finaldf, computerNo, by=c("Name", "Class", "Status"), all=TRUE)
finaldf <- merge(finaldf, licenseNo, by=c("Name", "Class", "Status"), all=TRUE)
finaldf <- merge(finaldf, country, by=c("Name", "Class", "Status"), all=TRUE)
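A small sketch of why all = TRUE matters, using hypothetical two-row stand-ins for cost and computerNo in which "Pratt, Erik" has no computer number. The default (all = FALSE) is an inner join and drops him; all = TRUE keeps him with NA (merge() also offers all.x and all.y for one-sided joins). The last step is optional and only reproduces the "-NA-" marker from the expected table; it turns the affected columns into character.

```r
keys <- c("Name", "Class", "Status")

# Hypothetical two-row stand-ins for the question's tables.
cost <- data.frame(Name = c("Page, Lisa", "Pratt, Erik"), Class = c(11, 8),
                   Status = "Full Time", Cost = c(54550, 83000),
                   stringsAsFactors = FALSE)
computerNo <- data.frame(Name = "Page, Lisa", Class = 11, Status = "Full Time",
                         ComputerNo = 115005, stringsAsFactors = FALSE)

inner   <- merge(cost, computerNo, by = keys)              # default all = FALSE: inner join
finaldf <- merge(cost, computerNo, by = keys, all = TRUE)  # full outer join

nrow(inner)    # 1: "Pratt, Erik" is dropped
nrow(finaldf)  # 2: "Pratt, Erik" is kept, with ComputerNo = NA

# Optional: mark missing cells as "-NA-" (only columns containing NA become character).
finaldf[is.na(finaldf)] <- "-NA-"
finaldf$ComputerNo  # "115005" "-NA-"
```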
Alternatively, you can use Reduce to do it all in one step:
Reduce(function(x, y) merge(x, y, by = c("Name", "Class", "Status"), all = TRUE),
       list(cost, subject, computerNo, licenseNo, country))
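Reduce() applies a two-argument function cumulatively from the left, so the one-liner performs the same chain of merges as the explicit merge() calls. A minimal check on hypothetical three-table stand-ins; merge_on_keys is a helper name introduced here for illustration, not part of base R.

```r
keys <- c("Name", "Class", "Status")

# Hypothetical stand-ins for three of the question's tables.
cost <- data.frame(Name = c("Page, Lisa", "Pratt, Erik"), Class = c(11, 8),
                   Status = "Full Time", Cost = c(54550, 83000),
                   stringsAsFactors = FALSE)
computerNo <- data.frame(Name = "Page, Lisa", Class = 11, Status = "Full Time",
                         ComputerNo = 115005, stringsAsFactors = FALSE)
country <- data.frame(Name = c("Page, Lisa", "Pratt, Erik"), Class = c(11, 8),
                      Status = "Full Time", Country = c("Hong Kong", "Japan"),
                      stringsAsFactors = FALSE)

# Hypothetical helper: the two-argument merge that Reduce applies at every step.
merge_on_keys <- function(x, y) merge(x, y, by = keys, all = TRUE)

# Reduce folds from the left: merge_on_keys(merge_on_keys(cost, computerNo), country)
folded     <- Reduce(merge_on_keys, list(cost, computerNo, country))
sequential <- merge_on_keys(merge_on_keys(cost, computerNo), country)

identical(folded, sequential)  # TRUE
names(folded)  # "Name" "Class" "Status" "Cost" "ComputerNo" "Country"
```

Adding another table only means appending it to the list, which is why the Reduce form scales better than writing one merge() line per table.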