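I have some data, and I want to use dplyr in R to get the rows that share the same details (company name, address, city, state, and zip code) but have different IDNum values.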
Company Name | Address | City | State | Zip | IDNum
Kiah Auto | 101 Smith Ave | Smith | AZ | 87788 | 1001
Kiah Auto | 101 Smith Ave | Smith | AZ | 87788 | 1002
ABC Auto | 89 Broadway Ave | Broadway | NY | 10112 | 9001
ABC Auto | 89 Broadway Ave | Broadway | NY | 10112 | 9001
XYZ Auto | 3A West 13th Street | San | CA | 90111 | 2321
XYZ Auto | 3A West 13th Street | San | CA | 90111 | 2001
Below is the table I want to end up with.
Company Name | Address | City | State | Zip | IDNum
Kiah Auto | 101 Smith Ave | Smith | AZ | 87788 | 1001
Kiah Auto | 101 Smith Ave | Smith | AZ | 87788 | 1002
XYZ Auto | 3A West 13th Street | San | CA | 90111 | 2321
XYZ Auto | 3A West 13th Street | San | CA | 90111 | 2001
Thanks in advance.
Select the groups that have more than one unique IDNum value.
library(dplyr)
result <- df %>%
  group_by(Company.Name, Address, City, State, Zip) %>%
  filter(n_distinct(IDNum) > 1) %>%
  ungroup()
result
# Company.Name Address City State Zip IDNum
# <chr> <chr> <chr> <chr> <int> <int>
#1 Kiah Auto 101 Smith Ave Smith AZ 87788 1001
#2 Kiah Auto 101 Smith Ave Smith AZ 87788 1002
#3 XYZ Auto 3A West 13th Street San CA 90111 2321
#4 XYZ Auto 3A West 13th Street San CA 90111 2001
The same in base R and data.table.
# base R
result <- subset(df, ave(IDNum, Company.Name, Address, City, State, Zip,
                         FUN = function(x) length(unique(x))) > 1)
# data.table
library(data.table)
setDT(df)[, .SD[uniqueN(IDNum) > 1], .(Company.Name, Address, City, State, Zip)]
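To see what the filter condition is doing, here is a minimal sketch that counts the distinct IDNum values per group instead of filtering on them. The data frame `toy` and the column `n_ids` are names made up for this illustration, and the grouping is reduced to Company.Name to keep it short.

```r
library(dplyr)

# Toy data (hypothetical name): a trimmed copy of the question's table
toy <- data.frame(
  Company.Name = c("Kiah Auto", "Kiah Auto", "ABC Auto",
                   "ABC Auto", "XYZ Auto", "XYZ Auto"),
  IDNum = c(1001L, 1002L, 9001L, 9001L, 2321L, 2001L)
)

# One row per group, holding the number of distinct IDNum values in that group
counts <- toy %>%
  group_by(Company.Name) %>%
  summarise(n_ids = n_distinct(IDNum), .groups = "drop")

counts
```

ABC Auto has only one distinct IDNum, so `n_distinct(IDNum) > 1` is FALSE for that group and its rows are dropped; the other two groups have two distinct values each and are kept.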
Data
It is easier to help if you provide the data in a reproducible format using dput.
df <- structure(list(Company.Name = c("Kiah Auto", "Kiah Auto", "ABC Auto",
"ABC Auto", "XYZ Auto", "XYZ Auto"), Address = c("101 Smith Ave",
"101 Smith Ave", "89 Broadway Ave", "89 Broadway Ave", "3A West 13th Street",
"3A West 13th Street"), City = c("Smith", "Smith", "Broadway",
"Broadway", "San", "San"), State = c("AZ", "AZ", "NY", "NY",
"CA", "CA"), Zip = c(87788L, 87788L, 10112L, 10112L, 90111L,
90111L), IDNum = c(1001L, 1002L, 9001L, 9001L, 2321L, 2001L)),
class = "data.frame", row.names = c(NA, -6L))
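For reference, a structure like the one above is what dput prints. A small sketch of how you might produce it for your own data; `example_df` is a made-up stand-in, not the question's actual object.

```r
# Hypothetical small data frame standing in for your real data
example_df <- data.frame(
  Company.Name = c("Kiah Auto", "Kiah Auto"),
  IDNum = c(1001L, 1002L)
)

# Prints R code that recreates the object; paste that text into the question
dput(example_df)

# For a large data frame, the first rows are usually enough
dput(head(example_df, 20))

# Round-trip check: evaluating the dput text rebuilds the object
dput_text <- paste(capture.output(dput(example_df)), collapse = "\n")
rebuilt <- eval(parse(text = dput_text))
identical(rebuilt, example_df)
```

Anyone answering can then assign the pasted text to a variable and work with exactly the same column names and types.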