r-如何只选择数据帧中列中重复的行



我已经将两个数据帧连接在一起,并且我试图只选择重复的"分支代码"。

我想加入数据集"BranchData"one_answers"BranchCode",以便包括两个数据集通用的任何分支代码,以及两个数据集中不通用的分支代码。

然而,下面代码的最后一行似乎不起作用!

BranchData$'Branch Code' <
as.numeric(BranchData$'Branch Code')
BranchCalls$'Branch Code' <- as.numeric(BranchCalls$'Branch Code')
BranchData <- na.omit(BranchData)
merged <- full_join(BranchData,BranchCalls)
merged <- merged %>% group_by(merged$`Branch Code`) %>% filter(n() >= 2)

此外,当我试图将重复项分组,以便所有重复项都在一起时,但以下代码似乎不起作用!:

merged <- group_by(merged,merged$'Branch Code')

最小再现性示例:

structure(list(`Branch Code` = c(401801, 436801, 403801, 164801, 
198801), `Location Type` = c("Urban", "Urban", "Urban Deprived", 
"Rural", "Urban"), Type = c("MAIN", "MAIN", "MAIN", "MAIN", "LM"
), Status = c("Open", "Open", "Open", "Open", "Open"), Segment = c("Agency", 
"Agency", "Agency", "Agency", "Agency"), `Multiple (partner that owns multiple branches)` = c("Multiple 11", 
"Multiple 11", "Multiple 12", "Multiple 13", "Multiple 13"), 
RetailType = c("Books_Stationery", "Books_Stationery", "Convenience", 
"Convenience", "Convenience"), `Volume of transactions` = c(2238, 
1514, 1346, 1338, 625), `Open hours` = c(47.75, 50.2500000000001, 
46.5, 48.25, 114.25), `X Pos` = c(394169, 393488, 394434, 
392153, 393094), `Y Pos` = c(806326, 805877, 804347, 796902, 
802789), Urbanity = c("Major Centre", "Major Centre", "High Density", 
"Low Density", "Low Density"), `Case Reference Number` = c(NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_
), `Created On` = structure(c(NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_), tzone = "UTC", class = c("POSIXct", 
"POSIXt")), `Branch Type` = c(NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_), L1 = c(NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_
), L2 = c(NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_), L3 = c(NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_), L4 = c(NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_), `Case Type` = c(NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_
)), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"
), na.action = structure(c(`3` = 3L, `4` = 4L, `5` = 5L, `6` = 6L, 
`7` = 7L, `8` = 8L, `9` = 9L, `11` = 11L, `13` = 13L, `16` = 16L, 
`17` = 17L, `18` = 18L, `20` = 20L, `21` = 21L, `22` = 22L, `23` = 23L, 
`26` = 26L, `27` = 27L, `28` = 28L, `29` = 29L, `31` = 31L, `32` = 32L, 
`33` = 33L, `34` = 34L, `35` = 35L, `36` = 36L, `37` = 37L, `39` = 39L, 
`40` = 40L, `41` = 41L, `42` = 42L, `43` = 43L, `44` = 44L, `45` = 45L, 
`46` = 46L, `47` = 47L, `48` = 48L, `49` = 49L, `51` = 51L, `52` = 52L, 
`54` = 54L, `55` = 55L, `57` = 57L, `58` = 58L, `59` = 59L, `60` = 60L, 
`61` = 61L, `62` = 62L, `63` = 63L, `65` = 65L, `67` = 67L, `68` = 68L, 
`69` = 69L, `70` = 70L, `71` = 71L, `72` = 72L, `74` = 74L, `75` = 75L, 
`76` = 76L, `77` = 77L, `78` = 78L, `80` = 80L, `81` = 81L, `82` = 82L, 
`83` = 83L, `84` = 84L, `86` = 86L, `87` = 87L, `88` = 88L, `89` = 89L, 
`91` = 91L, `92` = 92L, `93` = 93L, `96` = 96L, `97` = 97L, `98` = 98L, 
`99` = 99L, `100` = 100L, `101` = 101L, `103` = 103L, `106` = 106L, 
`107` = 107L, `108` = 108L, `109` = 109L, `110` = 110L, `111` = 111L, 
`112` = 112L, `113` = 113L, `114` = 114L, `115` = 115L, `116` = 116L, 
`117` = 117L, `118` = 118L, `119` = 119L, `120` = 120L, `121` = 121L, 
`122` = 122L, `123` = 123L, `124` = 124L, `126` = 126L, `127` = 127L, 
`129` = 129L, `130` = 130L, `131` = 131L, `132` = 132L, `133` = 133L, 
`134` = 134L, `135` = 135L, `136` = 136L, `137` = 137L, `139` = 139L, 
`140` = 140L, `141` = 141L, `142` = 142L, `143` = 143L, `144` = 144L, 
`145` = 145L, `146` = 146L, `147` = 147L, `148` = 148L, `149` = 149L, 
`150` = 150L, `151` = 151L, `152` = 152L, `153` = 153L, `155` = 155L, 
`156` = 156L, `157` = 157L, `160` = 160L, `161` = 161L, `162` = 162L, 
`163` = 163L, `165` = 165L, `166` = 166L, `167` = 167L, `168` = 168L, 
`169` = 169L, `174` = 174L, `175` = 175L, `176` = 176L, `177` = 177L, 
`178` = 178L, `179` = 179L, `180` = 180L, `182` = 182L, `183` = 183L, 
`185` = 185L, `186` = 186L, `188` = 188L, `189` = 189L, `190` = 190L, 
`191` = 191L, `192` = 192L, `193` = 193L, `194` = 194L, `195` = 195L, 
`196` = 196L, `197` = 197L, `198` = 198L, `199` = 199L, `200` = 200L, 
`201` = 201L, `203` = 203L, `204` = 204L, `205` = 205L, `206` = 206L, 
`207` = 207L, `209` = 209L, `210` = 210L, `211` = 211L, `212` = 212L, 
`213` = 213L, `214` = 214L, `215` = 215L, `216` = 216L, `217` = 217L, 
`218` = 218L, `219` = 219L, `220` = 220L, `221` = 221L, `222` = 222L, 
`223` = 223L, `224` = 224L, `226` = 226L, `227` = 227L, `228` = 228L, 
`229` = 229L, `230` = 230L, `231` = 231L, `232` = 232L, `233` = 233L, 
`234` = 234L, `236` = 236L, `237` = 237L, `238` = 238L, `239` = 239L, 
`240` = 240L, `241` = 241L, `242` = 242L, `243` = 243L, `244` = 244L, 
`245` = 245L, `247` = 247L, `248` = 248L, `249` = 249L, `250` = 250L, 
`251` = 251L, `252` = 252L, `253` = 253L, `254` = 254L, `255` = 255L, 
`256` = 256L, `257` = 257L, `258` = 258L, `259` = 259L, `260` = 260L, 
`261` = 261L, `262` = 262L, `263` = 263L, `264` = 264L, `265` = 265L, 
`266` = 266L, `267` = 267L, `268` = 268L, `269` = 269L, `270` = 270L, 
`271` = 271L, `272` = 272L, `273` = 273L, `274` = 274L, `276` = 276L, 
`278` = 278L, `280` = 280L, `281` = 281L, `282` = 282L, `283` = 283L, 
`284` = 284L, `285` = 285L, `286` = 286L, `288` = 288L, `289` = 289L, 
`291` = 291L, `292` = 292L, `293` = 293L, `294` = 294L, `296` = 296L, 
`297` = 297L, `298` = 298L, `299` = 299L, `300` = 300L, `301` = 301L, 
`304` = 304L, `305` = 305L, `306` = 306L, `307` = 307L, `308` = 308L, 
`311` = 311L, `312` = 312L, `313` = 313L, `316` = 316L, `319` = 319L, 
`321` = 321L, `322` = 322L, `323` = 323L, `324` = 324L, `325` = 325L, 
`326` = 326L, `327` = 327L, `328` = 328L, `329` = 329L, `330` = 330L, 
`331` = 331L, `332` = 332L, `333` = 333L, `335` = 335L, `337` = 337L, 
`338` = 338L, `339` = 339L, `340` = 340L, `341` = 341L, `342` = 342L, 
`343` = 343L, `344` = 344L, `345` = 345L, `346` = 346L, `347` = 347L, 
`348` = 348L, `349` = 349L, `350` = 350L, `351` = 351L, `352` = 352L, 
`353` = 353L, `354` = 354L, `355` = 355L, `356` = 356L, `357` = 357L, 
`359` = 359L, `360` = 360L, `361` = 361L, `362` = 362L, `363` = 363L, 
`365` = 365L, `366` = 366L, `367` = 367L, `368` = 368L, `370` = 370L, 
`371` = 371L, `372` = 372L, `373` = 373L, `375` = 375L, `376` = 376L, 
`378` = 378L, `379` = 379L, `380` = 380L, `381` = 381L, `382` = 382L, 
`384` = 384L, `385` = 385L, `387` = 387L, `388` = 388L, `389` = 389L, 
`390` = 390L, `391` = 391L, `392` = 392L, `393` = 393L, `395` = 395L, 
`396` = 396L, `397` = 397L, `398` = 398L, `399` = 399L, `400` = 400L, 
`401` = 401L, `403` = 403L, `404` = 404L, `405` = 405L, `409` = 409L, 
`412` = 412L, `413` = 413L, `414` = 414L, `415` = 415L, `416` = 416L, 
`418` = 418L, `419` = 419L, `420` = 420L, `421` = 421L, `422` = 422L, 
`423` = 423L, `426` = 426L, `427` = 427L, `428` = 428L, `429` = 429L, 
`432` = 432L, `433` = 433L, `435` = 435L, `436` = 436L, `437` = 437L, 
`438` = 438L, `440` = 440L, `441` = 441L, `442` = 442L, `443` = 443L, 

如果有人能伸出援手,我将不胜感激!

非常感谢!

您可以使用table:

merged %>% filter(table(`Branch Code`)[`Branch Code`] > 1)

或使用add_count:

merged %>% add_count(`Branch Code`) %>% filter(n > 1)

我创建了一个小样本数据:

merged <- data.frame(branch_code = c("401801", "436801", "401801"),
location_type = c("Urban", "Urban", "Rural"))
branch_code location_type
1      401801         Urban
2      436801         Urban
3      401801         Rural

你可以使用这个代码:

merged %>%
group_by(branch_code) %>%
mutate(n = n()) %>%
filter(n > 1) %>%
select(-n)

输出:

# A tibble: 2 × 2
# Groups:   branch_code [1]
branch_code location_type     
<chr>       <chr>         
1 401801      Urban             
2 401801      Rural             

最新更新