Basically, I have two data frames with this basic structure:
Col1 | Col2 | Col3 | Col4 |
---|---|---|---|
aaa | 12 | xxx | 32b |
tfe | 21 | xxx | 14f |
eed | 12 | xxx | 54b |
wes | 95 | xxx | 54r |
rtf | 44 | xxx | 99q |
fvg | 87 | xxx | 55h |
You can use setdiff to select, for each data frame, the rows whose Col1 value does not appear in the other data frame.
x[x$Col1 %in% setdiff(x$Col1, y$Col1),]
#x[!x$Col1 %in% intersect(x$Col1, y$Col1),] #Alternative
# Col1 Col2 Col3 Col4
#1 aaa 12 xxx 32b
y[y$Col1 %in% setdiff(y$Col1, x$Col1),]
# Col1 Col2 Col3 Col4
#1 bbb 12 xxx 32b
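A minimal sketch, assuming you also want one table that records which data frame each unique row came from. The source column and the only_x, only_y and unique_rows names are illustrative additions, and the data below is a cut-down stand-in for x and y:

```r
# Small stand-ins for x and y (assumed data, same shape as in the answer)
x <- data.frame(Col1 = c("aaa", "tfe", "eed"),
                Col2 = c(12L, 21L, 12L),
                stringsAsFactors = FALSE)
y <- x
y$Col1[1] <- "bbb"

# Col1 values that occur in only one of the two data frames
only_x <- setdiff(x$Col1, y$Col1)
only_y <- setdiff(y$Col1, x$Col1)

# Rows of each frame whose Col1 value is missing from the other frame
x_only <- x[x$Col1 %in% only_x, , drop = FALSE]
y_only <- y[y$Col1 %in% only_y, , drop = FALSE]

# Hypothetical "source" column; rep() keeps this safe for zero-row results
x_only$source <- rep("x", nrow(x_only))
y_only$source <- rep("y", nrow(y_only))

# Stack both sets of unique rows and renumber them
unique_rows <- rbind(x_only, y_only)
rownames(unique_rows) <- NULL
print(unique_rows)
```

Using rep() with nrow() rather than a bare "x" avoids an error when one side has no unique rows.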
Data:
x <- read.table(header=TRUE, stringsAsFactors=FALSE, text="Col1 Col2 Col3 Col4
aaa 12 xxx 32b
tfe 21 xxx 14f
eed 12 xxx 54b
wes 95 xxx 54r
rtf 44 xxx 99q
fvg 87 xxx 55h")
y <- x
y[1,1] <- "bbb"
anti_join(x, y)
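removes every row of x that has a match in y (dplyr). By default, rows are matched on all columns the two data frames have in common.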
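anti_join() also takes a by argument that controls which columns count as a match. A sketch with made-up data (the values are assumptions chosen to show the difference) comparing whole-row matching with matching on Col1 alone:

```r
library(dplyr)

# Assumed example data: "tfe" is in both frames but with a different Col2
x <- data.frame(Col1 = c("aaa", "tfe", "eed"),
                Col2 = c(12L, 21L, 12L),
                stringsAsFactors = FALSE)
y <- data.frame(Col1 = c("bbb", "tfe", "eed"),
                Col2 = c(12L, 99L, 12L),
                stringsAsFactors = FALSE)

# Match on both columns: "tfe" is kept because its Col2 differs in y
all_cols <- anti_join(x, y, by = c("Col1", "Col2"))

# Match on Col1 only: a row is dropped if its Col1 appears anywhere in y
key_only <- anti_join(x, y, by = "Col1")

print(all_cols)
print(key_only)
```

Which variant you want depends on whether "unique" means a unique key (Col1) or a unique whole row.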
df1 <- data.frame(
stringsAsFactors = FALSE,
Col1 = c("aaa", "tfe", "eed", "wes", "rtf", "fvg"),
Col2 = c(12L, 21L, 12L, 95L, 44L, 87L),
Col3 = c("xxx", "xxx", "xxx", "xxx", "xxx", "xxx"),
Col4 = c("32b", "14f", "54b", "54r", "99q", "55h")
)
df2 <- data.frame(
stringsAsFactors = FALSE,
Col1 = c("a", "tfe", "ee", "ws", "rt", "fvg"),
Col2 = c(12L, 21L, 12L, 95L, 44L, 87L),
Col3 = c("xxx", "xxx", "xxx", "xxx", "xxx", "xxx"),
Col4 = c("32b", "14f", "54b", "54r", "99q", "55h")
)
library(dplyr)
unique1 <- df1 %>%
anti_join(df2)
unique2 <- df2 %>%
anti_join(df1)
# Stack the two sets of unique rows:
rbind(unique1, unique2)
#> Col1 Col2 Col3 Col4
#> 1 aaa 12 xxx 32b
#> 2 eed 12 xxx 54b
#> 3 wes 95 xxx 54r
#> 4 rtf 44 xxx 99q
#> 5 a 12 xxx 32b
#> 6 ee 12 xxx 54b
#> 7 ws 95 xxx 54r
#> 8 rt 44 xxx 99q
Created on 2021-03-16 by the reprex package (v0.3.0)
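If you would rather avoid the dplyr dependency, a rough base R equivalent is to build one key string per row and test membership with %in%. This is a sketch: rows_not_in() is a hypothetical helper, and because values are compared through their character form, it assumes columns whose text representation identifies the value (as with the character and integer columns here).

```r
# Hypothetical helper: rows of `a` that have no identical row in `b`,
# comparing all columns of `a` (a base R stand-in for dplyr::anti_join)
rows_not_in <- function(a, b) {
  # One key per row, with fields separated by a character unlikely in the data
  key_a <- do.call(paste, c(a, sep = "\r"))
  key_b <- do.call(paste, c(b[names(a)], sep = "\r"))
  a[!key_a %in% key_b, , drop = FALSE]
}

df1 <- data.frame(
  Col1 = c("aaa", "tfe", "eed", "wes", "rtf", "fvg"),
  Col2 = c(12L, 21L, 12L, 95L, 44L, 87L),
  Col3 = rep("xxx", 6),
  Col4 = c("32b", "14f", "54b", "54r", "99q", "55h"),
  stringsAsFactors = FALSE
)
df2 <- df1
df2$Col1 <- c("a", "tfe", "ee", "ws", "rt", "fvg")

# Rows unique to df1 followed by rows unique to df2
result <- rbind(rows_not_in(df1, df2), rows_not_in(df2, df1))
rownames(result) <- NULL
print(result)
```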