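I want to find the unique values of a column, excluding the values in given vectors. In the example data below, I want the unique values of the column all_areas minus the values in the vectors area1 and area2. That is, the result should be "town", "city", "village".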
set.seed(1)
area_df = data.frame(all_areas = sample(rep(c("foo", "bar", "big", "small", "town", "city", "village"),5),20),
number = sample(1:100, 20))
area1 = c("foo", "bar")
area2 = c("big", "small")
You can use the function setdiff to find the set difference between all_areas and the combined values of area1 and area2:
setdiff(area_df$all_areas, c(area1, area2))
[1] "city" "village" "town"
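As a small illustration of why setdiff needs no separate unique call, here is a minimal sketch on a hypothetical vector (the names x and excluded are made up for this example and are not part of the question's data). It assumes base R only.

```r
# Hypothetical input with repeated values (not the question's data)
x <- c("foo", "town", "bar", "city", "town", "village", "big", "city")
excluded <- c("foo", "bar", "big", "small")

# setdiff() drops everything found in `excluded` and de-duplicates the rest,
# keeping values in order of first appearance in `x`
result <- setdiff(x, excluded)

# The same result spelled out with %in% and unique()
manual <- unique(x[!x %in% excluded])

print(result)
print(identical(result, manual))
```

Both expressions should give the same vector here, so setdiff is simply the shorter way to write it.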
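We can use %in% to create a logical vector, negate it with ! so that subset keeps only the rows whose 'all_areas' value is in neither vector, and then use unique to return the unique rows: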
unique(subset(area_df, !all_areas %in% c(area1, area2)))
Output:
all_areas number
5 village 44
7 city 33
8 town 84
9 city 35
10 village 70
11 town 74
16 village 87
19 town 40
20 village 93
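Note that unique on a data frame compares whole rows, so the same area can appear several times when the number differs. If only the distinct area names are wanted, one option is to apply unique to the column after subsetting. A minimal sketch, assuming base R and a small hypothetical data frame (df and excluded are illustrative names, not the question's objects):

```r
# Hypothetical data frame (not the question's data)
df <- data.frame(
  all_areas = c("foo", "town", "city", "town", "big", "village"),
  number = c(1, 2, 3, 4, 5, 6),
  stringsAsFactors = FALSE
)
excluded <- c("foo", "bar", "big", "small")

# Keep rows whose area is not excluded; %in% binds tighter than !,
# so this reads as !(all_areas %in% excluded)
kept <- subset(df, !all_areas %in% excluded)

# Unique rows: "town" appears twice because its numbers differ
unique_rows <- unique(kept)

# Unique values of the column only
unique_values <- unique(kept$all_areas)
print(unique_values)
```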
Using dplyr:
library(dplyr)
area_df %>%
  filter(!all_areas %in% c(area1, area2)) %>%
  distinct()
#> all_areas number
#> 1 village 44
#> 2 city 33
#> 3 town 84
#> 4 city 35
#> 5 village 70
#> 6 town 74
#> 7 village 87
#> 8 town 40
#> 9 village 93
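As with the base R version, distinct with no arguments compares whole rows. To get just the distinct area names, distinct can be given the column name, and pull then extracts that column as a vector. A minimal sketch, assuming dplyr is installed and using a hypothetical data frame (df and excluded are illustrative names):

```r
library(dplyr)

# Hypothetical data frame (not the question's data)
df <- data.frame(
  all_areas = c("foo", "town", "city", "town", "big", "village"),
  number = c(1, 2, 3, 4, 5, 6),
  stringsAsFactors = FALSE
)
excluded <- c("foo", "bar", "big", "small")

unique_values <- df %>%
  filter(!all_areas %in% excluded) %>%  # drop the excluded areas
  distinct(all_areas) %>%               # one row per distinct area
  pull(all_areas)                       # return the column as a vector

print(unique_values)
```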