R - Remove columns with factors that have fewer than 5 observations per level

I have a dataset with more than 100 columns, all of which are factors. For example:

animal               fruit               vehicle              color 
cat              orange                   car               blue 
dog               apple                   bus              green 
dog               apple                   car              green 
dog              orange                   bus              green

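In my dataset, I need to remove every factor column in which any level has fewer than 5 observations. In this example, if I wanted to remove all columns containing a level with 1 observation or fewer, such as blue or cat, the algorithm would remove the columns animal and color. What is the most elegant way to do this?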

We can use Filter together with table:

Filter(function(x) !any(table(x) < 2), df1)
#  fruit vehicle
#1 orange     car
#2  apple     bus
#3  apple     car
#4 orange     bus
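The same idea can be wrapped in a small function so the threshold (5 in the question, 2 in the toy example) becomes a parameter. This is only a sketch: the name `drop_sparse_factors` and its arguments `min_n` and `drop_unused` are hypothetical, and it assumes that unused factor levels (which `table` counts as 0) should be ignored rather than cause the column to be dropped.

```r
# Hypothetical helper: keep only the columns in which every level
# has at least `min_n` observations.
drop_sparse_factors <- function(df, min_n = 5, drop_unused = TRUE) {
  keep <- function(x) {
    if (!is.factor(x)) return(TRUE)       # leave non-factor columns untouched
    if (drop_unused) x <- droplevels(x)   # assumption: ignore levels with zero rows
    all(table(x) >= min_n)
  }
  Filter(keep, df)
}

# Toy data mirroring the example in the question
df1 <- data.frame(
  animal  = factor(c("cat", "dog", "dog", "dog")),
  fruit   = factor(c("orange", "apple", "apple", "orange")),
  vehicle = factor(c("car", "bus", "car", "bus")),
  color   = factor(c("blue", "green", "green", "green"))
)

names(drop_sparse_factors(df1, min_n = 2))
# [1] "fruit"   "vehicle"
```

With `min_n = 5` on this four-row example every column is dropped, because no level can reach 5 observations.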

Data

df1 <- structure(list(animal = structure(c(1L, 2L, 2L, 2L), .Label = c("cat", 
"dog"), class = "factor"), fruit = structure(c(2L, 1L, 1L, 2L
), .Label = c("apple", "orange"), class = "factor"), vehicle = structure(c(2L, 
1L, 2L, 1L), .Label = c("bus", "car"), class = "factor"), color = structure(c(1L, 
2L, 2L, 2L), .Label = c("blue", "green"), class = "factor")),
row.names = c(NA, 
-4L), class = "data.frame")

We can use select_if from dplyr:

library(dplyr)
df1 %>% select_if(~all(table(.) > 1))
#   fruit vehicle
#1 orange     car
#2  apple     bus
#3  apple     car
#4 orange     bus
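In dplyr 1.0.0 and later, `select_if` is superseded by `select` combined with `where`. A minimal sketch of the equivalent call, assuming a recent dplyr is installed; the variable `min_n` is introduced here only for illustration:

```r
library(dplyr)

# Toy data mirroring the example in the question
df1 <- data.frame(
  animal  = factor(c("cat", "dog", "dog", "dog")),
  fruit   = factor(c("orange", "apple", "apple", "orange")),
  vehicle = factor(c("car", "bus", "car", "bus")),
  color   = factor(c("blue", "green", "green", "green"))
)

min_n <- 2  # illustrative threshold; the question's real data would use 5

# Keep only the columns in which every level has at least `min_n` rows
res <- df1 %>% select(where(~ all(table(.x) >= min_n)))
names(res)
# [1] "fruit"   "vehicle"
```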
