R: if I have two separate groups with identical contents, how do I remove one of them?



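The actual data frame my code works on is much larger than this one, and the code needs to handle different data frames. The example below illustrates the problem of groups with identical contents and how I want to keep only one of them.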

Consider the case where I have groups with different contents.

Group   Contents
GroupA  Marble
GroupB  Marble
GroupB  Granite
GroupC  Marble
GroupD  Granite
GroupD  Glass
GroupD  Marble

In the example above, GroupA and GroupC both contain only Marble, so I want to remove one of these groups. My desired output:

Group   Contents
GroupA  Marble
GroupB  Marble
GroupB  Granite
GroupD  Granite
GroupD  Glass
GroupD  Marble

Reproducible data:

df <- structure(list(Group = c("GroupA", "GroupB", "GroupB", "GroupC", 
"GroupD", "GroupD", "GroupD"), Contents = c("Marble", "Marble", 
"Granite", "Marble", "Granite", "Glass", "Marble")), class = "data.frame", row.names = c(NA, 
-7L), spec = structure(list(cols = list(Group = structure(list(), class = c("collector_character", 
"collector")), Contents = structure(list(), class = c("collector_character", 
"collector"))), default = structure(list(), class = c("collector_guess", 
"collector")), skip = 1), class = "col_spec")) 

You can try:

idx <- !duplicated(with(df, cbind(Contents, ave(Contents, Group, FUN = function(x) toString(sort(x))))))

df[idx, ]
   Group Contents
1 GroupA   Marble
2 GroupB   Marble
3 GroupB  Granite
5 GroupD  Granite
6 GroupD    Glass
7 GroupD   Marble
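Since the question asks for something that works on different data frames, the same idea can be wrapped in a function. This is only a sketch: `drop_duplicate_groups` and its argument names are hypothetical, and it assumes that two groups count as identical when they hold the same set of contents, ignoring order and repeats within a group.

```r
# Hypothetical helper (not from the answers in this thread): keeps the first
# group for each distinct set of contents and drops later groups with the
# same set. Assumes contents are compared as sets (order and repeats ignored).
drop_duplicate_groups <- function(data, group_col = "Group",
                                  content_col = "Contents") {
  grp <- as.character(data[[group_col]])
  # Keep groups in order of first appearance so the earliest one survives.
  grp <- factor(grp, levels = unique(grp))
  # One signature string per group: its sorted, de-duplicated contents.
  sig <- vapply(split(data[[content_col]], grp),
                function(x) toString(sort(unique(x))),
                character(1))
  # Names of the first group seen for each distinct signature.
  keep <- names(sig)[!duplicated(sig)]
  data[data[[group_col]] %in% keep, , drop = FALSE]
}

df <- data.frame(
  Group    = c("GroupA", "GroupB", "GroupB", "GroupC",
               "GroupD", "GroupD", "GroupD"),
  Contents = c("Marble", "Marble", "Granite", "Marble",
               "Granite", "Glass", "Marble"),
  stringsAsFactors = FALSE
)

result <- drop_duplicate_groups(df)
```

Unlike the one-liner, this filters whole groups rather than individual rows, so a group that lists the same item twice keeps both of its rows.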

Here is an option using nested aggregate calls:

df[df$Group %in% aggregate(Group~.,aggregate(.~Group,df,toString),head,1)$Group,]
Output:
   Group Contents
1 GroupA   Marble
2 GroupB   Marble
3 GroupB  Granite
5 GroupD  Granite
6 GroupD    Glass
7 GroupD   Marble
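One thing to watch: `toString` collapses the contents in row order, so two groups holding the same items in a different order get different strings and are both kept. A minimal sketch of sorting before collapsing, on a small made-up data frame (`df2` is hypothetical and not part of the question):

```r
# Hypothetical data: G1 and G2 hold the same items in a different order.
df2 <- data.frame(
  Group    = c("G1", "G1", "G2", "G2"),
  Contents = c("Marble", "Granite", "Granite", "Marble"),
  stringsAsFactors = FALSE
)

# Sort within each group before collapsing, so row order does not matter.
sig <- aggregate(Contents ~ Group, df2, function(x) toString(sort(x)))

# Keep the first group for each distinct collapsed string.
first <- aggregate(Group ~ Contents, sig, function(g) g[1])

# Only the rows of G1 remain.
out <- df2[df2$Group %in% first$Group, ]
```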

An option with distinct from dplyr:

library(dplyr)
df %>% 
  arrange(across(everything())) %>%
  group_by(Group) %>%
  mutate(new = toString(Contents)) %>%
  ungroup() %>%
  distinct(Contents, new, .keep_all = TRUE) %>%
  select(-new)

Output:

# A tibble: 6 x 2
#  Group  Contents
#  <chr>  <chr>   
#1 GroupA Marble  
#2 GroupB Granite 
#3 GroupB Marble  
#4 GroupD Glass   
#5 GroupD Granite 
#6 GroupD Marble  
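The `arrange` step reorders the rows, which is why GroupB's Granite appears before its Marble in the output above. If the original row order matters, a variant along these lines may work. It is a sketch, assuming groups are compared as sets of contents; the `sig` column is a temporary name introduced here.

```r
library(dplyr)

df <- data.frame(
  Group    = c("GroupA", "GroupB", "GroupB", "GroupC",
               "GroupD", "GroupD", "GroupD"),
  Contents = c("Marble", "Marble", "Granite", "Marble",
               "Granite", "Glass", "Marble"),
  stringsAsFactors = FALSE
)

result <- df %>%
  group_by(Group) %>%
  # Signature of each group: its sorted, de-duplicated contents.
  mutate(sig = toString(sort(unique(Contents)))) %>%
  # Regroup by signature and keep only the first group that has it.
  group_by(sig) %>%
  filter(Group == first(Group)) %>%
  ungroup() %>%
  select(-sig)
```

Because nothing is sorted at the row level, the surviving rows stay in the order they had in `df`.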

You can use dplyr's filter function:

library(dplyr)

df <- structure(list(Group = c("GroupA", "GroupB", "GroupB", "GroupC", 
"GroupD", "GroupD", "GroupD"), Contents = c("Marble", "Marble", 
"Granite", "Marble", "Granite", "Glass", "Marble")), class = "data.frame", row.names = c(NA, 
-7L), spec = structure(list(cols = list(Group = structure(list(), class = c("collector_character", 
"collector")), Contents = structure(list(), class = c("collector_character", 
"collector"))), default = structure(list(), class = c("collector_guess", 
"collector")), skip = 1), class = "col_spec"))
df
# Drops GroupC by name, so the duplicate group has to be known in advance.
df %>%
  filter(Group != "GroupC")

   Group Contents
1 GroupA   Marble
2 GroupB   Marble
3 GroupB  Granite
4 GroupD  Granite
5 GroupD    Glass
6 GroupD   Marble
