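The actual data frame my code works on is much larger than this, and the code needs to handle different data frames, not just this one. The example below shows the problem: some groups have identical contents, and I want to keep only one of those groups.
Consider the case where I have groups with different contents: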
Group Contents
GroupA Marble
GroupB Marble
GroupB Granite
GroupC Marble
GroupD Granite
GroupD Glass
GroupD Marble
In the example above, GroupA and GroupC both contain only Marble, so I want to remove one of these two groups. Desired output:
Group Contents
GroupA Marble
GroupB Marble
GroupB Granite
GroupD Granite
GroupD Glass
GroupD Marble
Reproducible data:
df <- structure(list(Group = c("GroupA", "GroupB", "GroupB", "GroupC",
"GroupD", "GroupD", "GroupD"), Contents = c("Marble", "Marble",
"Granite", "Marble", "Granite", "Glass", "Marble")), class = "data.frame", row.names = c(NA,
-7L), spec = structure(list(cols = list(Group = structure(list(), class = c("collector_character",
"collector")), Contents = structure(list(), class = c("collector_character",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))
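You can try:
# Key each row by (Contents, sorted contents of its group) and keep only the first occurrence of each key
idx <- !duplicated(with(df, cbind(Contents, ave(Contents, Group, FUN = function(x) toString(sort(x))))))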
df[idx, ]
Group Contents
1 GroupA Marble
2 GroupB Marble
3 GroupB Granite
5 GroupD Granite
6 GroupD Glass
7 GroupD Marble
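To make the key in the answer above easier to follow, here is the same computation split into two steps. It is only a sketch: the data is re-created with data.frame() instead of the dput() output (the readr spec attribute does not affect the result), and the names sig and idx are just illustrative.

```r
# Example data from the question (readr's spec attribute omitted)
df <- data.frame(
  Group    = c("GroupA", "GroupB", "GroupB", "GroupC", "GroupD", "GroupD", "GroupD"),
  Contents = c("Marble", "Marble", "Granite", "Marble", "Granite", "Glass", "Marble"),
  stringsAsFactors = FALSE
)

# Step 1: for every row, the sorted contents of its group collapsed into one string.
# For this data GroupA and GroupC both get "Marble", GroupB gets "Granite, Marble".
sig <- ave(df$Contents, df$Group, FUN = function(x) toString(sort(x)))
sig

# Step 2: a row is dropped if the same (Contents, group signature) pair already
# occurred in an earlier row. For this data that is only row 4 (GroupC).
idx <- !duplicated(cbind(df$Contents, sig))
df[idx, ]
```

Because the signature is sorted, two groups holding the same materials in a different row order still get the same key.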
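Here is an option using nested aggregate():
# Inner aggregate: collapse each group's Contents into one string; outer aggregate: keep the first Group per string
df[df$Group %in% aggregate(Group~.,aggregate(.~Group,df,toString),head,1)$Group,]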
-output
Group Contents
1 GroupA Marble
2 GroupB Marble
3 GroupB Granite
5 GroupD Granite
6 GroupD Glass
7 GroupD Marble
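The inner aggregate() above collapses each group's contents in row order, so a group listed as "Marble, Granite" and another listed as "Granite, Marble" would not be treated as duplicates. A possible variant, assuming row order within a group should not matter, sorts inside the collapsing function. It is shown on a small hypothetical data frame df2 where the order differs:

```r
# Hypothetical data: G1 and G2 hold the same materials in a different row order
df2 <- data.frame(
  Group    = c("G1", "G1", "G2", "G2", "G3"),
  Contents = c("Marble", "Granite", "Granite", "Marble", "Glass"),
  stringsAsFactors = FALSE
)

# Inner aggregate: one row per group, contents sorted before collapsing
sigs <- aggregate(Contents ~ Group, df2, function(x) toString(sort(x)))

# Outer aggregate: the first group for each distinct signature
keep <- aggregate(Group ~ Contents, sigs, head, 1)$Group

res <- df2[df2$Group %in% keep, ]
res
```

Here G2 is dropped because its sorted signature equals G1's, which the unsorted version would miss.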
An option using distinct() from dplyr:
library(dplyr)
df %>%
  # Sort first so each group's toString() signature does not depend on row order
  arrange(across(everything())) %>%
  group_by(Group) %>%
  mutate(new = toString(Contents)) %>%
  ungroup() %>%
  distinct(Contents, new, .keep_all = TRUE) %>%
  select(-new)
-output
# A tibble: 6 x 2
# Group Contents
# <chr> <chr>
#1 GroupA Marble
#2 GroupB Granite
#3 GroupB Marble
#4 GroupD Glass
#5 GroupD Granite
#6 GroupD Marble
You can use the dplyr filter() function:
df<-structure(list(Group = c("GroupA", "GroupB", "GroupB", "GroupC",
"GroupD", "GroupD", "GroupD"), Contents = c("Marble", "Marble",
"Granite", "Marble", "Granite", "Glass", "Marble")), class = "data.frame", row.names = c(NA,
-7L), spec = structure(list(cols = list(Group = structure(list(), class = c("collector_character",
"collector")), Contents = structure(list(), class = c("collector_character",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))
library(dplyr)
df
df %>%
  filter(Group != "GroupC")
Group Contents
1 GroupA Marble
2 GroupB Marble
3 GroupB Granite
4 GroupD Granite
5 GroupD Glass
6 GroupD Marble
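Filtering on a hard-coded "GroupC" only works for this example, while the question asks for something that works on other data frames. One way, assuming a group counts as a duplicate when its sorted, de-duplicated contents match an earlier group's, is to compute the names to drop first. This is a base R sketch; duplicate_groups() is a hypothetical helper, not part of dplyr:

```r
df <- data.frame(
  Group    = c("GroupA", "GroupB", "GroupB", "GroupC", "GroupD", "GroupD", "GroupD"),
  Contents = c("Marble", "Marble", "Granite", "Marble", "Granite", "Glass", "Marble"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: names of groups whose set of Contents matches an earlier group
duplicate_groups <- function(data) {
  # Split Contents by Group, keeping groups in order of first appearance
  by_group <- split(data$Contents, factor(data$Group, levels = unique(data$Group)))
  # One order-insensitive signature per group, e.g. "Granite, Marble" for GroupB
  sigs <- vapply(by_group, function(x) toString(sort(unique(x))), character(1))
  names(sigs)[duplicated(sigs)]
}

to_drop <- duplicate_groups(df)  # "GroupC" for the example data
out <- df[!df$Group %in% to_drop, ]
out
```

The same vector can be passed to the dplyr version as filter(!Group %in% to_drop).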