r - Group IDs based on at least one common value

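I have a list whose elements are integer vectors, and I would like to merge (accumulate) these elements whenever they share at least one value. Elements that share no value with any other element should stay as they are. Here is my sample data: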

x <- list(c(1, 2), c(1, 2, 3), c(2, 3, 4), c(3, 4, 5), c(4, 5, 8), c(6, 9, 7), 7, c(5, 8), 10, 11)

Here is my desired output:

desired_result <- list(c(1, 2, 3, 4, 5, 8),
                       c(6, 9, 7),
                       10,
                       11)

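I would first like to do this with the reduce or accumulate functions from purrr, but any other tidyverse solution is welcome. So far I have tried the following, but it only gives me a single union and apparently drops the rest: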

x %>% 
  reduce(~ if(any(.x %in% .y)) union(.x, .y) else .x)
[1] 1 2 3 4 5 8

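In general, I am looking for a way to group integers (IDs) that have common values, much like clustering, but unfortunately my efforts so far have been in vain.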

Thank you very much in advance for your help.

I suspect there is a set cover solution, but in the meantime, here is a graph approach:

First, let's convert the integer vectors into an edge list so that it can be made into a graph. We can use expand.grid.

library(igraph)
edgelist <- do.call(rbind, lapply(x, \(x) expand.grid(x, x))) # R version >= 4.1.0

Now we have a two-column data.frame showing the connections between all the integers (a set of edges).
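
To make the shape of that edge list concrete, here is a small base R sketch on two made-up inputs (the names pairs and single are only for illustration). It shows that expand.grid pairs every value with every other value of the same vector, including itself:

```r
# Edge list for a single three-value vector: every ordered pair of its values.
pairs <- expand.grid(c(1, 2, 3), c(1, 2, 3))
nrow(pairs)   # 9 rows, in columns Var1 and Var2, self-pairs included

# A length-one element still produces one row (a self-loop), so an isolated
# value such as 10 stays in the edge list and later becomes its own vertex.
single <- expand.grid(10, 10)
nrow(single)  # 1 row
```

The self-loops look redundant, but they are what keeps singleton elements like 10 and 11 from disappearing when the graph is built.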

igraph::graph.data.frame can conveniently build a graph from it.

From there we can use igraph::components to extract the connected components.

g <- graph.data.frame(edgelist)
split(names(components(g)$membership),components(g)$membership)
#$`1`
#[1] "1" "2" "3" "4" "5" "8"
#$`2`
#[1] "6" "9" "7"
#$`3`
#[1] "10"
#$`4`
#[1] "11"

Or using the tidyverse:

library(dplyr); library(purrr)
map_dfr(x, ~expand.grid(.x,.x)) %>%
  graph.data.frame() %>%
  components() %>%
  pluck(membership) %>%
  stack() %>%
  {split(as.numeric(as.character(.[,2])),.[,1])}
$`1`
[1] 1 2 3 4 5 8
$`2`
[1] 6 9 7
$`3`
[1] 10
$`4`
[1] 11

One approach:

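Set i is adjacent to set j iff intersect(i, j) != empty set. We want the connected components of the graph whose adjacency matrix has a 1 at position (i, j) when set i is adjacent to set j, and 0 otherwise. The first 4 lines build the adjacency matrix, lines 5 and 6 find the connected components, and the rest splits the list by that membership and takes the unique values.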

library(tidyverse)
library(igraph)
map(x, function(a) map_int(x, ~length(base::intersect(a, .x)) > 0) * 1L) %>% 
  reduce(rbind) %>%
  graph.adjacency() %>%
  as.undirected() %>%
  components() %>%
  pluck("membership") %>%
  split(seq_along(.), .) %>%
  map(~unique(unlist(x[.x])))
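
To see what the adjacency matrix described above looks like, here is a minimal base R sketch that builds it without purrr. The helper name overlaps is made up for this illustration, and x is the sample list from the question:

```r
x <- list(c(1, 2), c(1, 2, 3), c(2, 3, 4), c(3, 4, 5), c(4, 5, 8),
          c(6, 9, 7), 7, c(5, 8), 10, 11)

# TRUE when two list elements share at least one value (hypothetical helper).
overlaps <- function(a, b) length(intersect(a, b)) > 0

# 10 x 10 matrix: entry (i, j) is 1 when x[[i]] and x[[j]] overlap, else 0.
adj <- sapply(x, function(a) sapply(x, function(b) overlaps(a, b))) * 1L

adj[1, 3]  # 1: c(1, 2) and c(2, 3, 4) share the value 2
adj[1, 4]  # 0: c(1, 2) and c(3, 4, 5) share nothing directly
```

Elements 1 and 4 are not adjacent, yet they end up in the same group because they are linked through elements 2 and 3. That transitive step is what components() adds on top of the matrix.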

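Thanks to a very informative post pointed out by my dear friend @Ian Campbell, I wanted to challenge myself and write a custom function for this. It is still a first version; it is not very elegant and can certainly be improved a lot, but for now it is stable: I tried it on several inputs and it did not disappoint.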

anoush <- function(x) {
  # First we check whether x is a list
  stopifnot(is.list(x))
  # Then we take every element of the input and calculate the intersect between
  # that element & others. In case there were some we would store the indices
  # in `vec` vector. So in the end we have a list called `ind` whose elements
  # are all the indices connected with the corresponding elements of the original
  # list for example first element of `ind` is `1`, `2`, `3` which means in
  # the original list these elements have common values.

  ind <- lapply(seq_along(x), function(a) {
    vec <- c()
    for (i in seq_along(x)) {
      if (length(unique(base::intersect(x[[a]], x[[i]]))) > 0) {
        vec <- c(vec, i)
      }
    }
    vec
  })
  # Then we go on to again compare each element of `ind` with other elements
  # in case there were any intersect, we will calculate the `union` of them.
  # For each element we will end up with a list of accumulated values,
  # but in the end we use `Reduce` to capture only the last one. So for each
  # element of `ind` we end up having a collection of indices that also
  # result in duplicated values. For example elements `1` through `5` of
  # `dup_ind` contain the same value because in the original list these
  # elements have common values.
  dup_ind <- lapply(seq_along(ind), function(a) {
    out <- list()  # a list, so single-index unions do not pad the result with NA
    for (i in seq_along(ind)) {
      if (length(unique(base::intersect(ind[[a]], ind[[i]]))) > 0) {
        out[[i]] <- union(ind[[a]], ind[[i]])
      }
      vec2 <- Reduce("union", out)
    }
    vec2
  })
  # Here we get rid of the duplicated elements of the list by means of
  # `relist` function and since in this process all the duplicated elements
  # will turn to `integer(0)` I have filtered those out.

  un <- unlist(dup_ind)
  res <- Map(`[`, dup_ind, relist(!duplicated(un), skeleton = dup_ind))
  res2 <- Filter(length, res)

  # lapply (not sapply) so the result is always a list, even if all groups
  # happen to have the same length
  lapply(res2, function(a) unique(unlist(lapply(a, function(b) `[[`(x, b)))))

}

Output

> anoush(x)
[[1]]
[1] 1 2 3 4 5 8
[[2]]
[1] 6 9 7
[[3]]
[1] 10
[[4]]
[1] 11
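
Since the question asked about reduce specifically, here is a minimal sketch of the same grouping done in a single pass with base R's Reduce, with no graph package. The function name merge_overlapping is made up for this illustration. The idea is to carry a list of groups as the accumulator instead of a single vector, and to fuse each incoming element with every group it touches:

```r
x <- list(c(1, 2), c(1, 2, 3), c(2, 3, 4), c(3, 4, 5), c(4, 5, 8),
          c(6, 9, 7), 7, c(5, 8), 10, 11)

# Hypothetical helper: the accumulator `groups` is a list of disjoint groups.
merge_overlapping <- function(x) {
  stopifnot(is.list(x))
  Reduce(function(groups, v) {
    # Which existing groups share at least one value with v?
    hit <- vapply(groups, function(g) any(v %in% g), logical(1))
    # Fuse v with every group it touches; untouched groups are kept as they are.
    merged <- unique(c(unlist(groups[hit]), v))
    c(groups[!hit], list(merged))
  }, x, list())
}

res <- merge_overlapping(x)
length(res)  # 4 groups
```

The groups contain the same values as the desired output, but they may come back in a different order, because a group moves to the end of the accumulator every time it is merged. I have only tried this on the sample data above.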
