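I have a list whose elements are integer vectors, and I want to merge (accumulate) those elements whenever they share at least one value. Elements that share no values with any other element should stay as they are. Here is my sample data: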
x <- list(c(1, 2), c(1, 2, 3), c(2, 3, 4), c(3, 4, 5), c(4, 5, 8), c(6, 9, 7), 7, c(5, 8), 10, 11)
Here is my desired output:
desired_result <- list(c(1, 2, 3, 4, 5, 8),
                       c(6, 9, 7),
                       10,
                       11)
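I would prefer to do this with the reduce or accumulate functions from purrr, but any other tidyverse solution is welcome. So far I have tried the following, but it only gives me a single union and evidently drops the rest: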
x %>%
  reduce(~ if(any(.x %in% .y)) union(.x, .y) else .x)
[1] 1 2 3 4 5 8
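One way to see why the rest is dropped is to trace the fold with accumulate, which keeps every intermediate accumulator. The sketch below reuses the same step logic; the helper name merge_if_shared is made up for illustration.

```r
library(purrr)

x <- list(c(1, 2), c(1, 2, 3), c(2, 3, 4), c(3, 4, 5), c(4, 5, 8),
          c(6, 9, 7), 7, c(5, 8), 10, 11)

# Same logic as the reduce() attempt: grow the accumulator only when it
# shares a value with the next element, otherwise return it unchanged
merge_if_shared <- function(acc, nxt) {
  if (any(acc %in% nxt)) union(acc, nxt) else acc
}

# accumulate() returns every intermediate accumulator, not just the last one
steps <- accumulate(x, merge_if_shared)

# The accumulator is always a single vector. At step 6 the element c(6, 9, 7)
# shares nothing with it, so it is skipped instead of starting a new group.
steps[[5]]
steps[[6]]
```

Because the accumulator can only ever hold one group, any solution based on reduce needs an accumulator that is itself a list of groups.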
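In general, I am looking for a way to group integers (ids) that share common values, much like clustering, but unfortunately my efforts so far have been in vain.
Thank you very much in advance for your help.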
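I suspect there is a set-cover solution, but in the meantime, here is a graph approach:
First, let's convert the integer vectors into an edge list so that they can be made into a graph. We can use expand.grid.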
library(igraph)
edgelist <- do.call(rbind, lapply(x, \(x) expand.grid(x, x))) # requires R >= 4.1.0
Now we have a two-column data.frame showing all the connections between integers (a set of edges).
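To make the edge-list step concrete, here is a small sketch of what expand.grid produces for a single element. The example vectors are chosen only for illustration.

```r
# One element of the list, e.g. c(1, 2, 3), becomes all ordered pairs of its values
pairs <- expand.grid(c(1, 2, 3), c(1, 2, 3))
nrow(pairs)   # 9 rows: 3 x 3 pairs, including self-pairs such as (1, 1)
names(pairs)  # default column names "Var1" and "Var2"

# A length-one element such as 10 still yields the self-pair (10, 10),
# so values that connect to nothing else remain present in the edge list
single <- expand.grid(10, 10)
single
```

The self-pairs are what allow isolated values like 10 and 11 to show up as their own components later on.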
igraph::graph.data.frame can conveniently build a graph from this edge list.
From there we can use igraph::components to extract the connected components.
g <- graph.data.frame(edgelist)
split(names(components(g)$membership),components(g)$membership)
#$`1`
#[1] "1" "2" "3" "4" "5" "8"
#$`2`
#[1] "6" "9" "7"
#$`3`
#[1] "10"
#$`4`
#[1] "11"
Or with the tidyverse:
library(dplyr); library(purrr)
map_dfr(x, ~expand.grid(.x,.x)) %>%
  graph.data.frame() %>%
  components() %>%
  pluck(membership) %>%
  stack() %>%
  {split(as.numeric(as.character(.[,2])),.[,1])}
$`1`
[1] 1 2 3 4 5 8
$`2`
[1] 6 9 7
$`3`
[1] 10
$`4`
[1] 11
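One approach:
Set i is adjacent to set j iff intersect(i, j) != the empty set. We want to find the connected components of the matrix that has 1 at position (i, j) when set i is adjacent to set j, and 0 otherwise. The first 4 lines build the adjacency matrix, lines 5 and 6 find the connected components, and the rest splits the list by that membership and takes the unique values.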
library(tidyverse)
library(igraph)
map(x, function(a) map_int(x, ~length(base::intersect(a, .x)) > 0) * 1L) %>%
  reduce(rbind) %>%
  graph.adjacency() %>%
  as.undirected() %>%
  components() %>%
  pluck("membership") %>%
  split(seq_along(.), .) %>%
  map(~unique(unlist(x[.x])))
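To illustrate the adjacency matrix described above, here is a minimal base R sketch on a toy list. The toy input is an assumption for illustration, not the data from the question.

```r
# Toy input: sets 1 and 2 share the value 2, set 3 is isolated
toy <- list(c(1, 2), c(2, 3), 4)

# adj[i, j] is 1 when set i and set j have a non-empty intersection, else 0
adj <- sapply(toy, function(a) {
  sapply(toy, function(b) length(intersect(a, b)) > 0)
}) * 1L

adj
#      [,1] [,2] [,3]
# [1,]    1    1    0
# [2,]    1    1    0
# [3,]    0    0    1
```

The matrix is symmetric with ones on the diagonal, since every set intersects itself. Its connected components are the groups of list elements to merge, here {1, 2} and {3}.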
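Thanks to my dear friend @Ian Campbell, who pointed me to a very informative post, I wanted to challenge myself to write a custom function for this. It is still a first version and not very elegant, and it can certainly be improved a lot, but for now it is stable: I have tried it on a few inputs and it has not failed.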
anoush <- function(x) {
  # First we check whether x is a list
  stopifnot(is.list(x))
  # Then we take every element of the input and calculate the intersect between
  # that element & the others. If there is any, we store the indices in the
  # `vec` vector. So in the end we have a list called `ind` whose elements
  # are all the indices connected with the corresponding elements of the
  # original list. For example, the first element of `ind` is `1`, `2`, `3`,
  # which means that in the original list these elements have common values.
  ind <- lapply(1:length(x), function(a) {
    vec <- c()
    for(i in 1:length(x)) {
      if(length(unique(base::intersect(x[[a]], x[[i]]))) > 0) {
        vec <- c(vec, i)
      }
    }
    vec
  })
  # Then we again compare each element of `ind` with the other elements.
  # If there is any intersect, we calculate the `union` of them.
  # For each element we end up with a list of accumulated values, but
  # in the end we use `Reduce` to capture only the last one. So for each
  # element of `ind` we end up with a collection of indices, which also
  # results in duplicated values. For example, elements `1` through `5` of
  # `dup_ind` contain the same values because in the original list these
  # elements have common values.
  dup_ind <- lapply(1:length(ind), function(a) {
    out <- c()
    for(i in 1:length(ind)) {
      if(length(unique(base::intersect(ind[[a]], ind[[i]]))) > 0) {
        out[[i]] <- union(ind[[a]], ind[[i]])
      }
      vec2 <- Reduce("union", out)
    }
    vec2
  })
  # Here we get rid of the duplicated elements of the list by means of the
  # `relist` function. Since in this process all the duplicated elements
  # turn into `integer(0)`, I have filtered those out.
  un <- unlist(dup_ind)
  res <- Map(`[`, dup_ind, relist(!duplicated(un), skeleton = dup_ind))
  res2 <- Filter(length, res)
  sapply(res2, function(a) unique(unlist(lapply(a, function(b) `[[`(x, b)))))
}
Output
> anoush(x)
[[1]]
[1] 1 2 3 4 5 8
[[2]]
[1] 6 9 7
[[3]]
[1] 10
[[4]]
[1] 11
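Since the question asked specifically for purrr's reduce, here is a minimal sketch of one way it could be done, assuming the accumulator is a list of groups rather than a single vector. The helper name merge_into_groups is hypothetical, and this is not taken from any of the answers above.

```r
library(purrr)

x <- list(c(1, 2), c(1, 2, 3), c(2, 3, 4), c(3, 4, 5), c(4, 5, 8),
          c(6, 9, 7), 7, c(5, 8), 10, 11)

# Hypothetical helper: fold one vector `v` into the list of groups built so far
merge_into_groups <- function(groups, v) {
  # Which existing groups share at least one value with `v`?
  hits <- map_lgl(groups, ~ any(v %in% .x))
  # No overlap at all: `v` starts a new group of its own
  if (!any(hits)) return(c(groups, list(v)))
  # Otherwise merge `v` and every overlapping group into the first such group
  first <- which(hits)[1]
  groups[[first]] <- unique(c(unlist(groups[hits]), v))
  # Drop the other overlapping groups, which are now absorbed
  groups[setdiff(which(hits), first)] <- NULL
  groups
}

result <- reduce(x, merge_into_groups, .init = list())
result
```

Starting from an empty list via .init lets each new vector either open a group or bridge several existing ones, which a single-vector accumulator cannot do. On the sample data this should give the four groups from the desired output.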