r - Find common links in a matrix and classify by common intersection



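>Suppose I have a distance-cost matrix in which both the destination cost and the origin cost must be below a certain threshold, say US$ 100, for two places to share a link. My difficulty lies in obtaining a common set once these places have been classified: A1 links (destination and origin costs both below the threshold) with A2, A3 and A4; A2 links with A1 and A4; A4 links with A1 and A2. So A1, A2 and A4 would be classified into the same group, since they have the highest frequency of links among themselves. Below I set up a matrix as an example: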

      A1   A2   A3   A4   A5   A6   A7
A1     0   90   90   90  100  100  100
A2    80    0   90   90   90  110  100
A3    80  110    0   90  120  110   90
A4    90   90  110    0   90  100   90
A5   110  110  110  110    0   90   80
A6   120  130  135  100   90    0   90
A7   105  110  120   90   90   90    0

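I am programming this in Stata and have not entered the matrix above in matrix form, as one would in Mata. The column listing the letter A plus a number is a variable holding the matrix row names, and the remaining columns are named after each place (A1 and so on).

I used the following code to return the list of links between each pair of places. I probably did it in a very "brute force" way, since I was in a hurry: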

clear all
set more off
//inputting matrix
input A1 A2 A3 A4 A5 A6 A7
0 90 90 90 100 100 100
80 0 90 90 90 100 100
80 110 0 90 120 110 90
90 90 110 0 90 100 90
110 110 110 110 0 90 90
120 130 135 100 90 0 90
105 110 120 90 90 90 0
end
//generate row variable
gen locality=""
forv i=1/7{
replace locality="A`i'" in `i'
}
*
order locality, first

//generating who gets below the threshold of 100
forv i=1/7{
gen r_`i'=0
replace r_`i'=1 if A`i'<100 & A`i'!=0
}
*
//checking if both ways (origin and destination) are below the threshold
forv i=1/7{
gen check_`i'=.
forv j=1/7{
local v=r_`i'[`j']
local vv=r_`j'[`i']
replace check_`i'=`v'+`vv' in `j'
}
*
}
*
//creating list of links
gen locality_x=""
forv i=1/7{
preserve
local name = locality[`i']
keep if check_`i'==2
replace locality_x="`name'"
keep locality locality_x
save "C:\Users\user\Desktop\temp_`i'", replace
restore
}
*
use "C:\Users\user\Desktop\temp_1", clear
forv i=2/7{
append using "C:\Users\user\Desktop\temp_`i'"
}
*
//now locality_x lists whether A1 has links with A2, A3, etc., and so on.
//the difficulty lies in finding a common intersection between the groups.

This returns the following list:

locality_x  locality
A1  A2
A1  A3
A1  A4
A2  A1
A2  A4
A3  A1
A4  A1
A4  A2
A4  A7
A5  A6
A5  A7
A6  A5
A6  A7
A7  A4
A7  A5
A7  A6

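I am trying to get familiar with set intersections, but I do not know how to do this in Stata. I would like something where I can reprogram the threshold and find the common sets. A solution in R would also be appreciated, since I can program in it.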


A similar way to get the list in R (as shown in @user2957945's answer below):

m <- structure(c(0L, 80L, 80L, 90L, 110L, 120L, 105L, 90L, 0L, 110L,
90L, 110L, 130L, 110L, 90L, 90L, 0L, 110L, 110L, 135L, 120L, 
90L, 90L, 90L, 0L, 110L, 100L, 90L, 100L, 90L, 120L, 90L, 0L, 
90L, 90L, 100L, 110L, 110L, 100L, 90L, 0L, 90L, 100L, 100L, 90L, 
90L, 80L, 90L, 0L), .Dim = c(7L, 7L), .Dimnames = list(c("A1", 
"A2", "A3", "A4", "A5", "A6", "A7"), c("A1", "A2", "A3", "A4", 
"A5", "A6", "A7")))
# get values less than threshold
id = m < 100 
# make sure both values are less than threshold, and don't include diagonal
m_new = (id + t(id) == 2) & m != 0
# melt data and subset to keep TRUE values (TRUE if both less than threshold and not on diagonal)
result  = subset(reshape2::melt(m_new), value)
# reorder to match question results , if needed 
result[order(result[[1]], result[[2]]), 1:2] 
Var1 Var2
8    A1   A2
15   A1   A3
22   A1   A4
2    A2   A1
23   A2   A4
3    A3   A1
4    A4   A1
11   A4   A2
46   A4   A7
40   A5   A6
47   A5   A7
34   A6   A5
48   A6   A7
28   A7   A4
35   A7   A5
42   A7   A6     

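I also added the "graph theory" tag because I believe this is not exactly an intersection problem, where I could convert the lists to vectors and use the intersect function in R. The code needs to produce a new id, with certain places required to share the same new id (group). As in the example above, if the set for A1 contains A2 and A4, the set for A2 contains A1 and A4, and the set for A4 contains A1 and A2, then these three places must be in the same id (group). In other words, I need the grouping with the largest intersection for each place.

I know that other matrices could cause problems, for example if A1 has A2 and A6, A2 has A1 and A6, and A6 has A1 and A2 (but A6 does not have A4, considering the first example above). In that case I would welcome either a solution that adds A6 to the group, or some other arbitrary grouping in which the code simply groups the first set together, removes A1, A2 and A4 from the list, and leaves A6 without a new group.

In R you can do: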

# get values less than threshold
id = m < 100
# make sure both values are less than threshold, and don't include diagonal
m_new = (id + t(id) == 2) & m != 0
# melt data and subset to keep TRUE values (TRUE if both less than threshold and not on diagonal)
result  = subset(reshape2::melt(m_new), value)
# reorder to match question results , if needed 
result[order(result[[1]], result[[2]]), 1:2] 
Var1 Var2
8    A1   A2
15   A1   A3
22   A1   A4
2    A2   A1
23   A2   A4
3    A3   A1
4    A4   A1
11   A4   A2
46   A4   A7
40   A5   A6
47   A5   A7
34   A6   A5
48   A6   A7
28   A7   A4
35   A7   A5
42   A7   A6

Here m is the matrix from the question:

m <- structure(c(0L, 80L, 80L, 90L, 110L, 120L, 105L, 90L, 0L, 110L,
90L, 110L, 130L, 110L, 90L, 90L, 0L, 110L, 110L, 135L, 120L, 
90L, 90L, 90L, 0L, 110L, 100L, 90L, 100L, 90L, 120L, 90L, 0L, 
90L, 90L, 100L, 110L, 110L, 100L, 90L, 0L, 90L, 100L, 100L, 90L, 
90L, 80L, 90L, 0L), .Dim = c(7L, 7L), .Dimnames = list(c("A1", 
"A2", "A3", "A4", "A5", "A6", "A7"), c("A1", "A2", "A3", "A4", 
"A5", "A6", "A7")))

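Since the question asks for a threshold that can be changed easily, the same test can be wrapped in a function. This is only a sketch in base R (no reshape2); `mutual_links` is a made-up helper name, and the matrix is re-entered by hand so that the block runs on its own:

```r
# Cost matrix from the question (rows = origin, columns = destination)
places <- paste0("A", 1:7)
m <- matrix(c(
    0,  90,  90,  90, 100, 100, 100,
   80,   0,  90,  90,  90, 110, 100,
   80, 110,   0,  90, 120, 110,  90,
   90,  90, 110,   0,  90, 100,  90,
  110, 110, 110, 110,   0,  90,  80,
  120, 130, 135, 100,  90,   0,  90,
  105, 110, 120,  90,  90,  90,   0
), nrow = 7, byrow = TRUE, dimnames = list(places, places))

# mutual_links() is a hypothetical helper, not part of any package
mutual_links <- function(m, threshold) {
  linked <- m < threshold & t(m) < threshold  # both directions below threshold
  diag(linked) <- FALSE                       # a place is not linked to itself
  idx <- which(linked, arr.ind = TRUE)        # (row, col) positions of TRUE cells
  out <- data.frame(from = rownames(m)[idx[, "row"]],
                    to   = colnames(m)[idx[, "col"]],
                    stringsAsFactors = FALSE)
  out <- out[order(out$from, out$to), ]       # same ordering as the list above
  rownames(out) <- NULL
  out
}

links <- mutual_links(m, threshold = 100)
print(links)
```

Calling `mutual_links(m, threshold = 85)` on the same matrix returns an empty data frame, because no pair is below 85 in both directions.
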
Assuming what you want is the largest complete subgraph, you can use the igraph package:

# Load necessary libraries
library(igraph)
# Define global parameters
threshold <- 100
# Compute the adjacency matrix
# (distances in both directions need to be smaller than the threshold)
am <- m < threshold & t(m) < threshold
# Make an undirected graph given the adjacency matrix
# (we set diag to FALSE so as not to draw links from a vertex to itself)
gr <- graph_from_adjacency_matrix(am, mode = "undirected", diag = FALSE)
# Find all the largest complete subgraphs
lc <- largest_cliques(gr)
# Output the list of complete subgraphs as a list of vertex names
lapply(lc, (function (e) e$name))
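
If every place should end up with a single group id, as the question describes (group the first set, remove its members, continue with the rest), one possible reading is to peel off cliques greedily. The sketch below is an assumption about what is wanted, not something igraph does for you; it uses base R and a brute-force search that is only reasonable for a handful of places, and `largest_clique_among` is a hypothetical helper name:

```r
# Cost matrix from the question (rows = origin, columns = destination)
places <- paste0("A", 1:7)
m <- matrix(c(
    0,  90,  90,  90, 100, 100, 100,
   80,   0,  90,  90,  90, 110, 100,
   80, 110,   0,  90, 120, 110,  90,
   90,  90, 110,   0,  90, 100,  90,
  110, 110, 110, 110,   0,  90,  80,
  120, 130, 135, 100,  90,   0,  90,
  105, 110, 120,  90,  90,  90,   0
), nrow = 7, byrow = TRUE, dimnames = list(places, places))

threshold <- 100
adj <- m < threshold & t(m) < threshold  # mutual links only
diag(adj) <- FALSE

# Hypothetical helper: brute-force search for the largest set of mutually
# linked places among `nodes`, trying the biggest subset sizes first.
largest_clique_among <- function(adj, nodes) {
  for (k in rev(seq_along(nodes))) {
    if (k == 1) return(nodes[1])          # no links left: a single place
    for (s in combn(nodes, k, simplify = FALSE)) {
      sub <- adj[s, s]
      if (all(sub[upper.tri(sub)])) return(s)  # every pair in s is linked
    }
  }
  character(0)
}

# Greedy grouping: take the largest clique, give it an id, remove it, repeat
group <- setNames(rep(NA_integer_, nrow(m)), rownames(m))
remaining <- rownames(m)
g <- 0L
while (length(remaining) > 0) {
  clique <- largest_clique_among(adj, remaining)
  g <- g + 1L
  group[clique] <- g
  remaining <- setdiff(remaining, clique)
}
print(group)
```

With the example matrix this puts A1, A2 and A4 in group 1, A5, A6 and A7 in group 2, and leaves A3 alone in group 3. Ties between equally large cliques are broken by the order of the row names, so the grouping is arbitrary in that sense.
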

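As far as I know, Stata has no comparable function. However, if you are looking for the largest connected subgraph (in your example, the whole graph), you can use Stata's clustering commands (i.e. clustermat).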
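
The connected-subgraph reading can also be checked in R without extra packages. A minimal sketch, assuming the same mutual-threshold rule as above; `components_at` is a hypothetical helper that labels each place with the id of its connected component using a breadth-first search:

```r
# Cost matrix from the question (rows = origin, columns = destination)
places <- paste0("A", 1:7)
m <- matrix(c(
    0,  90,  90,  90, 100, 100, 100,
   80,   0,  90,  90,  90, 110, 100,
   80, 110,   0,  90, 120, 110,  90,
   90,  90, 110,   0,  90, 100,  90,
  110, 110, 110, 110,   0,  90,  80,
  120, 130, 135, 100,  90,   0,  90,
  105, 110, 120,  90,  90,  90,   0
), nrow = 7, byrow = TRUE, dimnames = list(places, places))

# Hypothetical helper: component id per place at a given threshold
components_at <- function(m, threshold) {
  adj <- m < threshold & t(m) < threshold   # mutual links only
  diag(adj) <- FALSE
  component <- setNames(rep(NA_integer_, nrow(adj)), rownames(adj))
  comp_id <- 0L
  for (start in rownames(adj)) {
    if (!is.na(component[start])) next      # already reached from another place
    comp_id <- comp_id + 1L
    queue <- start
    while (length(queue) > 0) {             # breadth-first search from `start`
      v <- queue[1]
      queue <- queue[-1]
      if (!is.na(component[v])) next
      component[v] <- comp_id
      nb <- colnames(adj)[adj[v, ]]         # places linked to v
      queue <- c(queue, nb[is.na(component[nb])])
    }
  }
  component
}

print(components_at(m, threshold = 100))
```

At a threshold of 100 every place gets the same label, which is the "whole graph" case mentioned above.
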
