获取R中ego和alter的共同点的数量



我有一个青少年友谊的定向网络数据集。我想做一个边缘列表,包括ego和alter共同拥有的朋友数量(某人ego和olt都被提名为朋友(。以下是一些示例数据:

有数据:

id   alter
1      3
1      5
1      9
2      3
2      5
3      2
3      5
3      9
3      6

需要数据:

id   alter   num_common
1      3            2
1      5            0
1      9            0
2      3            1
2      5            0
3      2            1
3      5            0 
3      9            0
3      6            0

解决方案可以是将边缘列表转换为邻接矩阵(使用igraph包(,并通过其转置将其相乘,以计算共享邻居的数量:

el <- read.table(text= " id   alter
1      3
1      5
1      9
2      3
2      5
3      2
3      5
3      9
3      6", header =T)
g <- graph_from_edgelist(as.matrix(el), directed = T)
m <- get.adjacency(g, sparse = F)
m2 <- m %*% t(m)           

然后将生成的矩阵转换回边缘列表,并将其与原始数据集合并:

el2 <- reshape2::melt(m2)
dplyr::left_join(el, el2, by = c("id" = "Var1", "alter" = "Var2"))
id alter value
1  1     3     2
2  1     5     0
3  1     9     0
4  2     3     1
5  2     5     0
6  3     2     1
7  3     5     0
8  3     9     0
9  3     6     0

为了了解同一个朋友提名自我和改变的频率,使用t(m) %*% m而不是m %*% t(m)来改变关系的方向。要忽略方向,请在graph_from_edgelist函数中将directed参数设置为FALSE

这是一个可能的但不是很简单的解决方案:

# your dummy data
df <- data.table::fread("id   alter
1      3
1      5
1      9
2      3
2      5
3      2
3      5
3      9
3      6")
library(dplyr)
library(tidyr)
# all pairs vertically with pair ID
pairs_v <- combn(unique(c(df$id, df$alter)), 2) %>% 
dplyr::as_tibble() %>% 
tidyr::pivot_longer(cols = everything()) %>% 
dplyr::arrange(name) 
# number of comon friends per group ID
pairs_comp <- pairs_v %>% 
dplyr::left_join(df, by = c("value" = "id"))  %>% 
dplyr::count(name, alter) %>% 
dplyr::filter(n > 1 & !is.na(alter)) %>% 
dplyr::count(name) 
# all pairs horizontally with pair ID
pairs_h <-pairs_v %>% 
dplyr::group_by(name) %>% 
dplyr::mutate(G_ID = dplyr::row_number()) %>% 
tidyr::pivot_wider(names_from = G_ID, values_from = "value") 
# multiple left joins to get repeated comon friends for each direction of combination
df %>% 
dplyr::left_join(pairs_h, by = c("id" = "1", "alter" = "2")) %>% 
dplyr::left_join(pairs_comp) %>% 
dplyr::left_join(pairs_h, by = c("id" = "2", "alter" = "1")) %>% 
dplyr::left_join(pairs_comp, by = c("name.y" = "name")) %>% 
dplyr::mutate(num_common = case_when(!is.na(n.x) ~ as.numeric(n.x),
!is.na(n.y) ~ as.numeric(n.y),
TRUE ~ 0)) %>% 
dplyr::select(id, alter, num_common)
id alter num_common
1:  1     3          2
2:  1     5          0
3:  1     9          0
4:  2     3          1
5:  2     5          0
6:  3     2          1
7:  3     5          0
8:  3     9          0
9:  3     6          0

最新更新