一个矩阵cov_mat
存储变量之间的协方差:
a_plane a_boat a_train b_plane b_boat b_train c_plane c_boat c_train d_plane …
a_plane 4.419 -0.583 0.446 0.018 -1.291 3.159 -0.954 0.488 3.111 1.100
a_boat -0.583 2.636 1.813 -1.511 -0.420 -0.757 1.698 1.668 1.091 0.120
a_train 0.446 1.813 2.668 -0.365 -0.183 1.040 1.347 1.813 0.806 -0.324
b_plane 0.018 -1.511 -0.365 2.498 1.153 1.498 -0.465 -1.157 -0.775 0.133
b_boat -1.291 -0.420 -0.183 1.153 1.043 -0.194 0.243 -0.361 -0.981 -0.040
b_train 3.159 -0.757 1.040 1.498 -0.194 4.153 -0.208 0.257 1.922 1.434
c_plane -0.954 1.698 1.347 -0.465 0.243 -0.208 1.791 0.909 0.259 0.394
c_boat 0.488 1.668 1.813 -1.157 -0.361 0.257 0.909 2.290 1.572 0.269
c_train 3.111 1.091 0.806 -0.775 -0.981 1.922 0.259 1.572 4.097 2.001
d_plane 1.100 0.120 -0.324 0.133 -0.040 1.434 0.394 0.269 2.001 2.231
…
final_need
是一个数据框,其中每个运输类别(飞机、轮船、火车(都有一行,给定类别中每个可能的协方差都有一列:
aa ab ac ad ba bb bc bd ca cb … <dd>
plane 4.419 0.018 -0.954 1.100 0.018 2.498 -0.465 0.133 -0.954 -0.465 …
boat 2.636 -0.420 1.668 0.120 -0.420 1.043 -0.361 …
train …
<…>
为了从cov_mat
到final_need
,我已经通过igraph, then eliminated rows of that edgelist that included out-of-category covariance calculations (e.g.,
a_planecovaries with
a_boat'将文件转换为边缘列表,但我可以不在乎(。结果如下:
> head(cov_edgelist_slim)
from to covariance
a_plane a_plane 4.419
a_plane b_plane 0.018
a_plane c_plane -0.954
a_plane d_plane 1.100
b_plane a_plane …
… … …
然后我尝试使用reshape2
中的dcast()
,但卡在如何使用该函数产生final_need
结果上。有什么想法吗?如果有比我要走的更简单的方法,我很高兴听到它!
以下是使用基本 R 的另一种方法:
- 提取每辆车的协方差子矩阵;
- 将此子矩阵扩展为向量;
- 将(行(向量组合回矩阵。
附加代码是根据将前缀的组合粘贴在一起来获取final_need
的正确列名。
## sort row + colnames in alphabetical order
cov_mat <- cov_mat[sort(rownames(cov_mat)), sort(colnames(cov_mat))]
## unique prefix and vehicle names
prefix <- unique(sub("_\w+$", "", colnames(cov_mat)))
vehicNames <- unique(sub("^\w+?_", "", colnames(cov_mat)))
## create final_need
final_need <- t(sapply(vehicNames, function(vehic) {
indices <- grep(vehic, colnames(cov_mat))
as.vector(cov_mat[indices, indices])
}))
## add prefix combinations as column names
colnames(final_need) <- levels(interaction(prefix, prefix, sep = ""))
final_need
#> aa ba ca ab bb cb ac bc cc
#> boat 2.636 -0.420 1.668 -0.420 1.043 -0.361 1.668 -0.361 2.290
#> plane 4.419 0.018 -0.954 0.018 2.498 -0.465 -0.954 -0.465 1.791
#> train 2.668 1.040 0.806 1.040 4.153 1.922 0.806 1.922 4.097
编辑:也可以反过来做同样的事情,即提取每个前缀组合的协方差子矩阵,并将它们的对角线组合回矩阵(按列(。
## create final_need by column
final_need <- apply(expand.grid(prefix, prefix), 1, function(i) {
row_ids <- grep(sprintf("^%s_", i[1]), rownames(cov_mat))
col_ids <- grep(sprintf("^%s_", i[2]), colnames(cov_mat))
cov_mat[cbind(row_ids, col_ids)]
})
## add row + column names
dimnames(final_need) <- list(vehicNames, levels(interaction(prefix, prefix, sep = "")))
final_need
#> aa ba ca ab bb cb ac bc cc
#> boat 2.636 -0.420 1.668 -0.420 1.043 -0.361 1.668 -0.361 2.290
#> plane 4.419 0.018 -0.954 0.018 2.498 -0.465 -0.954 -0.465 1.791
#> train 2.668 1.040 0.806 1.040 4.153 1.922 0.806 1.922 4.097
tidyverse
的方法可能是将行名作为列,将数据转换为长格式。从两个列名以及行名中获取字符串的第一部分,仅保留两者匹配的行并转换为宽格式。
library(tidyverse)
df %>%
rownames_to_column() %>%
pivot_longer(cols =-rowname) %>%
mutate(key = paste0(sub("_.*", "", rowname), sub("_.*", "", name)),
rowname = sub(".*_", "", rowname), name = sub(".*_", "", name)) %>%
filter(rowname == name) %>%
select(-rowname) %>%
pivot_wider(names_from = key, values_from = value)
# A tibble: 3 x 17
# name aa ab ac ad ba bb bc bd ca cb ....
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 plane 4.42 0.018 -0.954 1.1 0.018 2.50 -0.465 0.133 -0.954 -0.465
#2 boat 2.64 -0.42 1.67 NA -0.42 1.04 -0.361 NA 1.67 -0.361
#3 train 2.67 1.04 0.806 NA 1.04 4.15 1.92 NA 0.806 1.92