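I have a 5000 x 1000 character matrix in R in which every entry is a color (red, blue, yellow, green, etc.). I want to compute, pairwise, how often the color strings in each pair of rows match across all columns. Each of the 1000 columns represents a different iteration of color labeling, and there is no limit on the number of distinct labels per column. For example, the first column might have 8 distinct color labels, the second 10, the third 11, and so on. I'm not interested in the labels themselves, only in how often a pair of rows matches or doesn't match in each column.
For example, my character matrix looks something like this (minus the artificial, regularly repeating color pattern):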
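# Draw 8 color labels at random (the output below is one particular draw)
colors <- sample(c("grey", "green", "blue", "pink", "brown", "purple", "cyan", "red", "yellow"), 8, replace = TRUE)
# Recycle the 8 labels to fill all 50 cells without a length-mismatch warning
labels <- matrix(rep(colors, length.out = 50), nrow = 10, ncol = 5)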
labels
      [,1]     [,2]     [,3]     [,4]     [,5]
 [1,] "brown"  "purple" "yellow" "green"  "brown"
 [2,] "grey"   "red"    "brown"  "red"    "grey"
 [3,] "purple" "yellow" "green"  "brown"  "purple"
 [4,] "red"    "brown"  "red"    "grey"   "red"
 [5,] "yellow" "green"  "brown"  "purple" "yellow"
 [6,] "brown"  "red"    "grey"   "red"    "brown"
 [7,] "green"  "brown"  "purple" "yellow" "green"
 [8,] "red"    "grey"   "red"    "brown"  "red"
 [9,] "brown"  "purple" "yellow" "green"  "brown"
[10,] "grey"   "red"    "brown"  "red"    "grey"
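I want to use this to construct a 5000 x 5000 symmetric matrix of pairwise match frequencies between rows. Each entry [i,j] (and [j,i]) should be the fraction of columns in which row i and row j match. For example, in the toy labels matrix above, row 1 matches row 6 in columns 1 and 5 but in no other column, so I would want the match frequency (2/5 = 0.4) to be entries [1,6] and [6,1] of the "frequency matrix". The diagonal would be all 1s, since every row always matches itself. The output would look like this: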
freq.mat
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    1    0    0    0    0  0.4    0    0    1     0
 [2,]    0    1    0    0  0.2  0.4    0    0    0     1
 [3,]    0    0    1    0    0    0    0  0.2    0     0
 [4,]    0    0    0    1    0    0  0.2  0.6    0     0
 [5,]    0  0.2    0    0    1    0    0    0    0   0.2
 [6,]  0.4  0.4    0    0    0    1    0    0  0.4   0.4
 [7,]    0    0    0  0.2    0    0    1    0    0     0
 [8,]    0    0  0.2  0.6    0    0    0    1    0     0
 [9,]    1    0    0    0    0  0.4    0    0    1     0
[10,]    0    1    0    0  0.2  0.4    0    0    0     1
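To make the definition concrete, a single entry of the frequency matrix can be checked by hand. The sketch below rebuilds the toy matrix from the printed example and computes the match frequency for a few row pairs. The hard-coded colors vector (read off the first column of the printed matrix) and the helper name pair_freq are assumptions made only so the example is reproducible; they are not part of the original code.

```r
# Hard-coded copy of the 8 sampled colors, read off the first column of the
# printed matrix (assumption: fixed here so the example is reproducible).
colors <- c("brown", "grey", "purple", "red", "yellow", "brown", "green", "red")
labels <- matrix(rep(colors, length.out = 50), nrow = 10, ncol = 5)

# Hypothetical helper: fraction of columns in which rows i and j of m
# carry the same label.
pair_freq <- function(m, i, j) mean(m[i, ] == m[j, ])

pair_freq(labels, 1, 6)  # 0.4: rows 1 and 6 match in columns 1 and 5 only
pair_freq(labels, 4, 8)  # 0.6: rows 4 and 8 match in columns 1, 3 and 5
pair_freq(labels, 1, 9)  # 1: rows 1 and 9 are identical
```

The full frequency matrix is this quantity evaluated for every pair (i, j).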
I tried applying the rowSums function as follows:
# My attempt (runs, but does not give the pairwise frequencies I want)
freq.mat <- apply(labels, 1, function(x) rowSums(x == labels))
diag(freq.mat) <- 1
freq.mat / 10
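It produces a matrix of the right size, but the entries are not the pairwise row-match frequencies I was hoping for. I also tinkered with some nested for loops but didn't make much progress, and that felt very much "against the spirit" of R programming anyway.
Could someone point me in the right direction? Many thanks!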
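You are comparing the wrong values. In x == labels, the row vector x is recycled down the columns of labels (R stores matrices in column-major order), so its elements line up with the wrong entries. Compare against the transpose instead, so that each column of t(labels) is one row of labels, and use colMeans to turn the matches into frequencies: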
apply(labels, 1, function(x) colMeans(x == t(labels)))
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]  1.0  0.0  0.0  0.0  0.0  0.4  0.0  0.0  1.0   0.0
 [2,]  0.0  1.0  0.0  0.0  0.2  0.4  0.0  0.0  0.0   1.0
 [3,]  0.0  0.0  1.0  0.0  0.0  0.0  0.0  0.2  0.0   0.0
 [4,]  0.0  0.0  0.0  1.0  0.0  0.0  0.2  0.6  0.0   0.0
 [5,]  0.0  0.2  0.0  0.0  1.0  0.0  0.0  0.0  0.0   0.2
 [6,]  0.4  0.4  0.0  0.0  0.0  1.0  0.0  0.0  0.4   0.4
 [7,]  0.0  0.0  0.0  0.2  0.0  0.0  1.0  0.0  0.0   0.0
 [8,]  0.0  0.0  0.2  0.6  0.0  0.0  0.0  1.0  0.0   0.0
 [9,]  1.0  0.0  0.0  0.0  0.0  0.4  0.0  0.0  1.0   0.0
[10,]  0.0  1.0  0.0  0.0  0.2  0.4  0.0  0.0  0.0   1.0
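An equivalent way to think about the same result is column by column: for each column, build an n x n table of which rows share a label, add the tables up, and divide by the number of columns. The sketch below does that with outer and cross-checks it against the apply version. The helper name match_freq and the hard-coded colors vector are assumptions for illustration, and performance on the full 5000 x 1000 matrix has not been tested here.

```r
# Toy matrix rebuilt from the printed example (colors hard-coded as an
# assumption so the result is reproducible).
colors <- c("brown", "grey", "purple", "red", "yellow", "brown", "green", "red")
labels <- matrix(rep(colors, length.out = 50), nrow = 10, ncol = 5)

# Hypothetical helper: accumulate one n x n equality table per column,
# then divide by the number of columns to get match frequencies.
match_freq <- function(m) {
  counts <- matrix(0, nrow(m), nrow(m))
  for (j in seq_len(ncol(m))) {
    counts <- counts + outer(m[, j], m[, j], "==")
  }
  counts / ncol(m)
}

freq.mat <- match_freq(labels)

# Cross-check against the apply() version
all.equal(freq.mat, apply(labels, 1, function(x) colMeans(x == t(labels))))
```

The loop runs over columns rather than over pairs of rows, so it keeps only one n x n table plus the running total in memory at a time.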