R: Fill a matrix according to whether the index rows are similar or different



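I have a very large pairwise distance matrix in R. I want to code the cells of the matrix according to whether the row/column names are the same or different.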

On a smaller scale, the row/column names are:

individuals <- c("apple", "pear", "apple", "cranberry", "peach", "apple")

I want a matrix that has a 1 for every comparison involving apple, except for apple-apple comparisons. That would look like:

     [,1] [,2] [,3] [,4] [,5] [,6]
[1,] "0"  "1"  "1"  "1"  "1"  "1" 
[2,] "1"  "0"  "1"  "0"  "0"  "1" 
[3,] "1"  "1"  "0"  "1"  "1"  "1" 
[4,] "1"  "0"  "1"  "0"  "0"  "1" 
[5,] "1"  "0"  "1"  "0"  "0"  "1" 
[6,] "1"  "1"  "1"  "1"  "1"  "0" 

I know I can do this:

final.matrix <- matrix(nrow= length(individuals), ncol = length(individuals))
final.matrix[grep("apple", individuals),] <- 1
final.matrix[,grep("apple", individuals)] <- 1
diag(final.matrix) <- 0
final.matrix[is.na(final.matrix)] <- 0

But there must be a cleaner/simpler way. What am I missing?

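Also, this does not work when the row/column names are in a tibble, which is how they are stored in reality. Any suggestions for a solution that works with tibbles?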

library(tibble)
tibble_inds <- as_tibble(individuals)
grep("apple", tibble_inds)
# 1
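A likely explanation for the 1 above, offered as a sketch rather than a definitive diagnosis: grep() coerces a non-character input with as.character(), and for a list-like object such as a data frame or tibble that yields one string per column, so the hit refers to the column and not to positions within it. The example below uses a base data.frame named df_inds (a hypothetical stand-in, so that it runs without the tibble package) and shows that extracting the column first gives the positions the question is after.

```r
individuals <- c("apple", "pear", "apple", "cranberry", "peach", "apple")

# df_inds is a hypothetical stand-in for the tibble in the question
df_inds <- data.frame(value = individuals, stringsAsFactors = FALSE)

# The whole column is collapsed into one string, so there is a single hit: column 1
hits_whole <- grep("apple", df_inds)

# Pulling the column out as a character vector gives one hit per matching element
hits_column <- grep("apple", df_inds[[1L]])

print(hits_whole)
print(hits_column)
```

With a tibble, the same extraction would be written tibble_inds[[1L]].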

It sounds like you want

outer(x, x, function(a, b) as.integer(a + b == 1L))

where

x <- tibble_inds[[1L]] == "apple"

if you accept only "apple", or

x <- grepl("apple", tibble_inds[[1L]])

if you accept any string that has "apple" as a substring.

I am assuming that your character vector individuals is the first variable in tibble_inds. In that case, outer returns

##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    0    1    0    1    1    0
## [2,]    1    0    1    0    0    1
## [3,]    0    1    0    1    1    0
## [4,]    1    0    1    0    0    1
## [5,]    1    0    1    0    0    1
## [6,]    0    1    0    1    1    0

for either choice of x. This result does not match yours because your diag<- call misses [1,3], [3,1], [3,6], [6,3], [1,6], and [6,1].
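To make the point about the missed cells concrete, here is a minimal sketch of the row/column-filling approach from the question. It assumes an exact comparison with "apple" in place of grep (my assumption, not the asker's code) and uses the illustrative names is_apple, m_question and m_fixed. diag<- only resets the cells where i equals j, so the off-diagonal apple-apple cells keep their 1; zeroing the whole apple-by-apple block is one way to close the gap.

```r
individuals <- c("apple", "pear", "apple", "cranberry", "peach", "apple")
is_apple <- individuals == "apple"
n <- length(individuals)

# The approach from the question: fill apple rows and columns, then clear the diagonal
m_question <- matrix(0L, n, n)
m_question[is_apple, ] <- 1L
m_question[, is_apple] <- 1L
diag(m_question) <- 0L

# Apple-apple cells that are still 1 after diag<- (all of them are off-diagonal)
missed <- which(outer(is_apple, is_apple, `&`) & m_question == 1L, arr.ind = TRUE)
print(missed)

# One possible fix: zero the whole apple-by-apple block, not just the diagonal
m_fixed <- m_question
m_fixed[is_apple, is_apple] <- 0L
print(m_fixed)
```

After the fix, m_fixed agrees with the matrix returned by outer above.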

Another possible solution:

individuals <- c("apple", "pear", "apple", "cranberry", "peach", "apple")
m <- matrix(0, length(individuals), length(individuals))
# Set a cell to 1 when exactly one of the two names is "apple"
for (i in seq_along(individuals)) {
  for (j in seq_along(individuals)) {
    m[i, j] <- +(sum(c(individuals[i], individuals[j]) == "apple") == 1)
  }
}
m
#>      [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,]    0    1    0    1    1    0
#> [2,]    1    0    1    0    0    1
#> [3,]    0    1    0    1    1    0
#> [4,]    1    0    1    0    0    1
#> [5,]    1    0    1    0    0    1
#> [6,]    0    1    0    1    1    0

Or replace the nested for loops with nested sapply calls:

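m <- matrix(0, length(individuals), length(individuals))
# \(i) is the R >= 4.1 shorthand for function(i). The assignment inside the
# lambda only changes a local copy of m; the matrix is the value sapply returns.
sapply(1:length(individuals), \(i) sapply(1:length(individuals),
  \(j) m[i,j] <- +(sum(c(individuals[i], individuals[j]) == "apple") == 1)))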
#>      [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,]    0    1    0    1    1    0
#> [2,]    1    0    1    0    0    1
#> [3,]    0    1    0    1    1    0
#> [4,]    1    0    1    0    0    1
#> [5,]    1    0    1    0    0    1
#> [6,]    0    1    0    1    1    0
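Both versions above evaluate every pair of names in interpreted R code, which may become slow for the very large matrix mentioned in the question. As a hedged sketch (the names m_loop, is_apple and m_vec are illustrative), the same rule, "exactly one of the two names is apple", can be written as a single vectorized comparison and checked against the loop result:

```r
individuals <- c("apple", "pear", "apple", "cranberry", "peach", "apple")
n <- length(individuals)

# Loop version, following the answer above
m_loop <- matrix(0, n, n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    m_loop[i, j] <- +(sum(c(individuals[i], individuals[j]) == "apple") == 1)
  }
}

# Vectorized version: a cell is 1 when the two "is apple" flags differ
is_apple <- individuals == "apple"
m_vec <- +outer(is_apple, is_apple, `!=`)

# TRUE when both versions agree on every cell
print(all(m_loop == m_vec))
```

No timings are claimed here; how much the vectorized form helps would need to be measured on the real data.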

We can try outer, as shown below:

> x <- grepl("apple",individuals)
> +(outer(x, x, `+`) == 1)
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    0    1    0    1    1    0
[2,]    1    0    1    0    0    1
[3,]    0    1    0    1    1    0
[4,]    1    0    1    0    0    1
[5,]    1    0    1    0    0    1
[6,]    0    1    0    1    1    0
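One caveat worth sketching: grepl("apple", ...) flags any name that contains "apple", while == "apple" flags exact matches only. The two coincide for the names in the question but not in general. The example below uses a hypothetical vector fruits with an added "pineapple" entry, which is not part of the original data, to show the difference.

```r
# Hypothetical names: "pineapple" does not appear in the original question
fruits <- c("apple", "pineapple", "pear")

x_exact  <- fruits == "apple"        # exact matches only
x_substr <- grepl("apple", fruits)   # any name containing "apple"

# Same outer() construction as above, applied to each flag vector
m_exact  <- +(outer(x_exact, x_exact, `+`) == 1)
m_substr <- +(outer(x_substr, x_substr, `+`) == 1)

print(m_exact)
print(m_substr)
```

In m_exact the apple-pineapple cell is 1, because only one side counts as apple; in m_substr it is 0, because both sides do.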
