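I have two matrices. I want to use each column of the first one as a logical filter on the rows and columns of the second, and then sum the filtered subset. I used the following code and it works fine.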
apply(firstMat,2,function(x) sum(secondMat[x,x]))
However, the data set is large, and I would like to find a faster alternative.
Here is a small reproducible example:
firstMat<-matrix(c(T,F,T,F,F,T,T,F,F,F),nrow=5,ncol=2)
secondMat<-matrix(c(1,0,0,0,1,0,0,0,1,1,1,0,1,0,1,1,1,0,0,0,1,1,1,0,1),nrow=5,ncol=5)
I would really appreciate any help.
Your BLAS may be faster than an explicit loop, so you could try writing it as a matrix product:
diag( t(firstMat) %*% secondMat %*% firstMat )
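Since only the diagonal of that product is needed, the full matrix with one row and column per filter does not have to be formed. A minimal sketch, assuming the same `firstMat` and `secondMat` as in the question (the names `res_colsums` and `res_apply` are introduced here for illustration only):

```r
firstMat <- matrix(c(T,F,T,F,F,T,T,F,F,F), nrow = 5, ncol = 2)
secondMat <- matrix(c(1,0,0,0,1,0,0,0,1,1,1,0,1,0,1,1,1,0,0,0,1,1,1,0,1), nrow = 5, ncol = 5)

# Entry j of diag(t(F) %*% S %*% F) equals sum over i of F[i, j] * (S %*% F)[i, j],
# so an element-wise product followed by colSums() yields the same values
res_colsums <- colSums(firstMat * (secondMat %*% firstMat))

# Check against the original apply() version from the question
res_apply <- apply(firstMat, 2, function(x) sum(secondMat[x, x]))
stopifnot(isTRUE(all.equal(res_colsums, res_apply)))

res_colsums  # 3 1
```

Whether this beats `diag()` in practice depends on the matrix sizes and the BLAS in use, so it is worth timing on your own data.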
You can run the apply function in parallel across several workers of a cluster:
firstMat<-matrix(c(T,F,T,F,F,T,T,F,F,F),nrow=5,ncol=2)
secondMat<-matrix(c(1,0,0,0,1,0,0,0,1,1,1,0,1,0,1,1,1,0,0,0,1,1,1,0,1),nrow=5,ncol=5)
# create the cluster
library(doSNOW)
cl <- makeCluster(2, type = "SOCK") # starts a cluster with 2 worker processes
# detectCores() from package parallel reports the number of cores on your machine
registerDoSNOW(cl)
clusterExport(cl,list("secondMat")) # export secondMat to every worker, since the function run there uses it
# Option 1: Using parApply from package `parallel`
library(parallel)
parApply(cl,firstMat,2,function(x) sum(secondMat[x,x]))
# Option 2: Using aaply from package `plyr` (runs on the backend registered with registerDoSNOW above)
library(plyr)
aaply(firstMat,2,function(x) sum(secondMat[x,x]),.parallel=TRUE)
stopCluster(cl)
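With the small reproducible example this shows no speed improvement (the cluster overhead dominates), but I expect both options to be faster than apply on large matrices.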
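To see how the approaches scale before committing to one, it helps to time them on larger inputs. A rough sketch, assuming randomly generated matrices as a stand-in for the real data (the sizes `n` and `k`, the seed, and the names `bigFirst` and `bigSecond` are hypothetical), comparing the plain `apply` call with the matrix-product form:

```r
set.seed(1)  # arbitrary seed, only so that the run is repeatable
n <- 500     # hypothetical number of rows/columns of the second matrix
k <- 50      # hypothetical number of filter columns in the first matrix

# Random stand-ins for the real data
bigFirst  <- matrix(runif(n * k) > 0.5, nrow = n, ncol = k)
bigSecond <- matrix(round(runif(n * n)), nrow = n, ncol = n)

# Time the original apply() version
t_apply <- system.time(
  res_apply <- apply(bigFirst, 2, function(x) sum(bigSecond[x, x]))
)

# Time the matrix-product version
t_blas <- system.time(
  res_blas <- diag(t(bigFirst) %*% bigSecond %*% bigFirst)
)

# Both approaches should agree before their timings are compared
stopifnot(isTRUE(all.equal(res_apply, res_blas)))

# Elapsed seconds for each approach
c(apply = t_apply[["elapsed"]], blas = t_blas[["elapsed"]])
```

The same pattern can wrap the `parApply` and `aaply` calls; for a fair comparison, create the cluster outside the timed expression so that startup cost is measured separately.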