R - How to make the apply() function faster



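I have two matrices. I want to use each column of the first (a logical matrix) to select rows and columns of the second, and then find the sum of the selected entries. I used the following code, and it works well.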

apply(firstMat,2,function(x) sum(secondMat[x,x]))

However, the dataset is large, and I would like to find an alternative that makes the process faster.

Here is a small reproducible example:

firstMat<-matrix(c(T,F,T,F,F,T,T,F,F,F),nrow=5,ncol=2)
secondMat<-matrix(c(1,0,0,0,1,0,0,0,1,1,1,0,1,0,1,1,1,0,0,0,1,1,1,0,1),nrow=5,ncol=5)

I would really appreciate it if you could help.

Perhaps your BLAS is faster than an explicit loop:

diag( t(firstMat) %*% secondMat %*% firstMat )
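
The matrix form works because, for a 0/1 indicator column x, the quadratic form t(x) %*% secondMat %*% x adds up exactly the entries secondMat[i, j] where both x[i] and x[j] are TRUE, which is sum(secondMat[x, x]). The full product builds a k-by-k matrix (k being the number of columns of firstMat) only to keep its diagonal. A minimal sketch that computes just the diagonal, assuming firstMat is logical or 0/1, is shown below; the variable names viaApply, viaDiag and viaColSums are introduced here for illustration only.

```r
# Example data from the question
firstMat <- matrix(c(TRUE, FALSE, TRUE, FALSE, FALSE,
                     TRUE, TRUE, FALSE, FALSE, FALSE), nrow = 5, ncol = 2)
secondMat <- matrix(c(1,0,0,0,1, 0,0,0,1,1, 1,0,1,0,1, 1,1,0,0,0, 1,1,1,0,1),
                    nrow = 5, ncol = 5)

# Original approach: one submatrix sum per column of firstMat
viaApply <- apply(firstMat, 2, function(x) sum(secondMat[x, x]))

# Full quadratic form, keeping only the diagonal
viaDiag <- diag(t(firstMat) %*% secondMat %*% firstMat)

# Diagonal only: elementwise product followed by column sums,
# which avoids forming the k-by-k matrix
viaColSums <- colSums(firstMat * (secondMat %*% firstMat))

print(viaApply)    # 3 1
print(viaDiag)     # 3 1
print(viaColSums)  # 3 1
```

Whether the colSums form is actually faster than diag depends on the matrix sizes and the BLAS in use, so it is worth timing both on your own data.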

You can run the apply function in parallel on a cluster with multiple workers:

firstMat<-matrix(c(T,F,T,F,F,T,T,F,F,F),nrow=5,ncol=2)
secondMat<-matrix(c(1,0,0,0,1,0,0,0,1,1,1,0,1,0,1,1,1,0,0,0,1,1,1,0,1),nrow=5,ncol=5)
# create the cluster
library(doSNOW)
cl <- makeCluster(2, type = "SOCK") # starts a cluster with 2 worker processes
# detectCores() from package parallel reports the number of cores on your machine
registerDoSNOW(cl) # registers the cluster as the foreach backend used by plyr's .parallel
clusterExport(cl, list("secondMat")) # export secondMat to each worker, since the function uses it there
# Option 1: using parApply from package `parallel`
library(parallel)
parApply(cl, firstMat, 2, function(x) sum(secondMat[x, x]))
# Option 2: using aaply from package `plyr`
library(plyr)
aaply(firstMat, 2, function(x) sum(secondMat[x, x]), .parallel = TRUE)
stopCluster(cl)

With the small reproducible example this shows no speed improvement, but I would expect both options to be faster than apply for large matrices.
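
One way to check that expectation is to time the serial and parallel versions on larger random data. The following is a rough sketch using only the base `parallel` package; the sizes n and k, the seed, and the helper subSum are hypothetical choices made for illustration, and secondMat is passed to the workers as an argument rather than via clusterExport. Timings will vary by machine, and for small inputs the communication overhead can outweigh any gain.

```r
library(parallel)

set.seed(1)            # arbitrary seed, only for repeatability
n <- 500               # hypothetical number of rows, adjust to your data
k <- 200               # hypothetical number of columns of firstMat
firstMat  <- matrix(runif(n * k) > 0.5, nrow = n, ncol = k)
secondMat <- matrix(rbinom(n * n, 1, 0.5), nrow = n, ncol = n)

# Hypothetical helper: sum of the submatrix selected by the logical vector x
subSum <- function(x, m) sum(m[x, x])

cl <- makeCluster(2)   # 2 worker processes

tSerial   <- system.time(resSerial   <- apply(firstMat, 2, subSum, m = secondMat))
tParallel <- system.time(resParallel <- parApply(cl, firstMat, 2, subSum, m = secondMat))

stopCluster(cl)

# Elapsed wall-clock seconds for each version
print(c(serial = tSerial[["elapsed"]], parallel = tParallel[["elapsed"]]))
```

Both versions should return the same values, so comparing resSerial with resParallel is a cheap sanity check before trusting the timings.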