Exporting function arguments for parallel processing in R

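I'm trying to write a function with an option for parallel computation. So that it works on Windows, Mac, and Linux, I use a PSOCK cluster, which I believe is the default type in makeCluster(). My question is whether I should, or whether it is preferable to, pass all the arguments to the cluster with clusterExport(). If I do, I think I would have to evaluate every input argument up front instead of relying on R's default lazy evaluation. That seems undesirable when some variables are only used in special cases.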

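For example, in the code below, I wonder whether I should add a clusterExport() call inside the function. The code below runs fine on my computer, but similar code has failed on other machines.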

I would also like to hear about best practices. Many thanks.

library(parallel)  # makeCluster(), stopCluster()
library(pbapply)   # pbsapply()

foo = function(a = 3, b = 4, c = 5, B = 8, parallel = FALSE) {
  if (parallel) { cl = makeCluster(4) } else { cl = NULL }

  # default a, b, c values are desired to be used
  if (a > 5) {
    # c is used only in this case
    v = pbsapply(1:B, FUN = function(i) { Sys.sleep(.5); a + b + c + i }, cl = cl)
  } else {
    v = pbsapply(1:B, FUN = function(i) { Sys.sleep(.5); a + b + i }, cl = cl)
  }
  if (parallel) stopCluster(cl)
  return(v)
}
system.time(foo())
system.time(foo(parallel = TRUE))
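
As background to the lazy-evaluation concern above, here is a minimal sketch of one way to avoid evaluating every argument: call force() only on the arguments the selected branch uses, then hand the workers a closure defined inside the function. The names foo_forced, worker_fun, use_c and workers are hypothetical, and parSapply() from the parallel package stands in for pbsapply() so that the sketch needs only base R. It assumes a PSOCK cluster, where the function sent to the workers is serialized together with its enclosing environment.

```r
library(parallel)

# Hypothetical variant of foo(); parSapply() replaces pbsapply() here.
foo_forced <- function(a = 3, b = 4, c = 5, B = 8,
                       parallel = FALSE, workers = 2) {
  # Evaluate, on the master, only the arguments the chosen branch uses.
  force(a)
  force(b)
  use_c <- a > 5
  if (use_c) force(c)

  # Closure defined inside the function: a, b, c and use_c live in its
  # enclosing environment, which travels with it to the workers.
  worker_fun <- function(i) if (use_c) a + b + c + i else a + b + i

  if (!parallel) return(sapply(seq_len(B), worker_fun))

  cl <- makeCluster(workers)            # PSOCK is the default type
  on.exit(stopCluster(cl), add = TRUE)  # stop the cluster even on error
  parSapply(cl, seq_len(B), worker_fun)
}

res_seq <- foo_forced()
res_par <- foo_forced(parallel = TRUE)
```

With the defaults, both calls should return the same vector, 8 through 15. No clusterExport() call is involved in this sketch, because the values the workers need are carried by the closure's environment rather than looked up in the workers' global environment.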

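You could try setting the defaults to NULL and using sapply for the case handling. I'm not sure whether this really helps, though, because I couldn't reproduce your error.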

library(parallel)
library(pbapply)

foo <- function(a=NULL, b=NULL, c=NULL, B=NULL, parallel=FALSE) {
  if (parallel) {
    ## safer to code this more dynamically; max() keeps at least one worker
    cl <- makeCluster(max(1L, detectCores() - 4L))
    ## case handling: export only the arguments that were supplied
    sapply(c("a", "b", "c", "B"), function(x) {
      if (!is.null(get(x))) clusterExport(cl, x, environment())
    })
  } else {
    cl <- NULL
  }
  # a, b and B must be supplied: with the NULL defaults, `a > 5` fails
  if (a > 5) {
    # c is used only in this case
    v <- pbsapply(1:B, FUN=function(i) {
      Sys.sleep(.2)
      a + b + c + i
    }, cl=cl)
  } else {
    v <- pbsapply(1:B, FUN=function(i) {
      Sys.sleep(.2)
      a + b + i
    }, cl=cl)
  }
  if (parallel) stopCluster(cl)
  return(v)
}
foo(a=3, b=4, c=5, B=8, parallel=TRUE)
#   |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=00s  
# [1]  8  9 10 11 12 13 14 15
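
For comparison, here is a minimal sketch of a case where clusterExport() is actually required: the worker function is defined at top level, so its free variables are looked up in each worker's global environment instead of travelling with a closure. The names add_ab, bar and workers are hypothetical, and parSapply() is used in place of pbsapply() to keep the sketch to base R.

```r
library(parallel)

# Hypothetical worker defined at top level: a and b are free variables
# that each worker looks up in its own global environment.
add_ab <- function(i) a + b + i

# Hypothetical wrapper; `workers` is an illustrative parameter.
bar <- function(a = 3, b = 4, B = 8, workers = 2) {
  cl <- makeCluster(workers)
  on.exit(stopCluster(cl), add = TRUE)
  # envir = environment() exports bar()'s local a and b. The default,
  # envir = .GlobalEnv, would look for them in the caller's workspace.
  clusterExport(cl, c("a", "b"), envir = environment())
  parSapply(cl, seq_len(B), add_ab)
}

res <- bar()
```

Note that clusterExport() reads the value of each listed variable, so every exported argument is evaluated at that point. Exporting only the names a given branch needs, as the case handling above does, keeps the remaining arguments lazy.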
