R - How to locate the failing element in a parallel loop (pblapply)

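I am working in R and use the function pblapply() for parallel processing. I like this function because it displays a progress bar (very useful for estimating long execution times).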

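Suppose I have a huge dataset that I split into 500 sub-datasets, which I distribute across several threads for parallel processing. However, if a single sub-dataset raises an error, the whole pblapply() call fails, and I cannot tell which of the 500 sub-datasets caused it. I have to check them one by one. When I write such a loop with the base R for() construct, I can add print(i) to help me locate the error.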

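Q) Can I do something similar with pblapply(), that is, display a value telling me which sub-dataset is currently being processed (even if several are shown at once, since the threads work on several sub-datasets at the same time)? That would save me time.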

# The example below generates an error; here we can guess where, because it is very simple.
# With pblapply(), I cannot tell which element generated the error,
# whereas with the for loop, testing one by one, I can find it, but that could take very long with more complex operations.
library(parallel)
library(pbapply)
dataset <- list(1,1,1,'1',1,1,1,1,1,1)
myfunction <- function(x){
  print(x)
  5 / dataset[[x]]
}
cl <- makeCluster(2)
clusterExport(cl = cl, varlist = c('dataset', 'myfunction'), envir = environment())
result <- pblapply(
  cl  = cl,
  X   = seq_along(dataset),
  FUN = function(i){ myfunction(i) }
)
stopCluster(cl)
# Error in checkForRemoteErrors(val) :
# one node produced errors: non-numeric argument to binary operator
for(i in 1:length(dataset)){ myfunction(i) }
# [1] 1
# [1] 2
# [1] 3
# [1] 4
# Error in 5/dataset[[x]] : non-numeric argument to binary operator

One simple approach is to wrap the part that may fail in tryCatch, for example:

myfunction <- function(x){
  print(x)
  tryCatch( 5 / dataset[[x]] , error=function(e) NULL)
}

This way, for the cases where an error occurs, you get NULL (or whatever value you choose) and can handle the error later in your code.

which(lengths(result)==0)

will tell you which list elements had errors.
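Putting the pieces together, a minimal end-to-end sketch of this approach might look like the following. It reuses the `dataset` from the question; the names `safe_divide` and `failed` are introduced here purely for illustration and are not part of pbapply.

```r
library(parallel)
library(pbapply)

dataset <- list(1, 1, 1, '1', 1, 1, 1, 1, 1, 1)

# Hypothetical helper: returns NULL instead of raising an error
safe_divide <- function(i) {
  tryCatch(5 / dataset[[i]], error = function(e) NULL)
}

cl <- makeCluster(2)
# The workers need both the data and the helper
clusterExport(cl, varlist = c("dataset", "safe_divide"), envir = environment())
result <- pblapply(X = seq_along(dataset), FUN = safe_divide, cl = cl)
stopCluster(cl)

# Indices of the elements whose computation failed (NULL has length 0)
failed <- which(lengths(result) == 0)
print(failed)
# [1] 4
```

Because the error is caught on the worker, the progress bar runs to completion and the failing indices are available afterwards, without re-running the elements one by one.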

You can then examine exactly what happened and implement code that correctly identifies and handles (or prevents) problematic inputs.
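To make that inspection easier, one possible variant is to return the index and the error message rather than a bare NULL. This is only a sketch: `checked_divide`, `failures`, and the field names in the returned list are hypothetical, and the call runs serially for brevity (a cluster could presumably be passed through `cl` as in the question).

```r
library(pbapply)

dataset <- list(1, 1, 1, '1', 1, 1, 1, 1, 1, 1)

# Hypothetical wrapper: on failure, record the index and message instead of stopping
checked_divide <- function(i) {
  tryCatch(
    list(index = i, ok = TRUE, value = 5 / dataset[[i]], message = NA_character_),
    error = function(e) {
      list(index = i, ok = FALSE, value = NA_real_, message = conditionMessage(e))
    }
  )
}

# Serial run; the progress bar is still displayed
result <- pblapply(seq_along(dataset), checked_divide)

# Keep only the failed elements and report them
failures <- Filter(function(r) !r$ok, result)
for (f in failures) {
  cat("sub-dataset", f$index, "failed:", f$message, "\n")
}
```

Returning a list of the same shape for successes and failures keeps the result easy to filter, at the cost of slightly more data sent back from each call.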
