Error calling the R function serialize()



I am loading the following packages into R:

library(foreach)
library(doParallel)
library(iterators)

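I have been "parallelizing" code for a long time, but lately I have been getting intermittent stops while the code is running. The error is: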

Error in serialize(data, node$con) : error writing to connection

My educated guess is that the connections I opened with the commands below may have expired:

## Register Cluster
##
cores<-8
cl <- makeCluster(cores)
registerDoParallel(cl)

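Looking at the makeCluster manual page, I see that by default the connections only expire after 30 days! I could set options(error = recover) to check, on the fly, whether the connections are still open when the code stops, but I decided to post this general question first.

IMPORTANT:

1) The error really is intermittent; sometimes I rerun the same code and get no error.
2) I run everything on the same multicore machine (Intel, 8 cores), so this is not a communication (network) problem between cluster nodes.
3) I am a heavy user of CPU and GPU parallelization on my laptop and my desktop (64 cores). Unfortunately, this is the first time I have seen this type of error.

Has anyone else run into the same type of error?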

As requested, here is my sessionInfo():

> sessionInfo()
R version 2.15.3 (2013-03-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    
attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     
other attached packages:
[1] TTR_0.22-0       xts_0.9-3        doParallel_1.0.1 iterators_1.0.6  foreach_1.4.0    zoo_1.7-9        Revobase_6.2.0   RevoMods_6.2.0  
loaded via a namespace (and not attached):
[1] codetools_0.2-8 compiler_2.15.3 grid_2.15.3     lattice_0.20-13 tools_2.15.3   

@SteveWeston, below is the error from one of the calls (again, it is intermittent):

starting worker pid=8808 on localhost:10187 at 15:21:52.232
starting worker pid=5492 on localhost:10187 at 15:21:53.624
starting worker pid=8804 on localhost:10187 at 15:21:54.997
starting worker pid=8540 on localhost:10187 at 15:21:56.360
starting worker pid=6308 on localhost:10187 at 15:21:57.721
starting worker pid=8164 on localhost:10187 at 15:21:59.137
starting worker pid=8064 on localhost:10187 at 15:22:00.491
starting worker pid=8528 on localhost:10187 at 15:22:01.855
Error in unserialize(node$con) : 
  ReadItem: unknown type 0, perhaps written by later version of R
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode -> unserialize
Execution halted
Error in unserialize(node$con) : error reading from connection
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode -> unserialize
Execution halted
Error in unserialize(node$con) : error reading from connection
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode -> unserialize
Execution halted
Error in unserialize(node$con) : error reading from connection
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode -> unserialize
Execution halted
Error in unserialize(node$con) : error reading from connection
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode -> unserialize
Execution halted
Error in unserialize(node$con) : error reading from connection
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode -> unserialize
Execution halted
Error in unserialize(node$con) : error reading from connection
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode -> unserialize
Execution halted
Error in unserialize(node$con) : error reading from connection
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode -> unserialize
Execution halted

Adding more information. I set options(error = recover), which produced the following:

Error in serialize(data, node$con) : error writing to connection
Enter a frame number, or 0 to exit   
1: #51: parallelize(FUN = "ensemble.prism", arg = list(prism = iis.long, instances = oos.instances), vectorize.arg = c("prism", "instances"), cores = cores, .export 
2: parallelize.R#58: foreach.bind(idx = i) %dopar% pFUN(idx)
3: e$fun(obj, substitute(ex), parent.frame(), e$data)
4: clusterCall(cl, workerInit, c.expr, exportenv, obj$packages)
5: sendCall(cl[[i]], fun, list(...))
6: postNode(con, "EXEC", list(fun = fun, args = args, return = return, tag = tag))
7: sendData(con, list(type = type, data = value, tag = tag))
8: sendData.SOCKnode(con, list(type = type, data = value, tag = tag))
9: serialize(data, node$con)
Selection: 9

I tried to check whether the connections were still available, and they are:

Browse[1]> showConnections()
   description                class      mode  text     isopen   can read can write
3  "<-www.007guard.com:10187" "sockconn" "a+b" "binary" "opened" "yes"    "yes"    
4  "<-www.007guard.com:10187" "sockconn" "a+b" "binary" "opened" "yes"    "yes"    
5  "<-www.007guard.com:10187" "sockconn" "a+b" "binary" "opened" "yes"    "yes"    
6  "<-www.007guard.com:10187" "sockconn" "a+b" "binary" "opened" "yes"    "yes"    
7  "<-www.007guard.com:10187" "sockconn" "a+b" "binary" "opened" "yes"    "yes"    
8  "<-www.007guard.com:10187" "sockconn" "a+b" "binary" "opened" "yes"    "yes"    
9  "<-www.007guard.com:10187" "sockconn" "a+b" "binary" "opened" "yes"    "yes"    
10 "<-www.007guard.com:10187" "sockconn" "a+b" "binary" "opened" "yes"    "yes"    
Browse[1]> 

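Since the connections are open, and the "unknown type 0" error points to an R version problem (as @SteveWeston pointed out), I really cannot figure out what is happening here.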
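
A connection that showConnections() lists as "opened" only reflects the master's side of the socket; it does not by itself show that the worker process is still running. A minimal sketch of a more direct probe, assuming a socket cluster created with base `parallel` (the helper name `workers_alive` is hypothetical and not part of any package):

```r
library(parallel)

# Hypothetical helper: TRUE if every worker answers a trivial call.
# A dead worker makes clusterCall() fail, which tryCatch() turns into FALSE.
# Note: a worker that is hung rather than dead would make this call block.
workers_alive <- function(cl) {
  tryCatch({
    pids <- clusterCall(cl, Sys.getpid)
    length(pids) == length(cl)
  }, error = function(e) FALSE)
}

cl <- makeCluster(2)
alive <- workers_alive(cl)   # probe the workers while the cluster is up
stopCluster(cl)
```

Something like this could be run from the Browse[1]> prompt on the cluster object in scope to see whether the workers still respond.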

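EDIT 1:

My workaround

As far as the arguments passed to the function are concerned, the code is fine, so the answer provided by @MichaelFilosi does not add much. Thank you very much for your answer anyway!

I could not find out exactly what is wrong with the call, but at least I can work around the problem.

The trick is to split the arguments of the function call for each parallel thread into smaller chunks.

The error magically disappeared.

Let me know if the same works for you!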
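
The workaround can be sketched as follows. This is only an illustration under stated assumptions, not the original code: `big_arg`, the chunk size of 10 rows, and the per-chunk function are made up for the example, and it uses base `parallel::clusterApply()` instead of `foreach` so that each chunk is sent to a worker as its own small message.

```r
library(parallel)

# Made-up stand-in for a large argument (assumption, not the poster's data)
big_arg <- data.frame(id = 1:100, value = seq(0.5, 50, by = 0.5))

# Split the rows into chunks of 10 rows each
chunk_id <- ceiling(seq_len(nrow(big_arg)) / 10)
chunks   <- split(big_arg, chunk_id)

cl <- makeCluster(2)
# clusterApply() ships one chunk per task instead of one large object
partial <- clusterApply(cl, chunks, function(chunk) sum(chunk$value))
stopCluster(cl)

# Combine the per-chunk results on the master
total <- sum(unlist(partial))
```

Smaller chunks mean less data is serialized per call, which may help if the failure is related to message size or worker memory.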

This is most likely caused by running out of memory (see my blog post for details). Here is an example of how to trigger this error:

> a <- matrix(1, ncol=10^4*2.1, nrow=10^4)
> cl <- makeCluster(8, type = "FORK")
> parSapply(cl, 1:8, function(x) {
+   b <- a + 1
+   mean(b)
+   })
Error in unserialize(node$con) : error reading from connection
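
A back-of-the-envelope calculation suggests why the example above can exhaust memory. The figures below are estimates that assume 8 bytes per double and ignore R's own overhead. With a FORK cluster (available on Unix-alikes only), `a` is shared with the workers until it is modified, but `b <- a + 1` allocates a fresh matrix of the same size in every worker.

```r
n_row <- 1e4
n_col <- 2.1e4
bytes_per_double <- 8

# Size of one matrix, without actually allocating it
matrix_bytes <- n_row * n_col * bytes_per_double
matrix_gib   <- matrix_bytes / 1024^3

# Each of the 8 workers creates its own copy `b`
n_workers <- 8
extra_gib <- n_workers * matrix_gib

cat(sprintf("one matrix: %.2f GiB, extra across workers: %.2f GiB\n",
            matrix_gib, extra_gib))
```

On a machine with less free memory than that, a worker may be killed mid-computation, and the master then sees a broken connection.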

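I struggled with this problem for a long time and was able to fix it by moving all required packages into the foreach call's arguments with .packages = c("ex1", "ex2"). Previously I was just calling require("ex1") inside the loop, which seems to have been the root cause of my error.

Overall, I would just make sure you move everything you can into the foreach arguments to avoid these kinds of errors.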
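
A minimal sketch of that pattern, assuming the `foreach` and `doParallel` packages are installed. The `tools` package (shipped with R) stands in for the placeholder names "ex1" and "ex2", and the file names are made up:

```r
library(foreach)
library(doParallel)

cl <- makeCluster(2)
registerDoParallel(cl)

files <- c("a.csv", "b.txt")   # made-up inputs for illustration

# .packages loads 'tools' on every worker before the loop body runs,
# so file_ext() can be called without require() inside the loop.
res <- foreach(f = files, .packages = "tools") %dopar% {
  file_ext(f)
}

stopCluster(cl)
```

Declaring packages this way lets foreach load them once per worker up front, instead of relying on a require() call buried in the loop body.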

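I got a similar error: Error in unserialize(node$con) : error reading from connection

I found that it was caused by a missing argument in a call to a C function through .Call(). Maybe this will help!

I have the same problem, and I suspect it is a memory issue. My code is simple: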

library(doParallel)
library(foreach)
cl <- makeCluster(2, outfile='LOG.TXT')
registerDoParallel(cl)
res <- foreach(x=1:10) %dopar% x

and I get the following error messages in LOG.TXT:

starting worker pid=13384 on localhost:11776 at 18:25:29.873
starting worker pid=21668 on localhost:11776 at 18:25:30.266
Error in unserialize(node$con) : error reading from connection
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode -> unserialize
Execution halted
Error in unserialize(node$con) : error reading from connection
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode -> unserialize
Execution halted

The program works fine, so I am ignoring it for now. Still, seeing these errors in the log file always makes me uneasy.
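
The `outfile` argument used above redirects each worker's output to a file, which makes it possible to read worker-side messages afterwards. The sketch below writes the log to a temporary file and shuts the cluster down explicitly with stopCluster(). It is an assumption, not something confirmed in this thread, that messages like the ones above come from workers losing their connection when the master exits without stopping the cluster. The sketch uses base `parallel` rather than `foreach` to stay self-contained.

```r
library(parallel)

# Temporary log file; the path is arbitrary for this example
log_file <- tempfile(fileext = ".txt")

cl  <- makeCluster(2, outfile = log_file)
res <- parLapply(cl, 1:10, function(x) x)

# Tell the workers to exit cleanly instead of dropping the connection
stopCluster(cl)

# Inspect whatever the workers wrote
log_lines <- readLines(log_file, warn = FALSE)
```

If the errors vanish from the log once stopCluster() is called before the session ends, they were most likely harmless shutdown noise.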

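I had the same error using foreach with the doSNOW backend.

After a timeout, I got the same error as the OP, but when I ran the task without foreach, no error was returned.

Apparently, the task manager can kill processes for a number of reasons, not just a lack of memory.

In my particular case, the problem seemed to be core temperature. Reducing the number of CPU cores used and calling Sys.sleep() kept the system cooler, and the error no longer appeared.

Maybe it is worth a try.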
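
A minimal sketch of that idea, assuming base `parallel`. Using half the detected cores (capped at 4 here) and a 0.1 second pause are arbitrary choices for illustration, not values from the answer:

```r
library(parallel)

n_detected <- detectCores()
if (is.na(n_detected)) n_detected <- 2L   # detectCores() can return NA

# Assumption: use at most half of the cores, and never more than 4
n_workers <- max(1L, min(4L, n_detected %/% 2L))

cl <- makeCluster(n_workers)
res <- parLapply(cl, 1:4, function(i) {
  out <- sqrt(i)    # stand-in for the real computation
  Sys.sleep(0.1)    # short pause so the CPU is not pegged continuously
  out
})
stopCluster(cl)
```

The pause trades throughput for lower sustained load, so the right value depends on the machine and the task.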
