r - sparklyr + rsparkling: error when connecting to cluster



For some time I have been using the sparklyr package to connect to my company's Hadoop cluster, with this code:

library(sparklyr)
Sys.setenv(SPARK_HOME="/opt/spark/")
Sys.setenv(HADOOP_CONF_DIR="/etc/hadoop/conf.cloudera.yarn")
Sys.setenv(JAVA_HOME="/usr/lib/jvm/jre")
system('kinit -k -t user.keytab user@xyz')
sc <- spark_connect(master = "yarn",
                    config = list(
                      default = list(
                        spark.submit.deployMode = "client",
                        spark.yarn.keytab = "user.keytab",
                        spark.yarn.principal = "user@xyz",
                        spark.executor.instances = 20,
                        spark.executor.memory = "4G",
                        spark.executor.cores = 4,
                        spark.driver.memory = "8G")))

Everything works fine, but when I try to add the rsparkling package using similar code:

library(h2o)
library(rsparkling)
library(sparklyr)
options(rsparkling.sparklingwater.version = '2.0')
Sys.setenv(SPARK_HOME="/opt/spark/")
Sys.setenv(HADOOP_CONF_DIR="/etc/hadoop/conf.cloudera.yarn")
Sys.setenv(JAVA_HOME="/usr/lib/jvm/jre")
system('kinit -k -t user.keytab user@xyz')
sc <- spark_connect(master = "yarn",
                    config = list(
                      default = list(
                        spark.submit.deployMode = "client",
                        spark.yarn.keytab = "user.keytab",
                        spark.yarn.principal = "user@xyz",
                        spark.executor.instances = 20,
                        spark.executor.memory = "4G",
                        spark.executor.cores = 4,
                        spark.driver.memory = "8G")))

I get this error:

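Error in force(code) :
Failed while connecting to sparklyr to port (8880) for sessionid (9819): Sparklyr gateway did not respond while retrieving ports information after 60 seconds
Path: /opt/spark-2.0.0-bin-hadoop2.6/bin/spark-submit
Parameters: --class, sparklyr.Backend, --packages, 'ai.h2o:sparkling-water-core_2.11:2.0','ai.h2o:sparkling-water-ml_2.11:2.0','ai.h2o:sparkling-water-repl_2.11:2.0', '/usr/lib64/R/library/sparklyr/java/sparklyr-2.0-2.11.jar', 8880, 9819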

---- Output Log ----
Ivy Default Cache set to: /opt/users/user/.ivy2/cache
The jars for the packages stored in: /opt/users/user/.ivy2/jars
:: loading settings :: url = jar:file:/opt/spark-2.0.0-bin-hadoop2.6/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
ai.h2o#sparkling-water-core_2.11 added as a dependency
ai.h2o#sparkling-water-ml_2.11 added as a dependency
ai.h2o#sparkling-water-repl_2.11 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
confs: [default]

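---- Error Log ----
In addition: Warning messages:
1: In if (nchar(config[[e]]) == 0) found <- FALSE :
  the condition has length > 1 and only the first element will be used
2: In if (nchar(config[[e]]) == 0) found <- FALSE :
  the condition has length > 1 and only the first element will be used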

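I'm new to Spark clusters and not sure what to do next. Any help would be greatly appreciated. My first thought was that the Sparkling Water jar files are missing on the cluster side. Is that right?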

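You need to use the exact Sparkling Water version number, including the patch level. As the Parameters line in your error shows, the version string ends up in the Maven coordinates passed to --packages, so a value such as '2.0' that does not match a published release presumably cannot be resolved. For example: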

options(rsparkling.sparklingwater.version = '2.0.5')

Alternatively, you can download Sparkling Water directly from http://h2o.ai/download, unzip it, and replace the statement above with:

options(rsparkling.sparklingwater.location = "/tmp/sparkling-water-assembly_2.11-2.0.99999-SNAPSHOT-all.jar")
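To guard against the mistake in the question (passing '2.0' instead of a full release number), the version string could be checked before it is set. This is a minimal base-R sketch, assuming releases use a three-part `major.minor.patch` number such as 2.0.5; `is_full_sw_version` is a hypothetical helper written for illustration, not part of rsparkling.

```r
# Hypothetical helper (not part of rsparkling): returns TRUE when the
# argument is a single string shaped like "major.minor.patch", e.g. "2.0.5".
is_full_sw_version <- function(version) {
  is.character(version) && length(version) == 1 &&
    grepl("^[0-9]+\\.[0-9]+\\.[0-9]+$", version)
}

sw_version <- "2.0.5"

# Fail early with a readable message instead of waiting for the
# sparklyr gateway to time out.
if (!is_full_sw_version(sw_version)) {
  stop("Sparkling Water version must look like '2.0.5', got: ", sw_version)
}

options(rsparkling.sparklingwater.version = sw_version)
```

The check only validates the shape of the string; it does not confirm that the release actually exists or matches the Spark version on the cluster.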
