For several weeks I have been trying to spark-submit to my Cloudera cluster. I really hope someone knows what is going on here.
I created a script that calls spark-submit with all the required arguments. The screen output shows the following lines:
Using properties file: null
Parsed arguments:
master yarn
deployMode cluster
executorMemory null
executorCores null
totalExecutorCores null
propertiesFile null
driverMemory null
driverCores null
driverExtraClassPath /home/bruce/workspace1/spark-cloudera/yarn/stable/target/spark-yarn_2.10-1.0.0-cdh5.1.0.jar:/home/bruce/.m2/repository/org/apache/hadoop/hadoop-yarn-client/2.3.0-cdh5.1.0/hadoop-yarn-client-2.3.0-cdh5.1.0.jar:/home/bruce/.m2/repository/org/apache/hadoop/hadoop-common/2.3.0-cdh5.1.0/hadoop-common-2.3.0-cdh5.1.0.jar:/home/bruce/.m2/repository/org/apache/hadoop/hadoop-yarn-api/2.3.0-cdh5.1.0/hadoop-yarn-api-2.3.0-cdh5.1.0.jar:/home/bruce/.m2/repository/org/apache/hadoop/hadoop-yarn-common/2.3.0-cdh5.1.0/hadoop-yarn-common-2.3.0-cdh5.1.0.jar:/home/bruce/.m2/repository/org/apache/hadoop/hadoop-auth/2.3.0-cdh5.1.0/hadoop-auth-2.3.0-cdh5.1.0.jar:/home/bruce/.m2/repository/com/google/protobuf/protobuf-java/2.5.0/protobuf-java-2.5.0.jar
driverExtraLibraryPath null
driverExtraJavaOptions null
supervise false
queue null
numExecutors null
files null
pyFiles null
archives null
mainClass org.apache.spark.examples.SparkPi
primaryResource file:/home/bruce/workspace1/spark-cloudera/examples/target/scala-2.10/spark-examples-1.0.0-cdh5.1.0-hadoop2.3.0-cdh5.1.0.jar
name org.apache.spark.examples.SparkPi
childArgs [10]
jars null
verbose true
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
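For reference, my script boils down to an invocation roughly like the one below, reconstructed from the parsed arguments in the output above (the colon-separated jar list passed to --driver-class-path is the one shown on the driverExtraClassPath line; the exact flag order in my script may differ):

```shell
#!/bin/sh
# Reconstructed sketch of the submit command, based on the verbose
# "Parsed arguments" output above. DRIVER_CP stands for the long
# colon-separated jar list shown on the driverExtraClassPath line.
spark-submit \
  --verbose \
  --master yarn \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  --driver-class-path "$DRIVER_CP" \
  file:/home/bruce/workspace1/spark-cloudera/examples/target/scala-2.10/spark-examples-1.0.0-cdh5.1.0-hadoop2.3.0-cdh5.1.0.jar \
  10
```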
The invocation hangs for a long time and then exits with "connection refused".
What I don't understand is that the arguments specify YARN (master yarn), but nothing indicates how it knows to contact the YARN ResourceManager: no IP, no port. The submission is done from my laptop, and the cluster is on a neighboring subnet. How does spark-submit figure out how to contact the YARN service?
From the Spark documentation:
Ensure that HADOOP_CONF_DIR or YARN_CONF_DIR points to the directory which contains the (client side) configuration files for the Hadoop cluster. These configs are used to write to dfs and connect to the YARN ResourceManager.
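If I understand that correctly, the setup would look something like the sketch below. The directory path and the ResourceManager host here are placeholders, not values from my cluster; the point is that spark-submit reads yarn-site.xml from this directory to find the ResourceManager address:

```shell
# Hypothetical sketch -- the path and host below are placeholders.
# HADOOP_CONF_DIR must contain the cluster's client-side configs
# (core-site.xml, hdfs-site.xml, yarn-site.xml), e.g. copied from
# a cluster node. Export it before running the submit script.
export HADOOP_CONF_DIR=/etc/hadoop/conf

# Inside that directory, yarn-site.xml is where the ResourceManager
# address comes from, e.g.:
#   <property>
#     <name>yarn.resourcemanager.address</name>
#     <value>rm-host.example.com:8032</value>
#   </property>
```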