Running a Spark application in YARN mode on Cloudera 5



For several weeks I have been trying to spark-submit to my Cloudera cluster. I really hope someone knows what is going on here.

I created a script that calls spark-submit with all the required arguments. The screen outputs the following lines:

Using properties file: null
Parsed arguments:
  master                  yarn
  deployMode              cluster
  executorMemory          null
  executorCores           null
  totalExecutorCores      null
  propertiesFile          null
  driverMemory            null
  driverCores             null
  driverExtraClassPath    /home/bruce/workspace1/spark-cloudera/yarn/stable/target/spark-yarn_2.10-1.0.0-cdh5.1.0.jar:/home/bruce/.m2/repository/org/apache/hadoop/hadoop-yarn-client/2.3.0-cdh5.1.0/hadoop-yarn-client-2.3.0-cdh5.1.0.jar:/home/bruce/.m2/repository/org/apache/hadoop/hadoop-common/2.3.0-cdh5.1.0/hadoop-common-2.3.0-cdh5.1.0.jar:/home/bruce/.m2/repository/org/apache/hadoop/hadoop-yarn-api/2.3.0-cdh5.1.0/hadoop-yarn-api-2.3.0-cdh5.1.0.jar:/home/bruce/.m2/repository/org/apache/hadoop/hadoop-yarn-common/2.3.0-cdh5.1.0/hadoop-yarn-common-2.3.0-cdh5.1.0.jar:/home/bruce/.m2/repository/org/apache/hadoop/hadoop-auth/2.3.0-cdh5.1.0/hadoop-auth-2.3.0-cdh5.1.0.jar:/home/bruce/.m2/repository/com/google/protobuf/protobuf-java/2.5.0/protobuf-java-2.5.0.jar
  driverExtraLibraryPath  null
  driverExtraJavaOptions  null
  supervise               false
  queue                   null
  numExecutors            null
  files                   null
  pyFiles                 null
  archives                null
  mainClass               org.apache.spark.examples.SparkPi
  primaryResource         file:/home/bruce/workspace1/spark-cloudera/examples/target/scala-2.10/spark-examples-1.0.0-cdh5.1.0-hadoop2.3.0-cdh5.1.0.jar
  name                    org.apache.spark.examples.SparkPi
  childArgs               [10]
  jars                    null
  verbose                 true

log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

The call hangs for a long time and then exits with "connection refused".

What I don't understand is that the arguments specify the YarnClient, but nothing indicates how it knows to contact the YARN ResourceManager — no IP, no port. The submission is done from my laptop, and the cluster is on a neighboring subnet. How does spark-submit figure out how to contact the YARN service?
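As far as I can tell, the YarnClient does not take the ResourceManager address from spark-submit arguments at all: it reads it from `yarn-site.xml` in the directory named by `HADOOP_CONF_DIR` or `YARN_CONF_DIR`. A minimal illustrative sketch of the property it looks for — the host name and the `/tmp` demo path here are placeholders I made up, not values from my setup (8032 is the stock default RM port):

```shell
# Illustrative only: the YARN client resolves the ResourceManager from
# yarn.resourcemanager.address in yarn-site.xml under HADOOP_CONF_DIR,
# not from any spark-submit flag. Fabricate a minimal config to show it.
mkdir -p /tmp/hadoop-conf-demo
cat > /tmp/hadoop-conf-demo/yarn-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <!-- placeholder host: replace with the cluster's real ResourceManager -->
    <name>yarn.resourcemanager.address</name>
    <value>rm-host.example.com:8032</value>
  </property>
</configuration>
EOF

# Show the property the client would pick up:
grep -A1 'yarn.resourcemanager.address' /tmp/hadoop-conf-demo/yarn-site.xml
```

If no such config is visible to the client, it presumably falls back to a default such as `0.0.0.0:8032`, which would explain a long hang followed by "connection refused".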

From the Spark documentation:

Ensure that HADOOP_CONF_DIR or YARN_CONF_DIR points to the directory which contains the (client side) configuration files for the Hadoop cluster. These configs are used to write to the dfs and connect to the YARN ResourceManager.
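Following that note, a sketch of what the submitting script would look like once the variable is set — the config path is an assumption (on a CDH 5 gateway it is typically something like `/etc/hadoop/conf`; from a laptop you would first copy the cluster's client configuration down, e.g. via Cloudera Manager's "Download Client Configuration"):

```shell
# Assumption: a local copy of the cluster's client configs lives here;
# it must contain core-site.xml, hdfs-site.xml and yarn-site.xml.
export HADOOP_CONF_DIR=/path/to/cluster-client-conf

spark-submit \
  --verbose \
  --master yarn \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  file:/home/bruce/workspace1/spark-cloudera/examples/target/scala-2.10/spark-examples-1.0.0-cdh5.1.0-hadoop2.3.0-cdh5.1.0.jar \
  10
```

The laptop also needs network access to the ResourceManager port named in that `yarn-site.xml`, since the cluster is on a different subnet.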
