我试图通过Yarn在Hadoop集群上运行spark shell。我使用
- Hadoop 2.4.1
- 火花1.0.0
我的Hadoop集群已经工作了。为了使用Spark,我按照下面的描述构建了Spark:
mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.1 -DskipTests clean package
编译工作正常,我可以毫无困难地运行spark-shell
。但是,在yarn上运行它:
spark-shell --master yarn-client
得到以下错误:
14/07/07 11:30:32 INFO cluster.YarnClientSchedulerBackend: Application report from ASM:
appMasterRpcPort: -1
appStartTime: 1404725422955
yarnAppState: ACCEPTED
14/07/07 11:30:33 INFO cluster.YarnClientSchedulerBackend: Application report from ASM:
appMasterRpcPort: -1
appStartTime: 1404725422955
yarnAppState: FAILED
org.apache.spark.SparkException: Yarn application already ended,might be killed or not able to launch application master
.
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApp(YarnClientSchedulerBackend.scala:105
)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:82)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:136)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:318)
at org.apache.spark.repl.SparkILoop.createSparkContext(SparkILoop.scala:957)
at $iwC$$iwC.<init>(<console>:8)
at $iwC.<init>(<console>:14)
at <init>(<console>:16)
at .<init>(<console>:20)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:788)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1056)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:614)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:645)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:609)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:796)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:841)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:753)
at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:121)
at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:120)
at org.apache.spark.repl.SparkIMain.beQuietDuring(SparkIMain.scala:263)
at org.apache.spark.repl.SparkILoopInit$class.initializeSpark(SparkILoopInit.scala:120)
at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:56)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1$$anonfun$apply$mcZ$sp$5.apply$mcV$sp(SparkILoop.scala:913)
at org.apache.spark.repl.SparkILoopInit$class.runThunks(SparkILoopInit.scala:142)
at org.apache.spark.repl.SparkILoop.runThunks(SparkILoop.scala:56)
at org.apache.spark.repl.SparkILoopInit$class.postInitialization(SparkILoopInit.scala:104)
at org.apache.spark.repl.SparkILoop.postInitialization(SparkILoop.scala:56)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:930)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:884)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:982)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:292)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Spark设法与我的集群通信,但它不工作。另一个有趣的事情是,我可以使用pyspark --master yarn
访问我的集群。但是,我得到以下警告
14/07/07 14:10:11 WARN cluster.YarnClientClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
和无限的计算时间,当做像
这样简单的事情时sc.wholeTextFiles('hdfs://vm7x64.fr/').collect()
是什么导致了这个问题
请检查Hadoop集群是否正常运行。在主节点上,下一个YARN进程必须运行:
$ jps
24970 ResourceManager
从节点/执行器:
$ jps
14389 NodeManager
还要确保你在Spark配置目录下创建了一个Hadoop配置的引用(或复制了这些文件):
$ ll /spark/conf/ | grep site
lrwxrwxrwx 1 hadoop hadoop 33 Jun 8 18:13 core-site.xml -> /hadoop/etc/hadoop/core-site.xml
lrwxrwxrwx 1 hadoop hadoop 33 Jun 8 18:13 hdfs-site.xml -> /hadoop/etc/hadoop/hdfs-site.xml
您也可以在8088 - http://master:8088/cluster/nodes端口上查看ResourceManager的Web UI。必须有一个可用节点和资源的列表。
您必须使用下一个命令(您可以在Web UI中找到的应用程序ID)查看您的日志文件:
$ yarn logs -applicationId <yourApplicationId>
或者你可以直接查看Master/ResourceManager主机上的整个日志文件:
$ ll /hadoop/logs/ | grep resourcemanager
-rw-rw-r-- 1 hadoop hadoop 368414 Jun 12 18:12 yarn-hadoop-resourcemanager-master.log
-rw-rw-r-- 1 hadoop hadoop 2632 Jun 12 17:52 yarn-hadoop-resourcemanager-master.out
在Slave/NodeManager主机上:
$ ll /hadoop/logs/ | grep nodemanager
-rw-rw-r-- 1 hadoop hadoop 284134 Jun 12 18:12 yarn-hadoop-nodemanager-slave.log
-rw-rw-r-- 1 hadoop hadoop 702 Jun 9 14:47 yarn-hadoop-nodemanager-slave.out
还要检查所有环境变量是否正确:
HADOOP_CONF_LIB_NATIVE_DIR=/hadoop/lib/native
HADOOP_MAPRED_HOME=/hadoop
HADOOP_COMMON_HOME=/hadoop
HADOOP_HDFS_HOME=/hadoop
YARN_HOME=/hadoop
HADOOP_INSTALL=/hadoop
HADOOP_CONF_DIR=/hadoop/etc/hadoop
YARN_CONF_DIR=/hadoop/etc/hadoop
SPARK_HOME=/spark