I have a Spark job which initializes the Spark context only when it is really necessary:
val conf = new SparkConf()
val jobs: List[Job] = ??? // get some jobs
if (jobs.nonEmpty) {
  val sc = new SparkContext(conf)
  sc.parallelize(jobs).foreach(....)
} else {
  // do nothing
}
It runs fine on YARN when the deploy mode is "client":
spark-submit --master yarn --deploy-mode client
But as soon as I switch the deploy mode to "cluster", it starts to crash whenever jobs.isEmpty:
spark-submit --master yarn --deploy-mode cluster
Here is the error text:
17/11/02 11:37:17 INFO yarn.Client: Application report for application_1509613523426_0017 (state: ACCEPTED)
17/11/02 11:37:17 INFO yarn.Client: Application report for application_1509613523426_0017 (state: FAILED)
17/11/02 11:37:17 INFO yarn.Client:
     client token: N/A
     diagnostics: Application application_1509613523426_0017 failed 2 times due to AM Container for appattempt_1509613523426_0017_000002 exited with exitCode: -1000
For more detailed output, check application tracking page: http://xxxxxx.com:8088/cluster/app/application_1509613523426_0017 Then, click on links to logs of each attempt.
Diagnostics: File does not exist: hdfs://xxxxxxx/.sparkStaging/application_1509613523426_0017/__spark_libs__997458388067724499.zip
java.io.FileNotFoundException: File does not exist: hdfs://xxxxxxx/.sparkStaging/application_1509613523426_0017/__spark_libs__997458388067724499.zip
    at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1309)
    at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1301)
    at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
    at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:358)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)

Failing this attempt. Failing the application.
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: dev
     start time: 1509622629354
     final status: FAILED
     tracking URL: http://xxxxxx.com:8088/cluster/app/application_1509613523426_0017
     user: xxx
Exception in thread "main" org.apache.spark.SparkException: Application application_1509613523426_0017 finished with failed status
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1104)
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1150)
    at org.apache.spark.deploy.yarn.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
17/11/02 11:37:17 INFO util.ShutdownHookManager: Shutdown hook called
17/11/02 11:37:17 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-a5b20def-0218-4b0c-b9f8-fdf8a1802e95
Is this a bug in the YARN support, or am I missing something?
SparkContext is the component responsible for communication with the cluster manager. If the application is submitted to the cluster but a context is never created, YARN cannot determine the state of the application, and that is why you get the error.
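The straightforward fix is therefore to create the SparkContext unconditionally, so the driver always registers the application with YARN, and to stop it right away when there is nothing to do. A minimal sketch along those lines (Job and the job list come from your snippet; process is a hypothetical stand-in for your per-job logic):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
// Create the context unconditionally so the application always
// registers with YARN, even when there are no jobs to run.
val sc = new SparkContext(conf)
try {
  val jobs: List[Job] = ??? // get some jobs
  if (jobs.nonEmpty) {
    sc.parallelize(jobs).foreach(job => process(job)) // process: your per-job logic
  }
  // With an empty job list the application simply exits, and YARN
  // records its final status instead of failing the attempt.
} finally {
  sc.stop() // always report completion and release cluster resources
}

The try/finally guarantees that sc.stop() runs even if job processing throws, so the application reports a final status to YARN in every case.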