Exception in thread "main" org.apache.spark.SparkException: Must specify the driver container image



> I am trying to spark-submit to minikube (Kubernetes) from my local machine CLI with the command:

spark-submit --master k8s://https://127.0.0.1:8001 --name cfe2 \
--deploy-mode cluster --class com.yyy.Test --conf spark.executor.instances=2 \
--conf spark.kubernetes.container.image docker.io/anantpukale/spark_app:1.1 \
local://spark-0.0.1-SNAPSHOT.jar

I have a simple Spark job jar built against version 2.3.0. I have also containerized it in Docker, and minikube is up and running on VirtualBox. Below is the exception stack:

Exception in thread "main" org.apache.spark.SparkException: Must specify the driver container image
        at org.apache.spark.deploy.k8s.submit.steps.BasicDriverConfigurationStep$$anonfun$3.apply(BasicDriverConfigurationStep.scala:51)
        at org.apache.spark.deploy.k8s.submit.steps.BasicDriverConfigurationStep$$anonfun$3.apply(BasicDriverConfigurationStep.scala:51)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.deploy.k8s.submit.steps.BasicDriverConfigurationStep.<init>(BasicDriverConfigurationStep.scala:51)
        at org.apache.spark.deploy.k8s.submit.DriverConfigOrchestrator.getAllConfigurationSteps(DriverConfigOrchestrator.scala:82)
        at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:229)
        at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:227)
        at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2585)
        at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:227)
        at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:192)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:879)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
2018-04-06 13:33:52 INFO  ShutdownHookManager:54 - Shutdown hook called
2018-04-06 13:33:52 INFO  ShutdownHookManager:54 - Deleting directory C:\Users\anant\AppData\Local\Temp\spark-6da93408-88cb-4fc7-a2de-18ed166c3c66

This looks like a bug in defaulting spark.kubernetes.driver.container.image to spark.kubernetes.container.image. So try specifying the driver/executor container images directly (a sketch follows the list):

  • spark.kubernetes.driver.container.image
  • spark.kubernetes.executor.container.image
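
For example, reusing the image from the question (a sketch; the remaining flags are unchanged from your original command, so adjust them to your setup):

spark-submit --master k8s://https://127.0.0.1:8001 --name cfe2 \
--deploy-mode cluster --class com.yyy.Test \
--conf spark.executor.instances=2 \
--conf spark.kubernetes.driver.container.image=docker.io/anantpukale/spark_app:1.1 \
--conf spark.kubernetes.executor.container.image=docker.io/anantpukale/spark_app:1.1 \
local://spark-0.0.1-SNAPSHOT.jar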

From the source code, the only conf options available are the following (a paraphrased sketch of their definitions follows the list):

spark.kubernetes.container.image
spark.kubernetes.driver.container.image
spark.kubernetes.executor.container.image
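
For reference, this is approximately how those three options are declared in org.apache.spark.deploy.k8s.Config in the Spark 2.3.0 source tree (a paraphrased sketch from memory, not a verbatim copy): the driver- and executor-specific entries fall back to the shared entry.

// Paraphrased sketch of org.apache.spark.deploy.k8s.Config (Spark 2.3.0);
// doc strings shortened, surrounding code omitted.
val CONTAINER_IMAGE = ConfigBuilder("spark.kubernetes.container.image")
  .doc("Container image to use for Spark containers.")
  .stringConf
  .createOptional

// Driver and executor images default to CONTAINER_IMAGE when unset.
val DRIVER_CONTAINER_IMAGE = ConfigBuilder("spark.kubernetes.driver.container.image")
  .doc("Container image to use for the driver.")
  .fallbackConf(CONTAINER_IMAGE)

val EXECUTOR_CONTAINER_IMAGE = ConfigBuilder("spark.kubernetes.executor.container.image")
  .doc("Container image to use for the executors.")
  .fallbackConf(CONTAINER_IMAGE)

If the fallback works as declared, setting only spark.kubernetes.container.image should be enough for both driver and executor, which is what the command further below relies on.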
I noticed that Spark 2.3.0 changed the k8s implementation quite a bit compared to 2.2.0. For example, instead of specifying driver and executor images separately, the official getting-started guide uses a single image supplied via spark.kubernetes.container.image.

See if this works:

spark-submit \
--master k8s://http://127.0.0.1:8001 \
--name cfe2 \
--deploy-mode cluster \
--class com.oracle.Test \
--conf spark.executor.instances=2 \
--conf spark.kubernetes.container.image=docker.io/anantpukale/spark_app:1.1 \
--conf spark.kubernetes.authenticate.submission.oauthToken=YOUR_TOKEN \
--conf spark.kubernetes.authenticate.submission.caCertFile=PATH_TO_YOUR_CERT \
local://spark-0.0.1-SNAPSHOT.jar

The token and cert can be found on the k8s dashboard. Follow the instructions to build a Docker image compatible with Spark 2.3.0.
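
If you are building from the Spark 2.3.0 distribution, the bundled bin/docker-image-tool.sh script can produce such an image; a minimal sketch, assuming it is run from the distribution root and that docker.io/anantpukale is your repo:

# Build a Spark 2.3.0-compatible image and push it to your repo.
./bin/docker-image-tool.sh -r docker.io/anantpukale -t 1.1 build
./bin/docker-image-tool.sh -r docker.io/anantpukale -t 1.1 push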
