Just started experimenting with JobServer and would like to use it in our production environment.
We usually run our spark jobs individually in yarn-client mode and want to move to the model offered by the Ooyala spark JobServer.
I was able to run the WordCount example shown on the official page. I then tried submitting our custom spark job to the spark JobServer and got the following error:
{
"status": "ERROR",
"result": {
"message": "null",
"errorClass": "scala.MatchError",
"stack": ["spark.jobserver.JobManagerActor$$anonfun$spark$jobserver$JobManagerActor$$getJobFuture$4.apply(JobManagerActor.scala:220)",
"scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)",
"scala.concurrent.impl.Future $PromiseCompletingRunnable.run(Future.scala:24)",
"akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)",
"akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)",
"scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)",
"scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java 1339)",
"scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)",
"scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)"]
  }
}
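For reference, I uploaded the jar and triggered the job through what I understand to be the standard job-server REST endpoints, roughly like this (the app name is a placeholder):

curl --data-binary @/tmp/mySparkJob.jar localhost:8090/jars/myapp
curl -d "" 'localhost:8090/jobs?appName=myapp&classPath=com.demo.SparkDriver'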
I made the necessary code changes, such as extending SparkJob and implementing the runJob() method.
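Roughly, the job class now has this shape (a minimal sketch against the classic SparkJob API; the object name and the body are placeholders, not our real job):

import com.typesafe.config.Config
import org.apache.spark.SparkContext
import spark.jobserver.{SparkJob, SparkJobValid, SparkJobValidation}

object MyCustomJob extends SparkJob {
  // Runs before runJob(); returning SparkJobValid lets the job proceed.
  override def validate(sc: SparkContext, config: Config): SparkJobValidation =
    SparkJobValid

  // The actual job logic; the return value is serialized into the HTTP response.
  override def runJob(sc: SparkContext, config: Config): Any =
    sc.parallelize(1 to 100).sum()
}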
Here is the dev.conf file I'm using:
# Spark Cluster / Job Server configuration
spark {
  # spark.master will be passed to each job's JobContext
  master = "yarn-client"

  # Default # of CPUs for jobs to use for Spark standalone cluster
  job-number-cpus = 4

  jobserver {
    port = 8090
    jar-store-rootdir = /tmp/jobserver/jars
    jobdao = spark.jobserver.io.JobFileDAO
    filedao {
      rootdir = /tmp/spark-job-server/filedao/data
    }
    context-creation-timeout = "60 s"
  }

  contexts {
    my-low-latency-context {
      num-cpu-cores = 1
      memory-per-node = 512m
    }
  }

  context-settings {
    num-cpu-cores = 2
    memory-per-node = 512m
  }

  home = "/data/softwares/spark-1.2.0.2.2.0.0-82-bin-2.6.0.2.2.0.0-2041"
}

spray.can.server {
  parsing.max-content-length = 200m
}

spark.driver.allowMultipleContexts = true

YARN_CONF_DIR=/home/spark/conf/
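I start the server against this file with the sbt workflow from the job-server README (if I have it right, reStart accepts a config path):

# inside sbt, from the spark-jobserver checkout:
job-server/reStart /path/to/dev.conf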
Also, how do I pass runtime parameters to the spark job, such as --files and --jars? For example, I usually run our custom spark job like this:
./spark-1.2.0.2.2.0.0-82-bin-2.6.0.2.2.0.0-2041/bin/spark-submit --class com.demo.SparkDriver --master yarn-cluster --num-executors 3 --jars /tmp/api/myUtil.jar --files /tmp/myConfFile.conf,/tmp/mySchema.txt /tmp/mySparkJob.jar
The number of executors and the extra jars are passed differently, through the configuration file (see the dependent-jar-uris context setting).
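A sketch of what that looks like in the context settings (the jar path is taken from your spark-submit line; on YARN the executor count is usually driven by spark.executor.instances, and whether that can be passed through here depends on your job-server version):

context-settings {
  num-cpu-cores = 2
  memory-per-node = 512m
  # extra jars for the context's classpath, the equivalent of --jars
  dependent-jar-uris = ["file:///tmp/api/myUtil.jar"]
}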
YARN_CONF_DIR should be set in the environment, not in the .conf file.
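For example, export it in the shell before starting the server:

export YARN_CONF_DIR=/home/spark/conf/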
As for the other issues, the Google group is the right place to ask. You may want to search it for yarn-client problems, since several other people have already figured out how to get that working.