How to submit a batch jar Spark job via the Livy Programmatic API



I want to use the Livy Programmatic API to submit a batch jar Spark job, the same way the REST API batches endpoint does. I have this JSON payload:

{
  "className": "org.apache.spark.examples.SparkPi",
  "queue": "default",
  "name": "SparkPi by Livy",
  "proxyUser": "hadoop",
  "executorMemory": "5g",
  "args": ["2000"],
  "file": "hdfs://host:port/resources/spark-examples_2.11-2.1.1.jar"
}

but I can't find any documentation on this. Is it possible, and if so, how?

Yes, you can submit Spark jobs through Livy's REST API. Follow these steps:

  • First build your Spark application and create an assembly jar, then upload the application jar to the cluster storage (HDFS) of your Hadoop cluster
  • Submit the job with curl for testing (see the sketch after this list), then implement the same call with an HTTP client API
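
For the testing step, here is a minimal curl sketch against Livy's batches endpoint; the Livy address http://livy-host:8998 is a placeholder for your own server:

curl -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "className": "org.apache.spark.examples.SparkPi",
    "name": "SparkPi by Livy",
    "proxyUser": "hadoop",
    "executorMemory": "5g",
    "args": ["2000"],
    "file": "hdfs://host:port/resources/spark-examples_2.11-2.1.1.jar"
  }' \
  http://livy-host:8998/batches

Livy responds with a JSON description of the new batch, including an id that you can poll with GET /batches/{id}.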

Sample Scala code that submits a Spark job using the Apache HttpClient:

import org.apache.http.client.methods.{CloseableHttpResponse, HttpPost}
import org.apache.http.entity.StringEntity
import org.apache.http.impl.client.{CloseableHttpClient, HttpClientBuilder}
import scala.util.parsing.json.JSONObject

def submitJob(className: String, jarPath: String, extraArgs: List[String]): JSONObject = {
  // clusterConfig.livyserver holds the base URL of the Livy server, e.g. http://livy-host:8998
  val jobSubmitRequest = new HttpPost(s"${clusterConfig.livyserver}/batches")
  val data: Map[String, Any] = Map(
    "className" -> className,
    "file" -> jarPath,
    "driverMemory" -> "2g",
    "name" -> "LivyTest",
    "proxyUser" -> "hadoop")
  // Map is immutable, so keep the result of adding the optional "args" entry
  val payload =
    if (extraArgs != null && extraArgs.nonEmpty) data + ("args" -> extraArgs)
    else data
  val json = new JSONObject(payload)
  println(json.toString())
  val params = new StringEntity(json.toString(), "UTF-8")
  params.setContentType("application/json")
  jobSubmitRequest.addHeader("Content-Type", "application/json")
  jobSubmitRequest.addHeader("Accept", "*/*")
  jobSubmitRequest.setEntity(params)
  val client: CloseableHttpClient = HttpClientBuilder.create().build()
  val response: CloseableHttpResponse = client.execute(jobSubmitRequest)
  // HttpReqUtil.parseHttpResponse is a helper from the sample project linked below;
  // it returns the status code and the response body parsed as JSON
  HttpReqUtil.parseHttpResponse(response)._2
}
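
A hypothetical call to the function above, using the jar and class name from the question (HttpReqUtil and clusterConfig come from the sample project linked below):

val result = submitJob(
  className = "org.apache.spark.examples.SparkPi",
  jarPath   = "hdfs://host:port/resources/spark-examples_2.11-2.1.1.jar",
  extraArgs = List("2000"))
// Livy answers with the new batch, e.g. {"id" : 0, "state" : "starting", ...};
// poll GET /batches/{id} (or /batches/{id}/state) until it reaches "success" or "dead"
println(result)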

See this post for more details: https://www.linkedin.com/pulse/submitting-spark-jobs-remote-cluster-via-livy-rest-api-ramasamy/

A sample project is available at https://github.com/ravikramesh/spark-rest-service
