Hudi DeltaStreamer job via Apache Livy



Please help with how to pass the --props file and the --source-class value to the Livy API POST.

spark-submit --packages org.apache.hudi:hudi-utilities-bundle_2.11:0.5.3,org.apache.spark:spark-avro_2.11:2.4.4 \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.sql.shuffle.partitions=100 \
  --driver-class-path $HADOOP_CONF_DIR \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  --table-type MERGE_ON_READ \
  --source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
  --source-ordering-field tst \
  --target-base-path /user/hive/warehouse/stock_ticks_mor \
  --target-table test \
  --props /var/demo/config/kafka-source.properties \
  --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider \
  --continuous

I have converted the configuration you are using into a JSON file that can be passed to the Livy API:

{
  "className": "org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer",
  "proxyUser": "root",
  "driverCores": 1,
  "executorCores": 2,
  "executorMemory": "1G",
  "numExecutors": 4,
  "queue": "default",
  "name": "stock_ticks_mor",
  "file": "hdfs:///tmp/hudi-utilities-bundle_2.12-0.8.0.jar",
  "conf": {
    "spark.sql.shuffle.partitions": "100",
    "spark.jars.packages": "org.apache.hudi:hudi-spark-bundle_2.12:0.8.0,org.apache.spark:spark-avro_2.12:3.0.2",
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.task.cpus": "1",
    "spark.executor.cores": "1"
  },
  "args": [
    "--props", "/var/demo/config/kafka-source.properties",
    "--table-type", "MERGE_ON_READ",
    "--source-class", "org.apache.hudi.utilities.sources.JsonKafkaSource",
    "--target-base-path", "/user/hive/warehouse/stock_ticks_mor",
    "--target-table", "test",
    "--schemaprovider-class", "org.apache.hudi.utilities.schema.FilebasedSchemaProvider",
    "--continuous"
  ]
}
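
Note that in cluster deploy mode both the application jar ("file") and the --props file must be readable from the cluster, not just from the machine issuing the REST call. A minimal staging sketch, assuming both should live on HDFS (the local file names are placeholders; adjust the paths to your environment):

# Stage the utilities bundle and the Kafka source properties on HDFS
hdfs dfs -mkdir -p /tmp /var/demo/config
hdfs dfs -put hudi-utilities-bundle_2.12-0.8.0.jar /tmp/
hdfs dfs -put kafka-source.properties /var/demo/config/

Paths passed in "args" without a scheme (such as --props above) are resolved by DeltaStreamer through the Hadoop FileSystem, so they typically point at fs.defaultFS rather than the local disk.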

You can submit this JSON to the Livy endpoint like so:
curl -H "X-Requested-By: admin" -H "Content-Type: application/json" -X POST -d @config.json http://localhost:8999/batches
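
Livy's POST response contains a numeric batch id, which you can use to poll the job and pull driver logs. A quick sketch (123 is a placeholder for the id returned above):

# Check the batch state (starting, running, success, dead, ...)
curl http://localhost:8999/batches/123/state
# Fetch recent driver log lines for troubleshooting
curl http://localhost:8999/batches/123/log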

Reference: https://community.cloudera.com/t5/Community-Articles/How-to-Submit-Spark-Application-through-Livy-REST-API/ta-p/247502
