Adding Hadoop configuration to a Spark application at runtime (via spark-submit)

I want to pass a key-value pair like the following to my Spark application:

mapreduce.input.fileinputformat.input.dir.recursive=true

I know I can do this from code as follows:

sc.hadoopConfiguration.set("mapreduce.input.fileinputformat.input.dir.recursive","true")

But I would like to be able to pass this property at runtime through spark-submit instead. Is that possible?

Sure!

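spark-submit (and spark-shell) support the --conf PROP=VALUE and --properties-file FILE options, which let you pass arbitrary configuration properties like this one. Keep in mind that spark-submit ignores, with a warning, any key that does not start with "spark.", so pass the setting as --conf spark.mapreduce.input.fileinputformat.input.dir.recursive=true. You can then read the value back with SparkConf.get and copy it into the Hadoop configuration: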

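// sc is the existing SparkContext (predefined in spark-shell)
// Read the value passed with --conf; fall back to "false" if it was not set
val mrRecursive = sc.getConf.get("spark.mapreduce.input.fileinputformat.input.dir.recursive", "false")
// Hadoop expects the key WITHOUT the "spark." prefix
sc.hadoopConfiguration.set("mapreduce.input.fileinputformat.input.dir.recursive", mrRecursive)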
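Spark also has a built-in route that needs no code at all: any property whose key starts with spark.hadoop. is copied into the Hadoop configuration with that prefix removed, so --conf spark.hadoop.mapreduce.input.fileinputformat.input.dir.recursive=true ends up as mapreduce.input.fileinputformat.input.dir.recursive=true. The sketch below only illustrates that prefix-stripping rule. It works on plain Map[String, String] values rather than SparkConf and Hadoop's Configuration so that it runs without Spark on the classpath, and the names HadoopPassThrough and hadoopOverrides are hypothetical.

```scala
// Hypothetical helper: mirrors the rule Spark applies to "spark.hadoop.*"
// keys, using plain Maps instead of SparkConf / Hadoop Configuration.
object HadoopPassThrough {
  val Prefix = "spark.hadoop."

  // Keep only keys carrying the prefix and strip it,
  // e.g. "spark.hadoop.a.b" -> "a.b".
  def hadoopOverrides(sparkProps: Map[String, String]): Map[String, String] =
    sparkProps.collect {
      case (key, value) if key.startsWith(Prefix) =>
        key.stripPrefix(Prefix) -> value
    }

  def main(args: Array[String]): Unit = {
    // Hypothetical properties as they might arrive from spark-submit
    val submitted = Map(
      "spark.app.name" -> "demo",
      "spark.hadoop.mapreduce.input.fileinputformat.input.dir.recursive" -> "true"
    )
    // Prints Map(mapreduce.input.fileinputformat.input.dir.recursive -> true)
    println(hadoopOverrides(submitted))
  }
}
```

In a real job none of this is needed: after submitting with the spark.hadoop. form, sc.hadoopConfiguration.get("mapreduce.input.fileinputformat.input.dir.recursive") should already return "true".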

From the spark-submit / spark-shell --help output:

  --conf PROP=VALUE           Arbitrary Spark configuration property.
  --properties-file FILE      Path to a file from which to load extra properties. If not
                              specified, this will look for conf/spark-defaults.conf.
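The same "spark." prefix rule applies to entries in a properties file, which is why the answer above reads the setting under a "spark." key. The sketch below imitates that filtering: it loads properties-file text with java.util.Properties (assumed here to match the whitespace- or "="-separated format of spark-defaults.conf) and keeps only the spark.-prefixed entries. The object name and the sample file contents are made up for illustration; in practice the file is the one given to --properties-file. It assumes Scala 2.13 for scala.jdk.CollectionConverters.

```scala
import java.io.StringReader
import java.util.Properties
import scala.jdk.CollectionConverters._ // Scala 2.13+

object SparkPropertiesFile {
  // Parse properties-file text and keep only "spark."-prefixed keys,
  // imitating the filtering spark-submit applies to its properties.
  def sparkOnly(text: String): Map[String, String] = {
    val props = new Properties()
    props.load(new StringReader(text))
    props.stringPropertyNames().asScala
      .filter(_.startsWith("spark."))
      .map(k => k -> props.getProperty(k).trim)
      .toMap
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical file contents: the second key lacks the "spark." prefix
    val file =
      """spark.mapreduce.input.fileinputformat.input.dir.recursive true
        |mapreduce.input.fileinputformat.input.dir.recursive true
        |""".stripMargin
    // Only the spark.-prefixed entry survives
    println(sparkOnly(file))
  }
}
```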

Spark documentation on [dynamically] loading properties: https://spark.apache.org/docs/latest/configuration.html

Here is an approach that works without modifying any code.

When a Hadoop Configuration is created, it reads the file core-default.xml, as described here: https://hadoop.apache.org/docs/r2.6.1/api/org/apache/hadoop/conf/Configuration.html

It works if you put the value in core-default.xml and add the directory containing that file to the classpath with spark-submit's --driver-class-path option.
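Hadoop configuration files use a single XML layout: a configuration root element holding property elements, each with a name and a value. The sketch below writes the property from the question in that layout and reads it back with the JDK's DOM parser, which can serve as a quick sanity check of a hand-edited file before its directory is added with --driver-class-path. It is a simplified reader, not Hadoop's own Configuration parser: it ignores elements such as final, ${...} variable expansion and XInclude. The object name HadoopXmlCheck is hypothetical.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Element

object HadoopXmlCheck {
  // Hypothetical file contents in Hadoop's configuration-file layout
  val xml: String =
    """<?xml version="1.0"?>
      |<configuration>
      |  <property>
      |    <name>mapreduce.input.fileinputformat.input.dir.recursive</name>
      |    <value>true</value>
      |  </property>
      |</configuration>
      |""".stripMargin

  // Collect every <property> as name -> value
  // (simplified: no <final>, no ${var} expansion, no XInclude)
  def properties(text: String): Map[String, String] = {
    val doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
      .parse(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)))
    val nodes = doc.getElementsByTagName("property")
    (0 until nodes.getLength).map { i =>
      val p = nodes.item(i).asInstanceOf[Element]
      def child(tag: String): String =
        p.getElementsByTagName(tag).item(0).getTextContent.trim
      child("name") -> child("value")
    }.toMap
  }

  def main(args: Array[String]): Unit =
    // Prints Map(mapreduce.input.fileinputformat.input.dir.recursive -> true)
    println(properties(xml))
}
```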
