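I am trying to run data-validator, a data validation framework created by Target, to validate data in Parquet files on Azure Databricks.
I have created a Spark job that uses the data-validator fat JAR.
If I pass the `--help` argument, I get the usage help for data-validator, but when I pass `--config test_config.yaml`, data-validator cannot find the file.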
OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=512m; support was removed in 8.0
Warning: Ignoring non-Spark config property: libraryDownload.sleepIntervalSeconds
Warning: Ignoring non-Spark config property: libraryDownload.timeoutSeconds
Warning: Ignoring non-Spark config property: eventLog.rolloverIntervalSeconds
21/12/30 06:17:29 INFO Main$: Logging configured!
21/12/30 06:17:29 INFO Main$: Data Validator
21/12/30 06:17:30 INFO ConfigParser$: Parsing `dbfs:/FileStore/shared_uploads/jyoti/test_config.yaml`
21/12/30 06:17:30 INFO ConfigParser$: Attempting to load `dbfs:/FileStore/shared_uploads/jyoti/test_config.yaml` from file system
21/12/30 06:17:30 ERROR Main$: Failed to parse config file 'dbfs:/FileStore/shared_uploads/jyoti/test_config.yaml, {}
DecodingFailure(java.io.FileNotFoundException: dbfs:/FileStore/shared_uploads/jyoti/test_config.yaml (No such file or directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at scala.io.Source$.fromFile(Source.scala:94)
at scala.io.Source$.fromFile(Source.scala:79)
at scala.io.Source$.fromFile(Source.scala:57)
at com.target.data_validator.ConfigParser$.com$target$data_validator$ConfigParser$$loadFromFile(ConfigParser.scala:39)
at com.target.data_validator.ConfigParser$$anonfun$6.apply(ConfigParser.scala:57)
at com.target.data_validator.ConfigParser$$anonfun$6.apply(ConfigParser.scala:54)
at scala.util.Try$.apply(Try.scala:213)
at com.target.data_validator.ConfigParser$.parseFile(ConfigParser.scala:53)
at com.target.data_validator.Main$.loadConfigRun(Main.scala:23)
at com.target.data_validator.Main$.main(Main.scala:171)
at com.target.data_validator.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
, List())
21/12/30 06:17:30 ERROR Main$: data-validator failed!
I have stored the YAML file in DBFS.
Please let me know how to pass a YAML config file to data-validator from a Spark job in Databricks.
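You need to pass the file name in the form expected by the DBFS local file API, because the ConfigParser library most likely works only with local files. The stack trace supports this: the config is opened through `scala.io.Source.fromFile` and `java.io.FileInputStream`, which read from the local file system and do not understand the `dbfs:` scheme. To do that, replace `dbfs:` with `/dbfs`. In your example, change `dbfs:/FileStore/shared_uploads/jyoti/test_config.yaml` to `/dbfs/FileStore/shared_uploads/jyoti/test_config.yaml`.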
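If the path is produced by a script rather than typed by hand, the rewrite described above can be sketched as a small helper. This is only an illustration: the function name `to_local_dbfs_path` is hypothetical and not part of data-validator or Databricks, and it assumes the `/dbfs` local mount is available on the node where the job's driver runs.

```python
def to_local_dbfs_path(path: str) -> str:
    """Map a dbfs:/ URI to its /dbfs local-mount form; leave other paths unchanged.

    Hypothetical helper for illustration only.
    """
    prefix = "dbfs:/"
    if path.startswith(prefix):
        # Drop the scheme and any extra leading slashes, then re-root under /dbfs.
        return "/dbfs/" + path[len(prefix):].lstrip("/")
    return path


# Path taken from the log in the question.
config_path = to_local_dbfs_path("dbfs:/FileStore/shared_uploads/jyoti/test_config.yaml")
print(config_path)
```

The resulting string is what would then be passed as the value of `--config` in the job parameters.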