Dataproc: Spark job fails on a Dataproc Spark cluster but runs locally



I have a JAR file built from a Maven project. When I run it locally with java -jar JARFILENAME.jar, it works fine. However, when I try to run the same JAR file on Dataproc, I get the following error:

22/06/27 13:13:45 INFO org.apache.spark.SparkEnv: Registering BlockManagerMaster
22/06/27 13:13:46 INFO org.apache.spark.SparkEnv: Registering BlockManagerMasterHeartbeat
22/06/27 13:13:46 INFO org.apache.spark.SparkEnv: Registering OutputCommitCoordinator
22/06/27 13:13:49 INFO org.sparkproject.jetty.util.log: Logging initialized @7373ms to org.sparkproject.jetty.util.log.Slf4jLog
22/06/27 13:13:51 INFO com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl: Ignoring exception of type GoogleJsonResponseException; verified object already exists with desired state.
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.expressions.aggregate.ApproximatePercentile$PercentileDigest.getPercentiles([D)Lscala/collection/Seq;
at com.amazon.deequ.analyzers.ApproxQuantile.fromAggregationResult(ApproxQuantile.scala:84)
at com.amazon.deequ.analyzers.ScanShareableAnalyzer.metricFromAggregationResult(Analyzer.scala:192)
at com.amazon.deequ.analyzers.ScanShareableAnalyzer.metricFromAggregationResult$(Analyzer.scala:185)
at com.amazon.deequ.analyzers.ApproxQuantile.metricFromAggregationResult(ApproxQuantile.scala:50)
at com.amazon.deequ.analyzers.runners.AnalysisRunner$.successOrFailureMetricFrom(AnalysisRunner.scala:362)
at com.amazon.deequ.analyzers.runners.AnalysisRunner$.$anonfun$runScanningAnalyzers$5(AnalysisRunner.scala:330)
at scala.collection.immutable.List.map(List.scala:297)
at com.amazon.deequ.analyzers.runners.AnalysisRunner$.liftedTree1$1(AnalysisRunner.scala:328)
at com.amazon.deequ.analyzers.runners.AnalysisRunner$.runScanningAnalyzers(AnalysisRunner.scala:318)
at com.amazon.deequ.analyzers.runners.AnalysisRunner$.doAnalysisRun(AnalysisRunner.scala:167)
at com.amazon.deequ.VerificationSuite.doVerificationRun(VerificationSuite.scala:121)
at com.amazon.deequ.VerificationRunBuilder.run(VerificationRunBuilder.scala:173)
at com.amazon.deequ.thesis.GCTestOne$.$anonfun$main$1(GCTestOne.scala:42)
at com.amazon.deequ.thesis.GCTestOne$.$anonfun$main$1$adapted(GCTestOne.scala:11)
at com.amazon.deequ.examples.ExampleUtils$.withSpark(ExampleUtils.scala:32)
at com.amazon.deequ.thesis.GCTestOne$.main(GCTestOne.scala:11)
at com.amazon.deequ.thesis.GCTestOne.main(GCTestOne.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

I don't understand why Dataproc throws a NoSuchMethodError when everything runs fine locally.

Does anyone know why this happens?

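It was a version mismatch with GCP. My project was built against Spark 3.2.1, but the cluster runs Spark 3.1, so the getPercentiles signature that Deequ was compiled against does not exist in the Spark classes on the cluster.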
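
One way to surface this kind of mismatch earlier is to compare the Spark version the JAR was built against with the version reported at runtime, and fail with a readable message before any Deequ code runs. The following is only a minimal sketch: the class name SparkVersionGuard, its helper methods, and the stand-in runtime value "3.1.0" are hypothetical, and in a real job the runtime string would come from spark.version() on the SparkSession. Matching on major.minor is a heuristic, not a guarantee of binary compatibility.

```java
public class SparkVersionGuard {

    // Assumption: the Spark version the JAR was built against (3.2.1, per the answer above).
    static final String BUILD_SPARK_VERSION = "3.2.1";

    /** Reduces a version string such as "3.2.1" to its "major.minor" part, e.g. "3.2". */
    static String majorMinor(String version) {
        String[] parts = version.trim().split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Unexpected version string: " + version);
        }
        return parts[0] + "." + parts[1];
    }

    /** True when both versions belong to the same major.minor release line. */
    static boolean sameReleaseLine(String buildVersion, String runtimeVersion) {
        return majorMinor(buildVersion).equals(majorMinor(runtimeVersion));
    }

    public static void main(String[] args) {
        // Hypothetical stand-in: in a real Spark job, use spark.version() instead.
        String runtimeVersion = args.length > 0 ? args[0] : "3.1.0";

        if (!sameReleaseLine(BUILD_SPARK_VERSION, runtimeVersion)) {
            System.out.println("Spark version mismatch: built against "
                    + BUILD_SPARK_VERSION + ", running on " + runtimeVersion);
        } else {
            System.out.println("Spark versions are on the same release line: "
                    + majorMinor(runtimeVersion));
        }
    }
}
```

The longer-term fix is to make the two versions agree: either build the project (and pick a Deequ release) for the Spark version the cluster provides, or create the cluster from an image that ships the Spark version the project targets.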
