spark-submit is not able to pick up the classpath from the jar



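I created a Spark job that fetches data from one Cassandra table and inserts it into another. I am using Gradle to build the jar file. How can I create a jar that contains all of its dependencies? I trigger the Spark job with the following command: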

spark-submit --class DataMigration OrderAnalytics.jar

All the required jars are present inside OrderAnalytics.jar, under lib/**, yet I still get the following NoClassDefFoundError:
17/08/19 22:56:38 ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 1)
java.lang.NoClassDefFoundError: com/twitter/jsr166e/LongAdder
at org.apache.spark.metrics.OutputMetricsUpdater$TaskMetricsSupport$class.$init$(OutputMetricsUpdater.scala:107)

The manifest in META-INF looks like this:

Manifest-Version: 1.0
Main-Class: DataMigration
Class-Path: lib/spark-sql_2.11-2.2.0.jar lib/spark-cassandra-connector
_2.11-2.0.3.jar lib/univocity-parsers-2.2.1.jar lib/spark-sketch_2.11
-2.2.0.jar lib/spark-core_2.11-2.2.0.jar lib/spark-catalyst_2.11-2.2.
0.jar lib/spark-tags_2.11-2.2.0.jar lib/parquet-column-1.8.2.jar lib/
parquet-hadoop-1.8.2.jar lib/jackson-databind-2.6.5.jar lib/xbean-asm
5-shaded-4.4.jar lib/unused-1.0.0.jar lib/jsr166e-1.1.0.jar lib/commo
ns-beanutils-1.9.3.jar lib/joda-time-2.3.jar lib/joda-convert-1.2.jar
lib/scala-reflect-2.11.8.jar lib/avro-1.7.7.jar lib/avro-mapred-1.7.
7-hadoop2.jar lib/chill_2.11-0.8.0.jar lib/chill-java-0.8.0.jar lib/h
adoop-client-2.6.5.jar lib/spark-launcher_2.11-2.2.0.jar lib/spark-ne
twork-common_2.11-2.2.0.jar lib/spark-network-shuffle_2.11-2.2.0.jar 
lib/spark-unsafe_2.11-2.2.0.jar lib/jets3t-0.9.3.jar lib/curator-reci
pes-2.6.0.jar lib/javax.servlet-api-3.1.0.jar lib/commons-lang3-3.5.j
ar lib/commons-math3-3.4.1.jar lib/jsr305-1.3.9.jar lib/jul-to-slf4j-
1.7.16.jar lib/jcl-over-slf4j-1.7.16.jar lib/log4j-1.2.17.jar lib/slf
4j-log4j12-1.7.16.jar lib/compress-lzf-1.0.3.jar lib/snappy-java-1.1.
2.6.jar lib/lz4-1.3.0.jar lib/RoaringBitmap-0.5.11.jar lib/json4s-jac
kson_2.11-3.2.11.jar lib/jersey-client-2.22.2.jar lib/jersey-common-2
.22.2.jar lib/jersey-server-2.22.2.jar lib/jersey-container-servlet-2
.22.2.jar lib/jersey-container-servlet-core-2.22.2.jar lib/netty-3.9.
9.Final.jar lib/stream-2.7.0.jar lib/metrics-core-3.1.2.jar lib/metri
cs-jvm-3.1.2.jar lib/metrics-json-3.1.2.jar lib/metrics-graphite-3.1.
2.jar lib/jackson-module-scala_2.11-2.6.5.jar lib/ivy-2.4.0.jar lib/o
ro-2.0.8.jar lib/pyrolite-4.13.jar lib/py4j-0.10.4.jar lib/commons-cr
ypto-1.0.0.jar lib/janino-3.0.0.jar lib/commons-compiler-3.0.0.jar li
b/antlr4-runtime-4.5.3.jar lib/commons-codec-1.10.jar lib/parquet-com
mon-1.8.2.jar lib/parquet-encoding-1.8.2.jar lib/parquet-format-2.3.1
.jar lib/parquet-jackson-1.8.2.jar lib/jackson-core-2.6.5.jar lib/com
mons-collections-3.2.2.jar lib/commons-compress-1.4.1.jar lib/avro-ip
c-1.7.7.jar lib/avro-ipc-1.7.7-tests.jar lib/kryo-shaded-3.0.3.jar li
b/hadoop-common-2.6.5.jar lib/hadoop-hdfs-2.6.5.jar lib/hadoop-mapred
uce-client-app-2.6.5.jar lib/hadoop-yarn-api-2.6.5.jar lib/hadoop-map
reduce-client-core-2.6.5.jar lib/hadoop-mapreduce-client-jobclient-2.
6.5.jar lib/hadoop-annotations-2.6.5.jar lib/leveldbjni-all-1.8.jar l
ib/httpcore-4.3.3.jar lib/httpclient-4.3.6.jar lib/activation-1.1.1.j
ar lib/mx4j-3.0.2.jar lib/mail-1.4.7.jar lib/bcprov-jdk15on-1.51.jar 
lib/java-xmlbuilder-1.0.jar lib/curator-framework-2.6.0.jar lib/zooke
eper-3.4.6.jar lib/guava-16.0.1.jar lib/json4s-core_2.11-3.2.11.jar l
ib/javax.ws.rs-api-2.0.1.jar lib/hk2-api-2.4.0-b34.jar lib/javax.inje
ct-2.4.0-b34.jar lib/hk2-locator-2.4.0-b34.jar lib/javax.annotation-a
pi-1.2.jar lib/jersey-guava-2.22.2.jar lib/osgi-resource-locator-1.0.
1.jar lib/jersey-media-jaxb-2.22.2.jar lib/validation-api-1.1.0.Final
.jar lib/jackson-module-paranamer-2.6.5.jar lib/xz-1.0.jar lib/minlog
-1.3.0.jar lib/objenesis-2.1.jar lib/commons-cli-1.2.jar lib/xmlenc-0
.52.jar lib/commons-httpclient-3.1.jar lib/commons-io-2.4.jar lib/com
mons-lang-2.6.jar lib/commons-configuration-1.6.jar lib/protobuf-java
-2.5.0.jar lib/gson-2.2.4.jar lib/hadoop-auth-2.6.5.jar lib/curator-c
lient-2.6.0.jar lib/htrace-core-3.0.4.jar lib/jetty-util-6.1.26.jar l
ib/xercesImpl-2.9.1.jar lib/hadoop-mapreduce-client-common-2.6.5.jar 
lib/hadoop-mapreduce-client-shuffle-2.6.5.jar lib/hadoop-yarn-common-
2.6.5.jar lib/base64-2.3.8.jar lib/json4s-ast_2.11-3.2.11.jar lib/sca
lap-2.11.0.jar lib/hk2-utils-2.4.0-b34.jar lib/aopalliance-repackaged
-2.4.0-b34.jar lib/javassist-3.18.1-GA.jar lib/commons-digester-1.8.j
ar lib/commons-beanutils-core-1.8.0.jar lib/apacheds-kerberos-codec-2
.0.0-M15.jar lib/xml-apis-1.3.04.jar lib/hadoop-yarn-client-2.6.5.jar
lib/hadoop-yarn-server-common-2.6.5.jar lib/hadoop-yarn-server-nodem
anager-2.6.5.jar lib/jaxb-api-2.2.2.jar lib/jackson-jaxrs-1.9.13.jar 
lib/jackson-xc-1.9.13.jar lib/guice-3.0.jar lib/scala-compiler-2.11.0
.jar lib/javax.inject-1.jar lib/jline-0.9.94.jar lib/apacheds-i18n-2.
0.0-M15.jar lib/api-asn1-api-1.0.0-M20.jar lib/api-util-1.0.0-M20.jar
lib/jettison-1.1.jar lib/stax-api-1.0-2.jar lib/aopalliance-1.0.jar 
lib/cglib-2.2.1-v20090111.jar lib/scala-xml_2.11-1.0.1.jar lib/scala-
parser-combinators_2.11-1.0.1.jar lib/scala-library-2.11.8.jar lib/sl
f4j-api-1.7.16.jar lib/netty-all-4.0.43.Final.jar lib/jackson-core-as
l-1.9.13.jar lib/jackson-mapper-asl-1.9.13.jar lib/jackson-annotation
s-2.6.5.jar lib/commons-net-3.1.jar lib/paranamer-2.6.jar
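
One thing worth checking here: per the JAR specification, Class-Path entries in a manifest are resolved relative to the jar's location on disk, not to paths inside the jar, so jars nested under lib/ inside OrderAnalytics.jar would not normally be loaded. A minimal sketch for testing whether a class is a direct entry of the jar follows. It assumes a bash-like shell (for example Git Bash on Windows) and the JDK's `jar` tool; `has_class_entry` is a hypothetical helper name.

```shell
# has_class_entry is a hypothetical helper, not part of Spark or the JDK.
# It reads a jar listing on stdin (one entry per line, as `jar tf` prints it)
# and reports whether the given class is a direct entry of that jar.
has_class_entry() {
  local want="$1.class" line
  # The `|| [ -n "$line" ]` part also handles a last line without a newline.
  while IFS= read -r line || [ -n "$line" ]; do
    line="${line%$'\r'}"   # drop a trailing carriage return, if any
    if [ "$line" = "$want" ]; then
      echo "found: $want"
      return 0
    fi
  done
  echo "missing: $want"
  return 1
}

# Intended usage (not run here):
#   jar tf OrderAnalytics.jar | has_class_entry com/twitter/jsr166e/LongAdder
```

If the class is reported missing while lib/jsr166e-1.1.0.jar does appear in the listing, the dependency is only present as a nested jar, which would be consistent with the ClassNotFoundException in the log.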

Update
As a few people suggested in the comments, and following the answer by Allison Berman, I tried the following:

C:\Dev-Tra\OrderAnalytics\build\libs>spark-submit --jars OrderAnalytics.jar  --class example.DataMigration
Error: Cannot load main class from JAR file:/C:/
Run with --help for usage help or --verbose for debug output
C:\Dev-Tra\OrderAnalytics\build\libs>spark-submit --jars OrderAnalytics.jar --class example.DataMigration
Exception in thread "main" java.lang.IllegalArgumentException: Missing application resource.
at org.apache.spark.launcher.CommandBuilderUtils.checkArgument(CommandBuilderUtils.java:241)
at org.apache.spark.launcher.SparkSubmitCommandBuilder.buildSparkSubmitArgs(SparkSubmitCommandBuilder.java:160)
at org.apache.spark.launcher.SparkSubmitCommandBuilder.buildSparkSubmitCommand(SparkSubmitCommandBuilder.java:274)
at org.apache.spark.launcher.SparkSubmitCommandBuilder.buildCommand(SparkSubmitCommandBuilder.java:151)
at org.apache.spark.launcher.Main.main(Main.java:86)

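But according to the Spark documentation, the command should look like the one below. It launches the job but fails to pick up all the dependent jars: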

C:\Dev-Tra\OrderAnalytics\build\libs>spark-submit  --class example.DataMigration OrderAnalytics.jar
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/08/21 21:59:36 INFO SparkContext: Running Spark version 2.2.0
17/08/21 21:59:36 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/08/21 21:59:36 INFO SparkContext: Submitted application: DataMigration
17/08/21 21:59:36 INFO SecurityManager: Changing view acls to: ram
17/08/21 21:59:36 INFO SecurityManager: Changing modify acls to: ram
17/08/21 21:59:36 INFO SecurityManager: Changing view acls groups to:
17/08/21 21:59:36 INFO SecurityManager: Changing modify acls groups to:
17/08/21 21:59:36 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(ram); groups with vi
ew permissions: Set(); users  with modify permissions: Set(ram); groups with modify permissions: Set()
17/08/21 21:59:37 INFO Utils: Successfully started service 'sparkDriver' on port 62239.
17/08/21 21:59:37 INFO SparkEnv: Registering MapOutputTracker
17/08/21 21:59:37 INFO SparkEnv: Registering BlockManagerMaster
17/08/21 21:59:37 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
17/08/21 21:59:37 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
17/08/21 21:59:37 INFO DiskBlockManager: Created local directory at C:\Users\ram\AppData\Local\Temp\blockmgr-38ef35e6-219e-450c-b7da-c8075464a232
17/08/21 21:59:37 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
17/08/21 21:59:37 INFO SparkEnv: Registering OutputCommitCoordinator
17/08/21 21:59:37 INFO Utils: Successfully started service 'SparkUI' on port 4040.
17/08/21 21:59:37 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.1.101:4040
17/08/21 21:59:38 INFO SparkContext: Added JAR file:/C:/Dev-Tra/OrderAnalytics/build/libs/OrderAnalytics.jar at spark://192.168.1.101:62239/jars/OrderAnalytics
.jar with timestamp 1503332978023
17/08/21 21:59:38 INFO Executor: Starting executor ID driver on host localhost
17/08/21 21:59:38 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 62248.
17/08/21 21:59:38 INFO NettyBlockTransferService: Server created on 192.168.1.101:62248
17/08/21 21:59:38 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
17/08/21 21:59:38 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.1.101, 62248, None)
17/08/21 21:59:38 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.1.101:62248 with 366.3 MB RAM, BlockManagerId(driver, 192.168.1.101, 62248
, None)
17/08/21 21:59:38 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.1.101, 62248, None)
17/08/21 21:59:38 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.1.101, 62248, None)
17/08/21 21:59:40 INFO Native: Could not load JNR C Library, native system calls through this library will not be available (set this logger level to DEBUG to
see the full stack trace).
17/08/21 21:59:40 INFO ClockFactory: Using java.lang.System clock to generate timestamps.
17/08/21 21:59:41 WARN NettyUtil: Found Netty's native epoll transport, but not running on linux-based operating system. Using NIO instead.
17/08/21 21:59:41 INFO Cluster: New Cassandra host localhost/127.0.0.1:9042 added
17/08/21 21:59:41 INFO CassandraConnector: Connected to Cassandra cluster: Test Cluster
17/08/21 21:59:42 INFO SparkContext: Starting job: runJob at RDDFunctions.scala:36
17/08/21 21:59:42 INFO DAGScheduler: Got job 0 (runJob at RDDFunctions.scala:36) with 4 output partitions
17/08/21 21:59:42 INFO DAGScheduler: Final stage: ResultStage 0 (runJob at RDDFunctions.scala:36)
17/08/21 21:59:42 INFO DAGScheduler: Parents of final stage: List()
17/08/21 21:59:42 INFO DAGScheduler: Missing parents: List()
17/08/21 21:59:42 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at DataMigration.scala:18), which has no missing parents
17/08/21 21:59:42 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 12.3 KB, free 366.3 MB)
17/08/21 21:59:42 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 5.8 KB, free 366.3 MB)
17/08/21 21:59:42 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.1.101:62248 (size: 5.8 KB, free: 366.3 MB)
17/08/21 21:59:42 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1006
17/08/21 21:59:42 INFO DAGScheduler: Submitting 4 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at DataMigration.scala:18) (first 15 tasks are f
or partitions Vector(0, 1, 2, 3))
17/08/21 21:59:42 INFO TaskSchedulerImpl: Adding task set 0.0 with 4 tasks
17/08/21 21:59:42 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, NODE_LOCAL, 17002 bytes)
17/08/21 21:59:42 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
17/08/21 21:59:42 INFO Executor: Fetching spark://192.168.1.101:62239/jars/OrderAnalytics.jar with timestamp 1503332978023
17/08/21 21:59:42 INFO TransportClientFactory: Successfully created connection to /192.168.1.101:62239 after 18 ms (0 ms spent in bootstraps)
17/08/21 21:59:42 INFO Utils: Fetching spark://192.168.1.101:62239/jars/OrderAnalytics.jar to C:\Users\ram\AppData\Local\Temp\spark-73cbbbe8-9e06-4a11-976a-a766305d4148\userFiles-3e4c9dea-6273-4d9e-a17b-c807aa0e3da5\fetchFileTemp7196411614488839489.tmp
17/08/21 21:59:43 INFO Executor: Adding file:/C:/Users/ram/AppData/Local/Temp/spark-73cbbbe8-9e06-4a11-976a-a766305d4148/userFiles-3e4c9dea-6273-4d9e-a17b
-c807aa0e3da5/OrderAnalytics.jar to class loader
17/08/21 21:59:44 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.NoClassDefFoundError: com/twitter/jsr166e/LongAdder
at org.apache.spark.metrics.OutputMetricsUpdater$TaskMetricsSupport$class.$init$(OutputMetricsUpdater.scala:107)
at org.apache.spark.metrics.OutputMetricsUpdater$TaskMetricsUpdater.<init>(OutputMetricsUpdater.scala:152)
at org.apache.spark.metrics.OutputMetricsUpdater$.apply(OutputMetricsUpdater.scala:75)
at com.datastax.spark.connector.writer.TableWriter.writeInternal(TableWriter.scala:174)
at com.datastax.spark.connector.writer.TableWriter.insert(TableWriter.scala:162)
at com.datastax.spark.connector.writer.TableWriter.write(TableWriter.scala:149)
at com.datastax.spark.connector.RDDFunctions$$anonfun$saveToCassandra$1.apply(RDDFunctions.scala:36)
at com.datastax.spark.connector.RDDFunctions$$anonfun$saveToCassandra$1.apply(RDDFunctions.scala:36)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: com.twitter.jsr166e.LongAdder
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 14 more
17/08/21 21:59:44 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, executor driver, partition 1, NODE_LOCAL, 15334 bytes)
17/08/21 21:59:44 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
17/08/21 21:59:44 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): java.lang.NoClassDefFoundError: com/twitter/jsr166e/Long
Adder
at org.apache.spark.metrics.OutputMetricsUpdater$TaskMetricsSupport$class.$init$(OutputMetricsUpdater.scala:107)
at org.apache.spark.metrics.OutputMetricsUpdater$TaskMetricsUpdater.<init>(OutputMetricsUpdater.scala:152)
at org.apache.spark.metrics.OutputMetricsUpdater$.apply(OutputMetricsUpdater.scala:75)
at com.datastax.spark.connector.writer.TableWriter.writeInternal(TableWriter.scala:174)
at com.datastax.spark.connector.writer.TableWriter.insert(TableWriter.scala:162)
at com.datastax.spark.connector.writer.TableWriter.write(TableWriter.scala:149)
at com.datastax.spark.connector.RDDFunctions$$anonfun$saveToCassandra$1.apply(RDDFunctions.scala:36)
at com.datastax.spark.connector.RDDFunctions$$anonfun$saveToCassandra$1.apply(RDDFunctions.scala:36)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: com.twitter.jsr166e.LongAdder
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 14 more
17/08/21 21:59:44 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job
17/08/21 21:59:44 ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 1)
java.lang.NoClassDefFoundError: com/twitter/jsr166e/LongAdder
at org.apache.spark.metrics.OutputMetricsUpdater$TaskMetricsSupport$class.$init$(OutputMetricsUpdater.scala:107)
at org.apache.spark.metrics.OutputMetricsUpdater$TaskMetricsUpdater.<init>(OutputMetricsUpdater.scala:152)
at org.apache.spark.metrics.OutputMetricsUpdater$.apply(OutputMetricsUpdater.scala:75)
at com.datastax.spark.connector.writer.TableWriter.writeInternal(TableWriter.scala:174)
at com.datastax.spark.connector.writer.TableWriter.insert(TableWriter.scala:162)
at com.datastax.spark.connector.writer.TableWriter.write(TableWriter.scala:149)
at com.datastax.spark.connector.RDDFunctions$$anonfun$saveToCassandra$1.apply(RDDFunctions.scala:36)
at com.datastax.spark.connector.RDDFunctions$$anonfun$saveToCassandra$1.apply(RDDFunctions.scala:36)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
17/08/21 21:59:44 INFO TaskSchedulerImpl: Cancelling stage 0
17/08/21 21:59:44 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
17/08/21 21:59:44 INFO TaskSchedulerImpl: Stage 0 was cancelled
17/08/21 21:59:44 INFO TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1) on localhost, executor driver: java.lang.NoClassDefFoundError (com/twitter/jsr166e/Lo
ngAdder) [duplicate 1]
17/08/21 21:59:44 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
17/08/21 21:59:44 INFO DAGScheduler: ResultStage 0 (runJob at RDDFunctions.scala:36) failed in 1.894 s due to Job aborted due to stage failure: Task 0 in stage
0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): java.lang.NoClassDefFoundError: com/twitter/jsr166e/L
ongAdder
at org.apache.spark.metrics.OutputMetricsUpdater$TaskMetricsSupport$class.$init$(OutputMetricsUpdater.scala:107)
at org.apache.spark.metrics.OutputMetricsUpdater$TaskMetricsUpdater.<init>(OutputMetricsUpdater.scala:152)
at org.apache.spark.metrics.OutputMetricsUpdater$.apply(OutputMetricsUpdater.scala:75)
at com.datastax.spark.connector.writer.TableWriter.writeInternal(TableWriter.scala:174)
at com.datastax.spark.connector.writer.TableWriter.insert(TableWriter.scala:162)
at com.datastax.spark.connector.writer.TableWriter.write(TableWriter.scala:149)
at com.datastax.spark.connector.RDDFunctions$$anonfun$saveToCassandra$1.apply(RDDFunctions.scala:36)
at com.datastax.spark.connector.RDDFunctions$$anonfun$saveToCassandra$1.apply(RDDFunctions.scala:36)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: com.twitter.jsr166e.LongAdder
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 14 more
Driver stacktrace:
17/08/21 21:59:44 INFO DAGScheduler: Job 0 failed: runJob at RDDFunctions.scala:36, took 2.152376 s
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost tas
k 0.0 in stage 0.0 (TID 0, localhost, executor driver): java.lang.NoClassDefFoundError: com/twitter/jsr166e/LongAdder
at org.apache.spark.metrics.OutputMetricsUpdater$TaskMetricsSupport$class.$init$(OutputMetricsUpdater.scala:107)
at org.apache.spark.metrics.OutputMetricsUpdater$TaskMetricsUpdater.<init>(OutputMetricsUpdater.scala:152)
at org.apache.spark.metrics.OutputMetricsUpdater$.apply(OutputMetricsUpdater.scala:75)
at com.datastax.spark.connector.writer.TableWriter.writeInternal(TableWriter.scala:174)
at com.datastax.spark.connector.writer.TableWriter.insert(TableWriter.scala:162)
at com.datastax.spark.connector.writer.TableWriter.write(TableWriter.scala:149)
at com.datastax.spark.connector.RDDFunctions$$anonfun$saveToCassandra$1.apply(RDDFunctions.scala:36)
at com.datastax.spark.connector.RDDFunctions$$anonfun$saveToCassandra$1.apply(RDDFunctions.scala:36)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: com.twitter.jsr166e.LongAdder
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 14 more
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1499)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1487)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1486)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1486)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:814)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1714)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1669)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1658)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2022)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2043)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2075)
at com.datastax.spark.connector.RDDFunctions.saveToCassandra(RDDFunctions.scala:36)
at example.DataMigration$.main(DataMigration.scala:20)
at example.DataMigration.main(DataMigration.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.NoClassDefFoundError: com/twitter/jsr166e/LongAdder
at org.apache.spark.metrics.OutputMetricsUpdater$TaskMetricsSupport$class.$init$(OutputMetricsUpdater.scala:107)
at org.apache.spark.metrics.OutputMetricsUpdater$TaskMetricsUpdater.<init>(OutputMetricsUpdater.scala:152)
at org.apache.spark.metrics.OutputMetricsUpdater$.apply(OutputMetricsUpdater.scala:75)
at com.datastax.spark.connector.writer.TableWriter.writeInternal(TableWriter.scala:174)
at com.datastax.spark.connector.writer.TableWriter.insert(TableWriter.scala:162)
at com.datastax.spark.connector.writer.TableWriter.write(TableWriter.scala:149)
at com.datastax.spark.connector.RDDFunctions$$anonfun$saveToCassandra$1.apply(RDDFunctions.scala:36)
at com.datastax.spark.connector.RDDFunctions$$anonfun$saveToCassandra$1.apply(RDDFunctions.scala:36)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: com.twitter.jsr166e.LongAdder
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 14 more
17/08/21 21:59:51 INFO CassandraConnector: Disconnected from Cassandra cluster: Test Cluster
17/08/21 21:59:52 INFO SerialShutdownHooks: Successfully executed shutdown hook: Clearing session cache for C* connector
17/08/21 21:59:52 INFO SparkContext: Invoking stop() from shutdown hook
17/08/21 21:59:52 INFO SparkUI: Stopped Spark web UI at http://192.168.1.101:4040
17/08/21 21:59:52 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/08/21 21:59:52 INFO MemoryStore: MemoryStore cleared
17/08/21 21:59:52 INFO BlockManager: BlockManager stopped
17/08/21 21:59:52 INFO BlockManagerMaster: BlockManagerMaster stopped
17/08/21 21:59:52 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/08/21 21:59:52 INFO SparkContext: Successfully stopped SparkContext
17/08/21 21:59:52 INFO ShutdownHookManager: Shutdown hook called
17/08/21 21:59:52 INFO ShutdownHookManager: Deleting directory C:\Users\ram\AppData\Local\Temp\spark-73cbbbe8-9e06-4a11-976a-a766305d4148

Can anyone tell me why Spark is unable to pick up the classpath from the jar, or how to fix this?

Thanks

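Indrajit is correct: you need to include the package. I had a similar problem when I left my file in the default package. Make your folder structure match the one described at http://www.scala-sbt.org/0.13/docs/Directories.html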

In src/main/scala or src/main/java, add a new folder YOUR_PACKAGE and put DataMigration in YOUR_PACKAGE. Make sure the first line of DataMigration is:

package YOUR_PACKAGE

Then your spark-submit will be:

spark-submit --jars OrderAnalytics.jar --class YOUR_PACKAGE.DataMigration
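
Note that the update in the question shows this form failing with "Missing application resource": spark-submit expects the application jar as a positional argument after the options, while `--jars` takes a comma-separated list of additional jars. A minimal sketch of building that list is below. It assumes a bash-like shell and that the dependency jars have been unpacked into a directory next to OrderAnalytics.jar (called `lib` here, which is an assumption); `join_jars` is a hypothetical helper name.

```shell
# join_jars is a hypothetical helper, not part of Spark.
# It prints every *.jar in the given directory as one comma-separated
# list, which is the format the --jars option expects.
join_jars() {
  local dir="$1" list="" jar
  for jar in "$dir"/*.jar; do
    [ -e "$jar" ] || continue      # skip the unexpanded pattern when nothing matches
    list="${list:+$list,}$jar"     # add a comma only between entries
  done
  printf '%s\n' "$list"
}

# Intended usage (not run here; "lib" is an assumed directory name):
#   spark-submit --class YOUR_PACKAGE.DataMigration \
#     --jars "$(join_jars lib)" OrderAnalytics.jar
```

The Spark jars themselves are already provided by the Spark installation, so in practice the list could probably be limited to the connector and its own dependencies, such as jsr166e.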
