Is Spark 2.0 compatible with (DataStax) Cassandra 2.1.13? I have Spark 2.1.0 installed on my local Mac, along with Scala 2.11.x. I am trying to read a Cassandra table from a server running DataStax 4.8.6 (Spark 1.4 and Cassandra 2.1.13).
The code I am running in the Spark shell:
spark-shell

// Data source and connector imports
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.cassandra._
import com.datastax.spark.connector.cql._

// Stop the shell's default session so a session with Cassandra settings can replace it
spark.stop

// CassandraNodeList, CassandraUser and CassandraPassword are placeholders for the real connection values
val sparkSession = SparkSession.builder
  .appName("Spark app")
  .config("spark.cassandra.connection.host", CassandraNodeList)
  .config("spark.cassandra.auth.username", CassandraUser)
  .config("spark.cassandra.auth.password", CassandraPassword)
  .config("spark.cassandra.connection.port", "9042")
  .getOrCreate()

// Expose the Cassandra table as a temporary view for SQL queries
sparkSession.sql("""CREATE TEMPORARY VIEW hdfsfile
                   |USING org.apache.spark.sql.cassandra
                   |OPTIONS (
                   |  table "hdfs_file",
                   |  keyspace "keyspaceName")""".stripMargin)
*************************************
17/02/28 10:33:02 ERROR Executor: Exception in task 8.0 in stage 3.0 (TID 20)
java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class
    at com.datastax.spark.connector.util.CountingIterator.
    at com.datastax.spark.connector.rdd.CassandraTableScanRDD.compute(CassandraTableScanRDD.scala:336)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
This is a Scala version mismatch error. You are using Scala 2.10 libraries on a Scala 2.11 runtime (or vice versa). It is explained in the SCC FAQ:
https://github.com/datastax/spark-cassandra-connector/blob/master/doc/FAQ.md#what-does-this-mean-noclassdeffounderror-scalacollectiongentraversableonceclass
Quoting the FAQ:
This means that there is a mix of Scala versions in the libraries used in your code. The collection API is different between Scala 2.10 and 2.11, and this is the most common error which occurs if a Scala 2.10 library is attempted to be loaded into a Scala 2.11 runtime. To fix this, make sure that the library name has the correct Scala version suffix to match your Scala version.
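In this setup that means every connector artifact must carry the _2.11 suffix to match the Scala 2.11 build of Spark 2.1.0. A minimal sketch of a matching launch (the 2.0.0 connector version is an assumption; use whichever SCC release is documented as compatible with your Spark line):

spark-shell --packages com.datastax.spark:spark-cassandra-connector_2.11:2.0.0

For a compiled application, sbt's %% operator appends the suffix automatically, so the dependency cannot drift from the project's scalaVersion:

// build.sbt (sketch): %% resolves to spark-cassandra-connector_2.11 under Scala 2.11
libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "2.0.0"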