Spark streaming Dataset to Cassandra connection fails with UnsupportedOperationChecker



I am trying to write a streaming Dataset to Cassandra.

I have a streaming Dataset of the following class:

case class UserSession(var id: Int,
                       var visited: List[String])

I also have the following keyspace and table in Cassandra (blog = keyspace, session = table):

CREATE KEYSPACE blog WITH REPLICATION = { 'class' : 'SimpleStrategy',    'replication_factor' : 1 };

CREATE TABLE blog.session(id int PRIMARY KEY, visited list<text>);

I chose list<text> for visited because my visited field is of type List[String].

My foreach writer is as follows:

import com.datastax.spark.connector.cql.CassandraConnector
import org.apache.spark.sql.ForeachWriter

class SessionCassandraForeachWriter extends ForeachWriter[UserSession] {
  /*
  - on every batch, on every partition `partitionId`
    - on every "epoch" = chunk of data
      - call the open method; if false, skip this chunk
      - for each entry in this chunk, call the process method
      - call the close method either at the end of the chunk or with an error if it was thrown
  */
  val keyspace = "blog"
  val table = "session"
  val connector = CassandraConnector(sparkSession.sparkContext.getConf)

  override def open(partitionId: Long, epochId: Long): Boolean = {
    println("Open connection")
    true
  }

  override def process(sess: UserSession): Unit = {
    // Render the Scala list as a CQL list literal such as ['a', 'b'].
    // Interpolating the List directly would produce List(a, b), which is not valid CQL.
    val visited = sess.visited.map(v => "'" + v.replace("'", "''") + "'").mkString("[", ", ", "]")
    connector.withSessionDo { session =>
      session.execute(
        s"""
           |insert into $keyspace.$table (id, visited)
           |values (${sess.id}, $visited)
         """.stripMargin)
    }
  }

  override def close(errorOrNull: Throwable): Unit = println("Closing connection")
}

It may help to look at my process function, since that may be what throws the error. My main is as follows.

finishedUserSessionsStream: Dataset[UserSession]

def main(args: Array[String]): Unit = {
  /// make finishedUserSessionStreams.....
  finishedUserSessionsStream.writeStream
    .option("checkpointLocation", "checkpoint")
    .foreach(new SessionCassandraForeachWriter)
    .start()
    .awaitTermination()
}

This gives me the following error:

org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$.throwError(UnsupportedOperationChecker.scala:431)

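With Spark 3.0 and Spark Cassandra Connector (SCC) 3.0.0 you shouldn't use foreach. That was a workaround for SCC < 2.5.0, which had no native support for writing streaming Datasets. Starting with SCC 2.5.0, you can write data directly to Cassandra as follows (a complete example is also available):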

val query = streamingCountsDF.writeStream
  .outputMode(OutputMode.Update)
  .format("org.apache.spark.sql.cassandra")
  .option("checkpointLocation", "checkpoint")
  .option("keyspace", "ks")
  .option("table", "table")
  .start()
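
Applied to the question's data, the same approach might look like the sketch below. It assumes the `blog.session` table and the `UserSession` case class from the question. The object name `SessionSink`, the `sinkOptions` map and the `writeSessions` helper are hypothetical names introduced only for illustration. The output mode is left at Spark's default because the question does not show how the stream is produced.

```scala
import org.apache.spark.sql.Dataset
import org.apache.spark.sql.streaming.StreamingQuery

// Same shape as the case class in the question; field names match the table columns.
case class UserSession(id: Int, visited: List[String])

object SessionSink {
  // Sink options for the Cassandra data source; keyspace and table come from the question.
  val sinkOptions: Map[String, String] = Map(
    "checkpointLocation" -> "checkpoint",
    "keyspace"           -> "blog",
    "table"              -> "session"
  )

  // Hypothetical helper: writes the streaming Dataset through the connector's
  // data source instead of a hand-written ForeachWriter.
  // Assumes the SparkSession was created with spark.cassandra.connection.host set.
  def writeSessions(sessions: Dataset[UserSession]): StreamingQuery =
    sessions.writeStream
      .format("org.apache.spark.sql.cassandra")
      .options(sinkOptions)
      .start()
}
```

Calling `SessionSink.writeSessions(finishedUserSessionsStream).awaitTermination()` from `main` would then take the place of the `.foreach(new SessionCassandraForeachWriter)` pipeline.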

You also need to switch to SCC 3.0.0-beta, which contains many fixes.
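
For an sbt build, switching connector versions could look roughly like the fragment below. This is only a sketch: the Scala version and the `provided` scope for Spark are assumptions about the project setup, not details given in the question.

```scala
// build.sbt (sketch): versions and scopes are assumptions, adjust them to your build
scalaVersion := "2.12.12"

libraryDependencies ++= Seq(
  // Spark itself is usually supplied by the cluster, hence "provided"
  "org.apache.spark"   %% "spark-sql"                 % "3.0.0" % "provided",
  // Spark Cassandra Connector release mentioned in the answer
  "com.datastax.spark" %% "spark-cassandra-connector" % "3.0.0-beta"
)
```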
