When I try to fetch data from Amazon Keyspaces using PySpark, I get Unsupported partitioner: com.amazonaw

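I have no experience with Java or the Hadoop ecosystem. I configured my Spark cluster to connect to Amazon Keyspaces using DataStax's Spark Cassandra Connector, and I am using PySpark to fetch data from Cassandra. I can connect to the Keyspaces/Cassandra cluster successfully. However, when I try to fetch data from it: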

df = spark.sql("SELECT * FROM cass.tutorialkeyspace.tutorialtable")
print("Table Row Count: ")
print(df.count())

I get this error:

Unsupported partitioner: com.amazonaws.cassandra.DefaultPartitioner

Yes, the keyspace and table exist, and they contain data. How can I fix or work around this? Thanks.

For reference, Keyspaces now supports the RandomPartitioner, which makes it possible to read and write data in Apache Spark using the open-source Spark Cassandra Connector.

Documentation: https://docs.aws.amazon.com/keyspaces/latest/devguide/spark-integrating.html

Launch announcement: https://aws.amazon.com/about-aws/whats-new/2022/04/amazon-keyspaces-read-write-data-apache-spark/
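
As a rough illustration of the setup this implies, here is a minimal sketch of the Spark settings for the open-source connector. The helper names keyspaces_conf and build_session, the LOCAL_QUORUM consistency levels, and the placeholder credentials are assumptions for illustration, not something taken from the linked documentation; the catalog name cass matches the query in the question. TLS trust store settings are not covered here.

```python
def keyspaces_conf(host, username, password, catalog="cass"):
    """Build Spark settings for the Spark Cassandra Connector (hypothetical helper)."""
    return {
        # Register a Cassandra-backed catalog so <catalog>.<keyspace>.<table> resolves in spark.sql.
        f"spark.sql.catalog.{catalog}": "com.datastax.spark.connector.datasource.CassandraCatalog",
        "spark.cassandra.connection.host": host,
        # Amazon Keyspaces accepts TLS connections on port 9142.
        "spark.cassandra.connection.port": "9142",
        "spark.cassandra.connection.ssl.enabled": "true",
        "spark.cassandra.auth.username": username,
        "spark.cassandra.auth.password": password,
        # Assumption: LOCAL_QUORUM for both reads and writes.
        "spark.cassandra.input.consistency.level": "LOCAL_QUORUM",
        "spark.cassandra.output.consistency.level": "LOCAL_QUORUM",
    }


def build_session(conf, app_name="keyspaces-read"):
    """Create a SparkSession from the settings; needs pyspark and the connector JAR."""
    # Imported lazily so keyspaces_conf can be used and inspected without Spark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

With Spark and the connector available, something like build_session(keyspaces_conf("cassandra.us-east-1.amazonaws.com", "<user>", "<password>")) followed by the spark.sql query from the question would be the way to try this out; the endpoint shown is only an example region.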

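The Spark Cassandra Connector relies on specific partitioner implementations to define data splits and so on. There is currently no workaround for this until someone adds a corresponding TokenFactory implementation to this code. It shouldn't be complicated; it just needs to be done by someone who is interested.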
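
To make the failure mode concrete, the lookup described above can be sketched in Python. This is only an illustration of the idea, not the connector's actual Scala code; the names SUPPORTED_PARTITIONERS and check_partitioner are hypothetical. The assumption is that the connector recognizes the two standard Cassandra partitioners listed below and rejects any other name reported by the cluster.

```python
# Partitioner class names assumed to have a matching token factory in the connector.
SUPPORTED_PARTITIONERS = {
    "org.apache.cassandra.dht.Murmur3Partitioner",
    "org.apache.cassandra.dht.RandomPartitioner",
}


def check_partitioner(name):
    """Return the name if it is recognized, otherwise fail like the error in the question."""
    if name not in SUPPORTED_PARTITIONERS:
        raise ValueError(f"Unsupported partitioner: {name}")
    return name
```

Under this sketch, check_partitioner("com.amazonaws.cassandra.DefaultPartitioner") raises, which mirrors why the read in the question fails while the connection itself succeeds.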

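Thank you for the feedback. At this time, you can write to Keyspaces using the Spark Cassandra Connector. Reading requires support for token ranges. See the following documentation page for the list of supported APIs: https://docs.aws.amazon.com/keyspaces/latest/devguide/cassandra-apis.html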

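Although we don't have a timeline to share at this time, we prioritize our roadmap based on customer feedback, and we release new features all the time. To learn more about our roadmap and upcoming features, please contact your AWS account manager.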
