Loading data from Redshift using Spark and Scala in EMR



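I am trying to connect to Redshift using Spark with Scala in Zeppelin on an EMR cluster. I used the spark-redshift library, but it does not work. I have tried many solutions, and I don't know why it gives this error: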


```
val df = spark.read
  .format("com.databricks.spark.redshift")
  .option("url", "jdbc:redshift://xx:xx/xxxx?user=xxx&password=xxx")
  .option("tempdir", path)
  .option("query", sql_query)
  .load()
```

```
java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.redshift. Please find packages at http://spark.apache.org/third-party-projects.html
  at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:657)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:194)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
  ... 51 elided
Caused by: java.lang.ClassNotFoundException: com.databricks.spark.redshift.DefaultSource
  at scala.reflect.internal.util.AbstractFileClassLoader.findClass(AbstractFileClassLoader.scala:62)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$20$$anonfun$apply$12.apply(DataSource.scala:634)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$20$$anonfun$apply$12.apply(DataSource.scala:634)
  at scala.util.Try$.apply(Try.scala:192)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$20.apply(DataSource.scala:634)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$20.apply(DataSource.scala:634)
  at scala.util.Try.orElse(Try.scala:84)
  at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:634)
  ... 53 more
```
Should I import something first, or is there some configuration I need to do?

To use a particular module in EMR, you have to add that module to the cluster. (Modules are not available automatically.)

Your error says that the module cannot be found. Take a look at https://aws.amazon.com/blogs/big-data/powering-amazon-redshift-analytics-with-apache-spark-and-amazon-machine-learning/
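To confirm that diagnosis before changing any cluster configuration, a small check like the following can be run in the same Zeppelin paragraph. It is only a sketch: the helper name `isOnClasspath` is made up for illustration, and the class name is copied from the `Caused by` line of the stack trace. If the check reports the class as missing, the spark-redshift package (and whatever it depends on) still has to be supplied to Spark, for example through the `spark.jars.packages` setting or the `--packages` option of `spark-shell`/`spark-submit`. The right coordinates depend on your Spark and Scala versions, so none are assumed here.

```scala
import scala.util.Try

// Hypothetical helper: returns true if the named class can be loaded
// by the current thread's context class loader (falling back to the
// loader of this code if no context loader is set).
def isOnClasspath(className: String): Boolean = {
  val loader = Option(Thread.currentThread().getContextClassLoader)
    .getOrElse(getClass.getClassLoader)
  // initialize = false: only locate the class, do not run its static initializers
  Try(Class.forName(className, false, loader)).isSuccess
}

// The class Spark failed to load, taken from the stack trace in the question.
val redshiftSource = "com.databricks.spark.redshift.DefaultSource"

if (isOnClasspath(redshiftSource))
  println(s"$redshiftSource is available")
else
  println(s"$redshiftSource is missing: add the spark-redshift package to the cluster or interpreter")
```

If the class is reported as available but `load()` still fails, the problem is likely elsewhere (for instance, a missing JDBC driver), and the new error message should say which class could not be found.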
