Wrong FS s3://ss-pprd-v2-dart//tempdir/962c6007-77c0-4294-b021-



I am using Spark 3.2.1, Java 8 (1.8.0_292, AdoptOpenJDK) and Scala 2.12.10, and I am trying to read data from and write data to Redshift using the jars and packages mentioned below. I can read the data, but I am not able to write it back. When writing data back to Redshift, the connector is supposed to create Avro files in the temp directory together with a manifest.json file; in my current setup it creates all the Avro files but fails to create the manifest.json file.

Jars and packages:

RedshiftJDBC42-no-awssdk-1.2.54.1082.jar,
hadoop-aws-3.3.1.jar,aws-java-sdk-1.12.173.jar ,
org.apache.spark:spark-avro_2.12:3.2.1,
io.github.spark-redshift-community:spark-redshift_2.12:5.0.3,
com.eclipsesource.minimal-json:minimal-json:0.9.5
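
For reference, here is a rough sketch of how these jars and packages could be wired into the session from PySpark; the local jar paths are placeholders, not the actual paths from the setup above:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("Testing")
    # placeholder local paths for the plain jar files listed above
    .config("spark.jars", "/path/to/RedshiftJDBC42-no-awssdk-1.2.54.1082.jar,"
                          "/path/to/hadoop-aws-3.3.1.jar,"
                          "/path/to/aws-java-sdk-1.12.173.jar")
    # Maven coordinates, resolved when the session starts
    .config("spark.jars.packages", "org.apache.spark:spark-avro_2.12:3.2.1,"
                                   "io.github.spark-redshift-community:spark-redshift_2.12:5.0.3,"
                                   "com.eclipsesource.minimal-json:minimal-json:0.9.5")
    .getOrCreate()
)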

Code I am trying to run:

from pyspark import SparkContext, SparkConf
from pyspark.sql import SparkSession

conf = SparkConf().setAppName("Testing")
sc = SparkContext.getOrCreate(conf)
sc._jsc.hadoopConfiguration().set("fs.s3a.access.key", AWS_ACCESS_KEY)
sc._jsc.hadoopConfiguration().set("fs.s3a.secret.key", AWS_SECRET_KEY)

# df is the DataFrame prepared earlier (the read path works fine)
df.write \
    .format("io.github.spark_redshift_community.spark.redshift") \
    .option("url", REDSHIFT_JDBC_URL) \
    .option("dbtable", MASTER_TABLE) \
    .option("forward_spark_s3_credentials", "true") \
    .option("extracopyoptions", EXTRACOPYOPTIONS) \
    .option("tempdir", "s3a://" + str(S3_BUCKET) + "/tempdir") \
    .mode("append") \
    .save()
print("Success")

Stack trace:

Traceback (most recent call last):
File "/Users/brajeshmishra/Documents/TEMP/Temp_Py.py", line 65, in <module>
.mode("append") 
File "/opt/homebrew/Cellar/apache-spark/3.2.1/libexec/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 738, in save
File "/opt/homebrew/Cellar/apache-spark/3.2.1/libexec/python/lib/py4j-0.10.9.3-src.zip/py4j/java_gateway.py", line 1322, in __call__
File "/opt/homebrew/Cellar/apache-
List item
spark/3.2.1/libexec/python/lib/pyspark.zip/pyspark/sql/utils.py", line 117, in deco
pyspark.sql.utils.IllegalArgumentException: Wrong FS s3://ss-pprd-v2-dart//tempdir/962c6007-77c0-4294-b021-b9498e3d66ab/manifest.json -expected s3a://ss-pprd-v2-dart
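The message shows the connector resolving the manifest path with the plain s3:// scheme while the tempdir and the configured filesystem use s3a://. One workaround sometimes suggested for this kind of scheme mismatch is to map the s3:// scheme onto the S3A implementation as well, so both URIs resolve through the same filesystem class; a minimal sketch, not verified against this exact setup:

# Make plain s3:// URIs resolve through the same S3A filesystem as s3a:// paths
sc._jsc.hadoopConfiguration().set("fs.s3.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")

The double slash in s3://ss-pprd-v2-dart//tempdir/... may also be worth checking: the code builds the tempdir as "s3a://" + str(S3_BUCKET) + "/tempdir", so a trailing slash in S3_BUCKET would produce it.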

I also tried the versions below, which should work, but I am still getting the same error.

val hadoopVersion = "3.2.1"
val sparkVersion = "3.1.2"

libraryDependencies ++= Seq(
  "org.apache.hadoop" % "hadoop-common" % hadoopVersion,
  "org.apache.hadoop" % "hadoop-aws" % hadoopVersion,
  "org.apache.hadoop" % "hadoop-hdfs" % hadoopVersion,
  "org.apache.spark" %% "spark-core" % sparkVersion,
  "org.apache.spark" %% "spark-sql" % sparkVersion,
  "org.apache.spark" %% "spark-avro" % sparkVersion,
  "com.fasterxml.jackson.core" % "jackson-databind" % "2.12.2",
  "com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.12.2",
  "io.github.spark-redshift-community" %% "spark-redshift" % "5.0.3",
  "com.amazonaws" % "aws-java-sdk-core" % "1.12.286",
  "com.amazonaws" % "aws-java-sdk-s3" % "1.12.286",
  "com.eclipsesource.minimal-json" % "minimal-json" % "0.9.4"
)
