PySpark: problem reading tables on the cluster with hdfs://master:



I initialized a Spark session like this:

from pyspark.sql import SparkSession

spark_session = SparkSession.builder \
    .appName('LSC_PROJECT') \
    .getOrCreate()

Then I tried to read many tables like this:

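from pyspark.sql.functions import col, input_file_name, reverse, split

# Filename = last component of each row's source path; duration = End - Start
df = self.spark_session.read \
    .csv(path=WAV.PATH_FILES_WAV + '/*.txt', header=False, schema=data_structure, sep='\t') \
    .withColumn("Filename", reverse(split(input_file_name(), "/")).getItem(0)) \
    .withColumn("duration", col("End") - col("Start"))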

The problem is that this works when I run it with Spark locally, but when I run it on the cluster I get the following error:

Traceback (most recent call last):
File "/home/user24/LSCproject/Main.py", line 42, in <module>
wav.recording_annotation()
File "/home/user24/LSCproject/wav_manipulation/wav.py", line 45, in recording_annotation
csv(path='LSCproject/Database/audio_and_txt_files/*.txt', header=False, schema= data_structure, sep='t').
File "/home/hadoop/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 441, in csv
File "/home/hadoop/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/home/hadoop/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 69, in deco
pyspark.sql.utils.AnalysisException: u'Path does not exist: hdfs://master:9000/user/user24/LSCproject/Database/audio_and_txt_files/*.txt;'
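The path in the error is not the one that was passed in. A rough sketch of why, assuming Hadoop's usual rules: a path without a scheme is qualified against `fs.defaultFS`, and a relative path is also resolved against the file system's working directory. On HDFS that working directory defaults to `/user/<username>`. In local mode the default file system is the local one (`file:///`) and the working directory is the driver's current directory. The helper `qualify` is hypothetical and only models these rules with string operations; it does not call Hadoop.

```python
import posixpath
from urllib.parse import urlparse


def qualify(path: str, default_fs: str, working_dir: str) -> str:
    """Roughly model how Hadoop turns a user-supplied path into a full URI.

    Hypothetical helper for illustration only; it never touches HDFS.
    """
    # A path that already carries a scheme (hdfs://, file://, ...) is used as-is.
    if urlparse(path).scheme:
        return path
    # Relative paths are resolved against the working directory; for absolute
    # paths posixpath.join discards working_dir. normpath drops trailing "/".
    absolute = posixpath.normpath(posixpath.join(working_dir, path))
    # Prefix the default file system, e.g. "hdfs://master:9000".
    return default_fs.rstrip("/") + absolute


relative = "LSCproject/Database/audio_and_txt_files/*.txt"

# On the cluster: default FS is HDFS, working dir is the HDFS home directory.
print(qualify(relative, "hdfs://master:9000", "/user/user24"))
# -> hdfs://master:9000/user/user24/LSCproject/Database/audio_and_txt_files/*.txt

# Locally: default FS is file:///, working dir is the driver's cwd
# (assumed here to be /home/user24).
print(qualify(relative, "file:///", "/home/user24"))
# -> file:/home/user24/LSCproject/Database/audio_and_txt_files/*.txt
```

The first result matches the error message exactly. The relative path that works locally, presumably because the files sit under `/home/user24` on the driver's disk, is looked up in HDFS on the cluster, where nothing has been uploaded.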

Any guidance or suggestions would be greatly appreciated!

Update:

Output using /user/user24/LSCproject/Database/ instead of WAV.PATH_FILES_WAV+'/*.txt':

Traceback (most recent call last):
File "/home/user24/LSCproject/Main.py", line 42, in <module>
wav.recording_annotation()
File "/home/user24/LSCproject/wav_manipulation/wav.py", line 45, in recording_annotation
csv(path='/user/user24/LSCproject/Database/', header=False, schema= data_structure, sep='t').
File "/home/hadoop/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 441, in csv
File "/home/hadoop/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/home/hadoop/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 69, in deco
pyspark.sql.utils.AnalysisException: u'Path does not exist: hdfs://master:9000/user/user24/LSCproject/Database;'

The exception message says the HDFS path does not exist. Add the correct HDFS path and try again.
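Two ways to hand Spark a path that actually exists, sketched under assumptions: the local directory and HDFS target below are hypothetical, inferred from the tracebacks. Option 1 copies the files into HDFS with the standard `hdfs dfs -mkdir -p` and `hdfs dfs -put` commands and then reads them through a fully qualified `hdfs://` URI. Option 2 keeps the files on local disk and uses an explicit `file://` URI, which only works if the same files exist at the same path on every worker node. The helpers only build strings and argument lists; nothing is executed.

```python
import posixpath
import shlex
from typing import List

# Hypothetical locations, inferred from the tracebacks; adjust as needed.
LOCAL_DIR = "/home/user24/LSCproject/Database/audio_and_txt_files"
HDFS_PARENT = "/user/user24/LSCproject/Database"
NAMENODE = "hdfs://master:9000"


def hdfs_upload_commands(local_dir: str, hdfs_parent: str) -> List[List[str]]:
    """Option 1: commands that copy local_dir into hdfs_parent."""
    return [
        # Create the parent directory (and any missing ancestors) in HDFS.
        ["hdfs", "dfs", "-mkdir", "-p", hdfs_parent],
        # Copy the whole local directory; -f overwrites existing files.
        ["hdfs", "dfs", "-put", "-f", local_dir, hdfs_parent],
    ]


def hdfs_glob(namenode: str, hdfs_parent: str, local_dir: str,
              pattern: str = "*.txt") -> str:
    """Fully qualified HDFS glob for the directory uploaded by option 1."""
    subdir = posixpath.basename(local_dir.rstrip("/"))
    return namenode.rstrip("/") + posixpath.join(hdfs_parent, subdir, pattern)


def local_glob(local_dir: str, pattern: str = "*.txt") -> str:
    """Option 2: explicit local-filesystem glob (files needed on every node)."""
    return "file://" + posixpath.join(posixpath.abspath(local_dir), pattern)


for cmd in hdfs_upload_commands(LOCAL_DIR, HDFS_PARENT):
    print(shlex.join(cmd))  # run these in a shell on a cluster node
print(hdfs_glob(NAMENODE, HDFS_PARENT, LOCAL_DIR))
print(local_glob(LOCAL_DIR))
```

Either printed URI can then be passed as `path=` to the `read.csv(...)` call in the question. Uploading to HDFS is usually the safer choice on a cluster, because the `file://` variant fails on any executor whose local disk lacks the files.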

Path does not exist: hdfs://master:9000/user/user24/LSCproject/Database

Latest update