I have initialized a Spark session this way:
from pyspark.sql import SparkSession

# Parentheses let the builder chain span several lines.
spark_session = (SparkSession.builder
    .appName('LSC_PROJECT')
    .getOrCreate())
Then I tried to read many files this way:
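from pyspark.sql.functions import col, input_file_name, reverse, split

# Read every tab-separated .txt file, then add the source file name and the duration.
df = (self.spark_session.read
    .csv(path=WAV.PATH_FILES_WAV + '/*.txt', header=False, schema=data_structure, sep='\t')
    .withColumn("Filename", reverse(split(input_file_name(), "/")).getItem(0))
    .withColumn("duration", col("End") - col("Start")))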
The problem is that this works when I run it locally with Spark, but when I run it on the cluster I get the following error:
Traceback (most recent call last):
  File "/home/user24/LSCproject/Main.py", line 42, in <module>
    wav.recording_annotation()
  File "/home/user24/LSCproject/wav_manipulation/wav.py", line 45, in recording_annotation
    csv(path='LSCproject/Database/audio_and_txt_files/*.txt', header=False, schema= data_structure, sep='\t').
  File "/home/hadoop/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 441, in csv
  File "/home/hadoop/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/home/hadoop/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 69, in deco
pyspark.sql.utils.AnalysisException: u'Path does not exist: hdfs://master:9000/user/user24/LSCproject/Database/audio_and_txt_files/*.txt;'
Any guidance or advice is much appreciated!
Update:
Output when using '/user/user24/LSCproject/Database/' instead of WAV.PATH_FILES_WAV+'/*.txt':
Traceback (most recent call last):
  File "/home/user24/LSCproject/Main.py", line 42, in <module>
    wav.recording_annotation()
  File "/home/user24/LSCproject/wav_manipulation/wav.py", line 45, in recording_annotation
    csv(path='/user/user24/LSCproject/Database/', header=False, schema= data_structure, sep='\t').
  File "/home/hadoop/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 441, in csv
  File "/home/hadoop/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/home/hadoop/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 69, in deco
pyspark.sql.utils.AnalysisException: u'Path does not exist: hdfs://master:9000/user/user24/LSCproject/Database;'
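The exception message says the HDFS path does not exist. On the cluster the path is resolved against HDFS (hdfs://master:9000), so the files have to be present there. Supply the correct HDFS path and try again.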
Path does not exist: hdfs://master:9000/user/user24/LSCproject/Database
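One way to make the intended filesystem explicit is to pass Spark a fully qualified URI instead of a bare path. The following is only a minimal sketch: the helper names `local_uri`, `hdfs_uri` and `read_annotations` are made up for illustration, the namenode address `master:9000` is copied from the error message, and the sketch assumes the files really exist at the location you point to.

```python
def local_uri(path):
    """Build a file:// URI so Spark looks on the local filesystem, not HDFS."""
    if not path.startswith("/"):
        raise ValueError("expected an absolute path, got: " + path)
    return "file://" + path


def hdfs_uri(path, namenode="master:9000"):
    """Build a fully qualified hdfs:// URI.

    The default namenode is an assumption taken from the error message.
    """
    if not path.startswith("/"):
        raise ValueError("expected an absolute path, got: " + path)
    return "hdfs://" + namenode + path


def read_annotations(spark, uri, schema):
    """Read tab-separated annotation files from an explicit URI.

    Hypothetical wrapper; it needs a live SparkSession and is not run here.
    """
    return spark.read.csv(path=uri, header=False, schema=schema, sep="\t")
```

For example, `read_annotations(spark_session, hdfs_uri("/user/user24/LSCproject/Database/audio_and_txt_files/*.txt"), data_structure)` would only succeed after the files have been copied into HDFS, for instance with `hdfs dfs -put`. The `file://` variant only works on a cluster if the same files are present at the same path on every worker node, so uploading to HDFS is usually the simpler choice.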