How to read a timestamped CSV file in PySpark

I have a CSV file whose name contains a timestamp, and I have to read it with PySpark. The timestamp is not known ahead of time. How can I read such a file?

Example:

filename - projectno_without_data_20211030.csv

I have to read it without knowing the timestamp, using a pattern like this - projectno_without_data_*.csv

I am using the following code -

df_read_file = sqlContext.read.format('com.databricks.spark.csv').option("delimiter", '|').options(header='true',quote='', escape='"', inferSchema='false').load('/app/HTA/SrcFiles/inbound/metadata/projectno_without_data_*.csv')

Error -

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/spark/python/pyspark/sql/readwriter.py", line 178, in load
return self._df(self._jreader.load(path))
File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
File "/opt/spark/python/pyspark/sql/utils.py", line 134, in deco
raise_from(converted)
File "<string>", line 3, in raise_from
pyspark.sql.utils.AnalysisException: Path does not exist: file:/app/HTA/SrcFiles/inbound/metadata/projectno_without_data_*.csv;
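
The "Path does not exist: file:/..." line in the traceback means Spark resolved the pattern against the local filesystem and found nothing matching the glob. Before handing the pattern to Spark, a quick sanity check helps; this is only a sketch and assumes the files really sit on the driver's local disk (for HDFS you would check with hdfs dfs -ls instead):

import glob

pattern = "/app/HTA/SrcFiles/inbound/metadata/projectno_without_data_*.csv"
matches = glob.glob(pattern)
# An empty list means the AnalysisException above is expected:
# either the directory is wrong or no file matches the pattern.
print(matches)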

df_read_file = (
    spark.read.format("com.databricks.spark.csv")
         .option("delimiter", '|')
         .options(header="true")
         .load("/app/HTA/SrcFiles/inbound/metadata/projectno_without_data_*")
)

Can you try this?
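
For completeness, here is the same read written against the plain DataFrame reader, as a minimal sketch: the path, delimiter, header and inferSchema settings are copied from the question, while the application name and the show() call are only illustrative:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("read_projectno_csv").getOrCreate()

# Spark expands the * glob on its own, so the unknown timestamp can stay a wildcard.
# Keeping the .csv suffix in the pattern avoids picking up unrelated files.
df_read_file = (
    spark.read
         .option("delimiter", "|")
         .option("header", "true")
         .option("inferSchema", "false")
         .csv("/app/HTA/SrcFiles/inbound/metadata/projectno_without_data_*.csv")
)

df_read_file.show(5)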
