How to specify multiple timestamp and date formats when reading a CSV into a DataFrame in Spark

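The CSV file I am reading contains 3 columns. The column formats are listed below.

DateTime1 has the format "mm/dd/yyyy hh:mm:ss"
DateTime2 has the format "dd/mm/yy hh:mm:ss"
Date is in month/day/year format

The code below applies a single timestamp format to all columns.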

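from pyspark.sql.types import StructType, StructField, TimestampType, DateType

schema_datatype = StructType([
    StructField('DateTime1', TimestampType(), True),
    StructField('DateTime2', TimestampType(), True),
    StructField('Date', DateType(), True),  # DateType, not the abstract DataType
])

# A single timestampFormat applies to every TimestampType column.
# 'MM' is month; lowercase 'mm' means minutes.
df = spark.read.csv(header=True,
                    path="sample.csv",
                    schema=schema_datatype,
                    timestampFormat="MM/dd/yyyy hh:mm:ss")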
But how can I specify a different date format for each column when using read.csv? PS: I am using Spark 2.1.0.

Thanks

I had a similar requirement. I used the following code to read the CSV with the inferSchema option.

Dataset<Row> data = sparkSession.read().format(fileType)
        .option("header", header).option("inferSchema", "true")
        .option("delimiter", delimeter).option("mode", "DROPMALFORMED")
        .load(filePath);

Then I formatted the date using the statement below.

// "MM" is month; lowercase "mm" would be parsed as minutes.
data = data.withColumn("the_date", to_date(unix_timestamp(col("the_date"), "MM/dd/yyyy").cast("timestamp")));
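
For the three columns in the question, the same idea could look roughly like this in PySpark: read every column as a string, then parse each one with its own pattern. This is only a sketch. The helper names `build_parse_exprs` and `read_with_formats` are made up for illustration, and the patterns are my guess at what the column descriptions mean (in particular, I assume a 24-hour clock, hence `HH`). Spark 2.1 interprets these patterns with Java's SimpleDateFormat, where `MM` is month and `mm` is minutes.

```python
# Per-column patterns in SimpleDateFormat syntax.
# Assumption: these match the column descriptions in the question.
TIMESTAMP_FORMATS = {
    "DateTime1": "MM/dd/yyyy HH:mm:ss",
    "DateTime2": "dd/MM/yy HH:mm:ss",
}
DATE_FORMATS = {"Date": "MM/dd/yyyy"}


def build_parse_exprs(timestamp_formats, date_formats):
    """Return Spark SQL expressions that parse each string column with its own pattern."""
    exprs = []
    for name, fmt in timestamp_formats.items():
        # unix_timestamp parses the string; the cast turns seconds into a timestamp.
        exprs.append(
            "CAST(unix_timestamp(`{0}`, '{1}') AS timestamp) AS `{0}`".format(name, fmt)
        )
    for name, fmt in date_formats.items():
        # Same parse, then to_date drops the time part.
        exprs.append(
            "to_date(CAST(unix_timestamp(`{0}`, '{1}') AS timestamp)) AS `{0}`".format(name, fmt)
        )
    return exprs


def read_with_formats(spark, path):
    """Hypothetical helper: read all columns as strings, then convert each one."""
    raw = spark.read.csv(path, header=True)  # no schema given, so every column is a string
    return raw.selectExpr(*build_parse_exprs(TIMESTAMP_FORMATS, DATE_FORMATS))
```

With an existing SparkSession named `spark`, calling `read_with_formats(spark, "sample.csv")` should give a DataFrame with two timestamp columns and one date column. Values that do not match their pattern come back as null rather than raising an error, so it is worth checking for unexpected nulls afterwards.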
