I have a PySpark DataFrame that looks like this:
df
year month day
2017 9 3
2015 5 16
I want to create a datetime column from these parts, like this:
df
year month day date
2017 9 3 2017-09-03 00:00:00
2015 5 16 2015-05-16 00:00:00
You can use concat_ws to join the parts into a string and to_date to convert that string to a date:
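from pyspark.sql import SparkSession
from pyspark.sql.functions import concat_ws, to_date  # explicit imports avoid shadowing built-ins
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([[2017, 9, 3], [2015, 5, 16]], ['year', 'month', 'date'])
# Join the parts into strings such as "2017-9-3", then parse them as a DateType column
df = df.withColumn('timestamp', to_date(concat_ws('-', df.year, df.month, df.date)))
df.show()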
+----+-----+----+----------+
|year|month|date| timestamp|
+----+-----+----+----------+
|2017| 9| 3|2017-09-03|
|2015| 5| 16|2015-05-16|
+----+-----+----+----------+
Schema:
df.printSchema()
root
|-- year: long (nullable = true)
|-- month: long (nullable = true)
|-- date: long (nullable = true)
|-- timestamp: date (nullable = true)