The column in my DataFrame looks like this:
Input --> 20191106 --- Output --> 2019-11-06 00:00:00
Input --> 20180815 --- Output --> 2018-08-15 00:00:00
Code:
from pyspark.sql.functions import from_unixtime, unix_timestamp
df.withColumn("newcol", from_unixtime(unix_timestamp(df("coldt"), "YYYY--MM-DD HH:MM:SS")))
Error:
File "C:/Users/nance.py", line 14, in <module>
df.withColumn("newcol", from_unixtime(unix_timestamp(df("coldt"), "YYYY--MM-DD HH:MM:SS")))
TypeError: 'DataFrame' object is not callable
Please help.
Just use the to_timestamp function, with a format pattern that matches the input strings (yyyyMMdd).
Working example:
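import pyspark.sql.functions as F
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([['20191106'],['20180815']])
# Parse the yyyyMMdd strings in the default column _1 into a timestamp column
df = df.withColumn('dates', F.to_timestamp('_1', 'yyyyMMdd'))
df.show()  # in Databricks notebooks, display(df) also works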