I have a PySpark DataFrame and I want to convert one of its columns (a timestamp) to a Jalali date.
My DataFrame:
Name | CreationDate |
---|---|
Sara | 2022-01-02 10:49:43 |
Mina | 2021-01-02 12:30:21 |
You need to define a UDF like this:
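import jdatetime
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

@F.udf(StringType())
def to_jalali(ts):
    # ts arrives as a Python datetime.datetime when the column is TimestampType
    jts = jdatetime.datetime.fromgregorian(datetime=ts)
    return jts.strftime("%a, %d %b %Y %H:%M:%S")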
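One case the UDF above does not handle is a null timestamp: Spark hands a SQL NULL to a Python UDF as `None`, and the conversion call is not expected to accept that. Below is a minimal sketch of a guard, written in plain Python so it can be tried without Spark. The names `null_safe` and `format_ts` are hypothetical, and `format_ts` is only a stand-in for the jdatetime conversion.

```python
import functools
from datetime import datetime


def null_safe(func):
    """Wrap a one-argument function so that a None input yields None."""
    @functools.wraps(func)
    def wrapper(value):
        if value is None:
            return None
        return func(value)
    return wrapper


@null_safe
def format_ts(ts):
    # Stand-in for the Jalali conversion: it only formats the Gregorian value.
    return ts.strftime("%Y-%m-%d %H:%M:%S")


print(format_ts(datetime(2022, 1, 2, 10, 49, 43)))  # 2022-01-02 10:49:43
print(format_ts(None))  # None
```

Inside the UDF itself, the same effect can be had with an `if ts is None: return None` check at the top of `to_jalali`.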
Then apply it to your example:
df = spark.createDataFrame([("Sara", "2022-01-02 10:49:43"), ("Mina", "2021-01-02 12:30:21")], ["Name", "CreationDate"])
# cast column CreationDate to timestamp type if not already done
# (needed here, because the sample values above are created as strings)
df = df.withColumn("CreationDate", F.to_timestamp("CreationDate"))
df = df.withColumn("CreationDate", to_jalali("CreationDate"))
df.show(truncate=False)
#+----+-------------------------+
#|Name|CreationDate             |
#+----+-------------------------+
#|Sara|Sun, 12 Dey 1400 10:49:43|
#|Mina|Sat, 13 Dey 1399 12:30:21|
#+----+-------------------------+
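
To sanity-check the dates in the output above without a Spark session or `jdatetime`, here is a minimal sketch of a widely circulated arithmetic Gregorian-to-Jalali conversion. It is an independent approximation based on a 33-year leap cycle, not code taken from `jdatetime`, and `gregorian_to_jalali` is a hypothetical helper name. Treat it as a cross-check for contemporary dates only.

```python
from datetime import datetime

# Cumulative days before each Gregorian month in a non-leap year.
_DAYS_BEFORE_MONTH = [0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334]


def gregorian_to_jalali(gy, gm, gd):
    """Approximate arithmetic conversion; returns (jalali_year, month, day)."""
    # The current year's leap day only counts for dates after February.
    leap_ref = gy + 1 if gm > 2 else gy
    days = (355666 + 365 * gy
            + (leap_ref + 3) // 4
            - (leap_ref + 99) // 100
            + (leap_ref + 399) // 400
            + gd + _DAYS_BEFORE_MONTH[gm - 1])

    # 12053 days = one 33-year cycle; 1461 days = one 4-year block.
    jy = -1595 + 33 * (days // 12053)
    days %= 12053
    jy += 4 * (days // 1461)
    days %= 1461
    if days > 365:
        jy += (days - 1) // 365
        days = (days - 1) % 365

    # The first six Jalali months have 31 days, the following ones 30.
    if days < 186:
        jm, jd = 1 + days // 31, 1 + days % 31
    else:
        jm, jd = 7 + (days - 186) // 30, 1 + (days - 186) % 30
    return jy, jm, jd


for text in ("2022-01-02 10:49:43", "2021-01-02 12:30:21"):
    ts = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    print(text, "->", gregorian_to_jalali(ts.year, ts.month, ts.day))
# 2022-01-02 10:49:43 -> (1400, 10, 12)
# 2021-01-02 12:30:21 -> (1399, 10, 13)
```

Month 10 is Dey, so both results agree with the `12 Dey 1400` and `13 Dey 1399` values shown by `df.show`.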