Converting a string to a date in a PySpark DataFrame



I am trying to convert a string column in my DataFrame to a date type. The strings look like this:

Fri Oct 12 18:14:29 +0000 2018

And I have tried this code:

df_en.withColumn('date_timestamp',unix_timestamp('created_at','ddd MMM dd HH:mm:ss K yyyy')).show()

But the result I get is:

+--------------------+--------------------+--------------------+--------------+
|          created_at|                text|           sentiment|date_timestamp|
+--------------------+--------------------+--------------------+--------------+
|Mon Oct 15 20:53:...|What a shock hey,...|-0.07755102040816327|          null|
|Fri Oct 12 18:14:...|No Bucky, people ...|                 0.0|          null|
|Wed Oct 10 07:51:...|If Sarah Hanson Y...|                0.05|          null|
|Mon Oct 15 02:30:...|            365 days|                 0.0|          null|
|Sun Oct 14 06:17:...|#HimToo: how an a...|                -0.5|          null|
|Tue Oct 09 07:30:...|hopefully the #Hi...|                 0.0|          null|
|Tue Oct 09 23:30:...|If Labor win Gove...|                 0.8|          null|
|Thu Oct 11 01:09:...|Hello #Perth - th...|                0.75|          null|
|Sat Oct 13 21:47:...|#MeToo changed th...|                 0.0|          null|
|Tue Oct 09 00:41:...|Rich for Queensla...|               0.375|          null|
|Mon Oct 15 12:59:...|Wonder what else ...|                 0.0|          null|
|Mon Oct 15 05:12:...|@dani_ries #metoo...|                 0.0|          null|
|Wed Oct 10 00:30:...|Hey @JackieTrad a...|                0.25|          null|
|Tue Oct 16 04:00:...|“There's this ide...| 0.03611111111111113|          null|
|Sun Oct 14 08:14:...|Is this the attit...|-0.01499999999999999|          null|
|Sat Oct 13 11:26:...|#metoo official s...|                 0.1|          null|
|Tue Oct 09 00:23:...|On the limited an...|-0.01904761904761...|          null|
|Tue Oct 16 14:41:...|Domestic Violence...|                 0.0|          null|
|Wed Oct 10 23:34:...|@australian Note ...|                 0.0|          null|
|Sat Oct 06 20:07:...|Wtaf, America. I ...|                 0.0|          null|
+--------------------+--------------------+--------------------+--------------+

I have also tried

df_en.select(col("created_at"),to_date(col("created_at")).alias("to_date") ).show()

The result is exactly the same. I don't know why; can someone help me?

Try the pattern EEE MMM dd HH:mm:ss Z yyyy with the Spark config .config('spark.sql.legacy.timeParserPolicy', 'LEGACY'). Also check this
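As a quick sanity check of that pattern (no Spark session needed), the Java/Spark format EEE MMM dd HH:mm:ss Z yyyy corresponds to %a %b %d %H:%M:%S %z %Y in plain Python's strptime, and the example string parses cleanly with it:

```python
from datetime import datetime

# The Spark pattern "EEE MMM dd HH:mm:ss Z yyyy" maps directive-by-directive
# to strptime's "%a %b %d %H:%M:%S %z %Y":
#   EEE -> %a (abbreviated weekday), MMM -> %b (abbreviated month),
#   Z   -> %z (UTC offset like +0000)
s = "Fri Oct 12 18:14:29 +0000 2018"
dt = datetime.strptime(s, "%a %b %d %H:%M:%S %z %Y")
print(dt.isoformat())  # 2018-10-12T18:14:29+00:00
```

In Spark itself the call would then be unix_timestamp('created_at', 'EEE MMM dd HH:mm:ss Z yyyy') (or to_timestamp with the same pattern); the original attempt failed because ddd and K are not valid tokens for this field in Spark's datetime patterns.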
