I am trying to convert a string column in a DataFrame to a date type. The strings look like this:
Fri Oct 12 18:14:29 +0000 2018
I have tried this code:
df_en.withColumn('date_timestamp',unix_timestamp('created_at','ddd MMM dd HH:mm:ss K yyyy')).show()
But the result I get is:
+--------------------+--------------------+--------------------+--------------+
| created_at| text| sentiment|date_timestamp|
+--------------------+--------------------+--------------------+--------------+
|Mon Oct 15 20:53:...|What a shock hey,...|-0.07755102040816327| null|
|Fri Oct 12 18:14:...|No Bucky, people ...| 0.0| null|
|Wed Oct 10 07:51:...|If Sarah Hanson Y...| 0.05| null|
|Mon Oct 15 02:30:...| 365 days| 0.0| null|
|Sun Oct 14 06:17:...|#HimToo: how an a...| -0.5| null|
|Tue Oct 09 07:30:...|hopefully the #Hi...| 0.0| null|
|Tue Oct 09 23:30:...|If Labor win Gove...| 0.8| null|
|Thu Oct 11 01:09:...|Hello #Perth - th...| 0.75| null|
|Sat Oct 13 21:47:...|#MeToo changed th...| 0.0| null|
|Tue Oct 09 00:41:...|Rich for Queensla...| 0.375| null|
|Mon Oct 15 12:59:...|Wonder what else ...| 0.0| null|
|Mon Oct 15 05:12:...|@dani_ries #metoo...| 0.0| null|
|Wed Oct 10 00:30:...|Hey @JackieTrad a...| 0.25| null|
|Tue Oct 16 04:00:...|“There's this ide...| 0.03611111111111113| null|
|Sun Oct 14 08:14:...|Is this the attit...|-0.01499999999999999| null|
|Sat Oct 13 11:26:...|#metoo official s...| 0.1| null|
|Tue Oct 09 00:23:...|On the limited an...|-0.01904761904761...| null|
|Tue Oct 16 14:41:...|Domestic Violence...| 0.0| null|
|Wed Oct 10 23:34:...|@australian Note ...| 0.0| null|
|Sat Oct 06 20:07:...|Wtaf, America. I ...| 0.0| null|
+--------------------+--------------------+--------------------+--------------+
I also tried:
df_en.select(col("created_at"),to_date(col("created_at")).alias("to_date") ).show()
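The result is exactly the same: the new column is all null. I don't know why. Can anyone help me?
Configure Spark with .config('spark.sql.legacy.timeParserPolicy', 'LEGACY')
and try this pattern: EEE MMM dd HH:mm:ss Z yyyy
Also check this.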