+----------+
|      date|
+----------+
|  2/3/1994|
|  3/4/1994|
|  4/5/1994|
|  5/3/1994|
|  6/9/1994|
|  7/8/1994|
|  8/9/1994|
| 9/10/1994|
|10/10/1994|
| 11/4/1994|
| 12/3/1994|
|  2/4/1996|
|  4/9/1996|
|    5/7/96|
|  6/8/1996|
| 7/10/1996|
| 9/11/1996|
| 10/3/1996|
|  6/2/2000|
|  7/2/2000|
+----------+
from pyspark.sql.functions import to_date
newdate=df6.withColumn(to_date(df6.date, 'yyyy-MM-dd').alias('dt')).show()
TypeError: to_date() takes 1 positional argument but 2 were given
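The withColumn syntax appears to be wrong: it takes the new column name as its first argument and the column expression as its second. Can you try this instead: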
newdate=df6.withColumn("new_date", to_date("date", 'dd/MM/yyyy')).show()
>>> from pyspark.sql.functions import *
>>> df.show()
+----------+
|      date|
+----------+
|  2/3/1994|
|  3/4/1994|
|  4/5/1994|
|  5/3/1994|
|  6/9/1994|
|  7/8/1994|
|  8/9/1994|
| 9/10/1994|
|10/10/1994|
| 11/4/1994|
| 12/3/1994|
|  2/4/1996|
|  4/9/1996|
|    5/7/96|
|  6/8/1996|
| 7/10/1996|
| 9/11/1996|
| 10/3/1996|
|  6/2/2000|
|  7/2/2000|
+----------+
>>> df.withColumn("dt", to_date(col("date"), "MM/dd/yyyy")).show()
+----------+----------+
|      date|        dt|
+----------+----------+
|  2/3/1994|1994-02-03|
|  3/4/1994|1994-03-04|
|  4/5/1994|1994-04-05|
|  5/3/1994|1994-05-03|
|  6/9/1994|1994-06-09|
|  7/8/1994|1994-07-08|
|  8/9/1994|1994-08-09|
| 9/10/1994|1994-09-10|
|10/10/1994|1994-10-10|
| 11/4/1994|1994-11-04|
| 12/3/1994|1994-12-03|
|  2/4/1996|1996-02-04|
|  4/9/1996|1996-04-09|
|    5/7/96|0096-05-07|
|  6/8/1996|1996-06-08|
| 7/10/1996|1996-07-10|
| 9/11/1996|1996-09-11|
| 10/3/1996|1996-10-03|
|  6/2/2000|2000-06-02|
|  7/2/2000|2000-07-02|
+----------+----------+
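Note that the row 5/7/96 comes out as 0096-05-07 in the output above, because the two-digit year is read literally under the yyyy pattern. One possible fix is to expand such years before parsing. The following is only a minimal sketch in plain Python: the helper name expand_two_digit_year and the choice of century are assumptions of mine, not something from the original data, so check them against your own dates.

```python
import re

# Assumption (not stated in the thread): two-digit years belong to the 1900s.
ASSUMED_CENTURY = 1900


def expand_two_digit_year(date_str, century=ASSUMED_CENTURY):
    """Expand a trailing two-digit year in an M/d/yy string to four digits."""
    match = re.fullmatch(r"\s*(\d{1,2})/(\d{1,2})/(\d{2})\s*", date_str)
    if match is None:
        # Not in M/d/yy form (for example, already a four-digit year):
        # return the value unchanged apart from surrounding whitespace.
        return date_str.strip()
    month, day, year = match.groups()
    return f"{month}/{day}/{century + int(year)}"


print(expand_two_digit_year("5/7/96"))      # 5/7/1996
print(expand_two_digit_year("10/10/1994"))  # 10/10/1994
```

To use it in Spark, the function could be wrapped with pyspark.sql.functions.udf and applied to the date column before calling to_date.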
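to_date was changed in Spark 2.2.0 to accept an optional format argument. If you are using Spark < 2.2.0, it takes only one argument, which explains the TypeError.
See the documentation for pyspark.sql.functions.to_date in Spark 2.2.0 and in Spark 2.1.0.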
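If upgrading is not an option, a commonly used workaround on Spark < 2.2.0 is to parse the string with unix_timestamp, which does accept a format, and then cast the result down to a date. The sketch below assumes the MM/dd/yyyy pattern from the answer above; to_date_compat is a hypothetical helper name, not part of PySpark.

```python
def to_date_compat(column_name, fmt="MM/dd/yyyy"):
    """Build a date Column from a string column (hypothetical helper for Spark < 2.2.0)."""
    # Imported inside the function so the helper can be defined
    # even on a machine where pyspark is not installed.
    from pyspark.sql.functions import col, unix_timestamp

    # unix_timestamp parses the string with the given pattern into epoch seconds;
    # casting to timestamp and then to date drops the time-of-day part.
    return unix_timestamp(col(column_name), fmt).cast("timestamp").cast("date")


# Usage, assuming df is the DataFrame shown above and a Spark session is active:
# df.withColumn("dt", to_date_compat("date")).show()
```

Rows that do not match the pattern should come back as null rather than raising an error, so it is worth checking the result for nulls.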