I have loaded a DataFrame. It looks like this:
uber_converted.show()
+--------------------+--------------------+-------------------+----------+---------+--------------------+
|dispatching_base_num| pickup_date|affiliated_base_num|locationID| zone| borough|
+--------------------+--------------------+-------------------+----------+---------+--------------------+
| B02765|2015-05-08 19:05:...| B02764| 262|Manhattan| Yorkville East|
| B02765|2015-05-08 19:06:...| B00013| 234|Manhattan| Union Sq|
| B02765|2015-05-08 19:06:...| B02765| 107|Manhattan| Gramercy|
| B02765|2015-05-08 19:06:...| B02765| 137|Manhattan| Kips Bay|
| B02765|2015-05-08 19:06:...| B02765| 220| Bronx|Spuyten Duyvil/Ki...|
| B02765|2015-05-08 19:06:...| B02765| 138| Queens| LaGuardia Airport|
| B02765|2015-05-08 19:06:...| B02749| 143|Manhattan| Lincoln Square West|
| B02765|2015-05-08 19:06:...| B02765| 244|Manhattan|Washington Height...|
| B02765|2015-05-08 19:06:...| B02617| 262|Manhattan| Yorkville East|
| B02765|2015-05-08 19:06:...| B02765| 144|Manhattan| Little Italy/NoLiTa|
| B02765|2015-05-08 19:06:...| B00381| 209|Manhattan| Seaport|
| B02765|2015-05-08 19:06:...| B02765| 234|Manhattan| Union Sq|
| B02765|2015-05-08 19:06:...| B02765| 163|Manhattan| Midtown North|
| B02765|2015-05-08 19:06:...| B02765| 181| Brooklyn| Park Slope|
| B02765|2015-05-08 19:06:...| B02765| 116|Manhattan| Hamilton Heights|
| B02765|2015-05-08 19:06:...| B02765| 236|Manhattan|Upper East Side N...|
| B02765|2015-05-08 19:06:...| B02765| 140|Manhattan| Lenox Hill East|
| B02765|2015-05-08 19:07:...| B02765| 162|Manhattan| Midtown East|
| B02765|2015-05-08 19:07:...| B02788| 263|Manhattan| Yorkville West|
| B02765|2015-05-08 19:07:...| B02765| 181| Brooklyn| Park Slope|
+--------------------+--------------------+-------------------+----------+---------+--------------------+
I need to group the rows by the day of the week of the pickup_date field and count them. The result should look like this:
dayofweek count
1 -> 234 (Monday)
2 -> 343 (Tuesday)
etc.
Any help is much appreciated!
You can use date_format:
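from pyspark.sql.functions import date_format

# "u" formats the day of the week as a number string: "1" = Monday ... "7" = Sunday
uber_converted.groupBy(date_format(uber_converted["pickup_date"], "u").alias("dayofweek")).count()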
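One caveat: as far as I know, on Spark 3.0 and later the "u" pattern letter can be rejected by the newer datetime formatter unless spark.sql.legacy.timeParserPolicy is set to LEGACY. If that happens, a possible alternative is the built-in dayofweek function (available since Spark 2.3), which numbers the days 1 = Sunday ... 7 = Saturday, so it needs a small shift to get the Monday-first numbering asked for. The sketch below is only an illustration: monday_first and count_by_weekday are hypothetical helper names, not part of PySpark.

```python
def monday_first(spark_dow):
    """Map Spark's dayofweek (1 = Sunday ... 7 = Saturday)
    to 1 = Monday ... 7 = Sunday.

    Uses only arithmetic operators, so it accepts a plain int
    or a Spark Column.
    """
    return (spark_dow + 5) % 7 + 1


def count_by_weekday(df, ts_col="pickup_date"):
    """Group df by the Monday-first day-of-week number of ts_col and count rows.

    Hypothetical helper; ts_col is assumed to be a date or timestamp column.
    """
    # Imported lazily so monday_first can be used without Spark installed.
    from pyspark.sql import functions as F

    dow = monday_first(F.dayofweek(F.col(ts_col))).alias("dayofweek")
    return df.groupBy(dow).count().orderBy("dayofweek")
```

With the DataFrame from the question this would be called as count_by_weekday(uber_converted).show(). Unlike date_format, which returns the day number as a string, this version keeps dayofweek as an integer column.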