Group by day of week and count values in a Spark SQL DataFrame

I have loaded a DataFrame. It looks like this:

uber_converted.show()
+--------------------+--------------------+-------------------+----------+---------+--------------------+
|dispatching_base_num|         pickup_date|affiliated_base_num|locationID|     zone|             borough|
+--------------------+--------------------+-------------------+----------+---------+--------------------+
|              B02765|2015-05-08 19:05:...|             B02764|       262|Manhattan|      Yorkville East|
|              B02765|2015-05-08 19:06:...|             B00013|       234|Manhattan|            Union Sq|
|              B02765|2015-05-08 19:06:...|             B02765|       107|Manhattan|            Gramercy|
|              B02765|2015-05-08 19:06:...|             B02765|       137|Manhattan|            Kips Bay|
|              B02765|2015-05-08 19:06:...|             B02765|       220|    Bronx|Spuyten Duyvil/Ki...|
|              B02765|2015-05-08 19:06:...|             B02765|       138|   Queens|   LaGuardia Airport|
|              B02765|2015-05-08 19:06:...|             B02749|       143|Manhattan| Lincoln Square West|
|              B02765|2015-05-08 19:06:...|             B02765|       244|Manhattan|Washington Height...|
|              B02765|2015-05-08 19:06:...|             B02617|       262|Manhattan|      Yorkville East|
|              B02765|2015-05-08 19:06:...|             B02765|       144|Manhattan| Little Italy/NoLiTa|
|              B02765|2015-05-08 19:06:...|             B00381|       209|Manhattan|             Seaport|
|              B02765|2015-05-08 19:06:...|             B02765|       234|Manhattan|            Union Sq|
|              B02765|2015-05-08 19:06:...|             B02765|       163|Manhattan|       Midtown North|
|              B02765|2015-05-08 19:06:...|             B02765|       181| Brooklyn|          Park Slope|
|              B02765|2015-05-08 19:06:...|             B02765|       116|Manhattan|    Hamilton Heights|
|              B02765|2015-05-08 19:06:...|             B02765|       236|Manhattan|Upper East Side N...|
|              B02765|2015-05-08 19:06:...|             B02765|       140|Manhattan|     Lenox Hill East|
|              B02765|2015-05-08 19:07:...|             B02765|       162|Manhattan|        Midtown East|
|              B02765|2015-05-08 19:07:...|             B02788|       263|Manhattan|      Yorkville West|
|              B02765|2015-05-08 19:07:...|             B02765|       181| Brooklyn|          Park Slope|
+--------------------+--------------------+-------------------+----------+---------+--------------------+

I need to group the rows by the day of the week of the pickup_date field and count them. The result should look like this:

dayofweek   count
1         -> 234 (Monday)
2         -> 343 (Tuesday)

and so on...

Any help is much appreciated!

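You can use date_format. The pattern letter "u" comes from Java's SimpleDateFormat and formats the day number of the week (1 = Monday, ..., 7 = Sunday):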

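from pyspark.sql.functions import date_format

# uber_converted is the DataFrame from the question; "dayofweek" comes back as a string column.
uber_converted.groupBy(date_format(uber_converted["pickup_date"], "u").alias("dayofweek")).count()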
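The "u" pattern relies on the SimpleDateFormat-based formatter used by Spark 2.x, and it may be rejected or behave differently on newer releases, so it is worth checking against your version. As a hedged alternative, here is a minimal sketch that assumes Spark 2.3 or later, where pyspark.sql.functions.dayofweek is available. That function numbers days 1 = Sunday through 7 = Saturday, so the sketch shifts the value to the 1 = Monday numbering asked for in the question. The helper names iso_dayofweek and count_by_dayofweek are made up for illustration and are not part of PySpark.

```python
from datetime import date


def iso_dayofweek(spark_dayofweek: int) -> int:
    """Map Spark's dayofweek() numbering (1 = Sunday ... 7 = Saturday)
    to the numbering asked for here (1 = Monday ... 7 = Sunday)."""
    return (spark_dayofweek + 5) % 7 + 1


def count_by_dayofweek(df, date_col="pickup_date"):
    """Group a DataFrame by day of week (1 = Monday) and count the rows.

    Hypothetical helper; assumes Spark 2.3+ so that dayofweek() exists.
    """
    # Imported here so the arithmetic above can be checked without Spark installed.
    from pyspark.sql import functions as F

    # Same shift as iso_dayofweek, expressed on a Column.
    iso_day = ((F.dayofweek(F.col(date_col)) + 5) % 7 + 1).alias("dayofweek")
    return df.groupBy(iso_day).count().orderBy("dayofweek")


# Check the shift against the standard library for the date in the sample rows.
d = date(2015, 5, 8)
spark_style = d.isoweekday() % 7 + 1  # emulate Spark: Sunday -> 1 ... Saturday -> 7
assert iso_dayofweek(spark_style) == d.isoweekday()
```

With the DataFrame from the question, count_by_dayofweek(uber_converted).show() should print one row per day with an integer dayofweek column, which also sorts numerically, unlike the string produced by date_format.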
