Spark Dataset question



My dataset looks like this:

+------+---------------+----+
|  City|      Timestamp|Sale|
+------+---------------+----+
|City 3|6/30/2017 16:04|  28|
|City 4| 7/4/2017 16:04|  12|
|City 2|7/13/2017 16:04|   8|
|City 4|7/16/2017 16:04|  21|
|City 4| 7/3/2017 16:04|  24|
|City 2|7/17/2017 16:04|  34|
|City 3| 7/9/2017 16:04|  13|
|City 3|7/18/2017 16:04|  26|
|City 3| 7/6/2017 16:04|  16|
|City 3|7/15/2017 16:04|  29|
|City 4|7/18/2017 16:04|  39|
|City 2| 7/1/2017 16:04|  19|
|City 2|7/18/2017 16:04|  19|
|City 4| 7/4/2017 16:04|  24|
|City 2| 7/4/2017 16:04|   9|
|City 4|7/15/2017 16:04|  20|
|City 3|7/12/2017 16:04|  19|
|City 1| 7/9/2017 16:04|  13|
|City 1|7/13/2017 16:04|  25|
|City 4|7/10/2017 16:04|  10|
+------+---------------+----+

We need to compute the sum of Sale for each City on a weekly basis.

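You can group by City and Timestamp and sum Sale. Note that this groups on the exact Timestamp value, so it returns one row per distinct City and timestamp; for weekly totals, a week key derived from Timestamp has to take the place of the raw column.

import org.apache.spark.sql.functions.{col, sum}
data.groupBy("City", "Timestamp").agg(sum(col("Sale")).as("TotalSale")).show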
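For the weekly totals the question asks about, one option is to parse Timestamp and truncate it to the start of its week before grouping. The following is only a sketch under a few assumptions: the Timestamp column is a string in the form "M/d/yyyy H:mm" (as in the sample rows), a week starts on Monday (which is what date_trunc("week", ...) produces), and Spark 2.3 or later is available. The names WeeklySales, weeklyTotals, ts, and WeekStart are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, date_trunc, sum, to_date, to_timestamp}

object WeeklySales {
  // Hypothetical helper: sums Sale per City and per week.
  // Assumes Timestamp is a string such as "7/4/2017 16:04".
  def weeklyTotals(data: DataFrame): DataFrame =
    data
      // Parse the string into a real timestamp (assumed format).
      .withColumn("ts", to_timestamp(col("Timestamp"), "M/d/yyyy H:mm"))
      // Truncate to the Monday that starts the week, kept as a date.
      .withColumn("WeekStart", to_date(date_trunc("week", col("ts"))))
      .groupBy("City", "WeekStart")
      .agg(sum(col("Sale")).as("TotalSale"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("WeeklySales")
      .master("local[*]") // local mode, for trying the sketch out
      .getOrCreate()
    import spark.implicits._

    // A few rows copied from the sample table in the question.
    val data = Seq(
      ("City 3", "6/30/2017 16:04", 28),
      ("City 4", "7/4/2017 16:04", 12),
      ("City 4", "7/3/2017 16:04", 24),
      ("City 2", "7/13/2017 16:04", 8)
    ).toDF("City", "Timestamp", "Sale")

    weeklyTotals(data).orderBy("City", "WeekStart").show()
    spark.stop()
  }
}
```

Grouping on the week's start date rather than on a week number keeps weeks from different years apart without needing a separate year column. If a row's Timestamp does not match the assumed format, to_timestamp may yield null (or fail, depending on the Spark version and parser settings), so it is worth checking the parsed column first.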

Hope this helps!
