如何使用HiveQL按月份字段执行聚合(总和)



以下是我的数据,我希望使用event_time和price列每月生成收入总和。

+--------------------------+----------------------+----------------------+-----------------------+-------------------------+-----------------+-----------------+-------------------+---------------------------------------+
|   oct_data.event_time    | oct_data.event_type  | oct_data.product_id  | oct_data.category_id  | oct_data.category_code  | oct_data.brand  | oct_data.price  | oct_data.user_id  |         oct_data.user_session         |
+--------------------------+----------------------+----------------------+-----------------------+-------------------------+-----------------+-----------------+-------------------+---------------------------------------+
| 2019-10-01 00:00:00 UTC  | cart                 | 5773203              | 1487580005134238553   |                         | runail          | 2.62            | 463240011         | 26dd6e6e-4dac-4778-8d2c-92e149dab885  |
| 2019-10-01 00:00:03 UTC  | cart                 | 5773353              | 1487580005134238553   |                         | runail          | 2.62            | 463240011         | 26dd6e6e-4dac-4778-8d2c-92e149dab885  |
| 2019-10-01 00:00:07 UTC  | cart                 | 5881589              | 2151191071051219817   |                         | lovely          | 13.48           | 429681830         | 49e8d843-adf3-428b-a2c3-fe8bc6a307c9  |
| 2019-10-01 00:00:07 UTC  | cart                 | 5723490              | 1487580005134238553   |                         | runail          | 2.62            | 463240011         | 26dd6e6e-4dac-4778-8d2c-92e149dab885  |
| 2019-10-01 00:00:15 UTC  | cart                 | 5881449              | 1487580013522845895   |                         | lovely          | 0.56            | 429681830         | 49e8d843-adf3-428b-a2c3-fe8bc6a307c9  |
| 2019-10-01 00:00:16 UTC  | cart                 | 5857269              | 1487580005134238553   |                         | runail          | 2.62            | 430174032         | 73dea1e7-664e-43f4-8b30-d32b9d5af04f  |
| 2019-10-01 00:00:19 UTC  | cart                 | 5739055              | 1487580008246412266   |                         | kapous          | 4.75            | 377667011         | 81326ac6-daa4-4f0a-b488-fd0956a78733  |
| 2019-10-01 00:00:24 UTC  | cart                 | 5825598              | 1487580009445982239   |                         |                 | 0.56            | 467916806         | 2f5b5546-b8cb-9ee7-7ecd-84276f8ef486  |
| 2019-10-01 00:00:25 UTC  | cart                 | 5698989              | 1487580006317032337   |                         |                 | 1.27            | 385985999         | d30965e8-1101-44ab-b45d-cc1bb9fae694  |
| 2019-10-01 00:00:26 UTC  | view                 | 5875317              | 2029082628195353599   |                         |                 | 1.59            | 474232307         | 445f2b74-5e4c-427e-b7fa-6e0a28b156fe  |
+--------------------------+----------------------+----------------------+-----------------------+-------------------------+-----------------+-----------------+-------------------+---------------------------------------+

我使用了下面的查询,但总和似乎没有出现。请提出产生所需产出的最佳方法。

select date_format(event_time,'MM') as Month, 
sum(price) as Monthly_Revenue 
from oct_data_new 
group by date_format(event_time,'MM') 
order by Month;

注意:event_time字段的格式为TIMESTAMP。

首先将时间戳转换为日期,然后应用date_format():

select date_format(cast(event_time as date),'MM') as Month, 
sum(price) as Monthly_Revenue 
from oct_data_new 
group by date_format(cast(event_time as date),'MM') 
order by Month;

如果所有日期都在同一年,这将起作用
如果没有,则还应按年份分组。

您的代码应该可以工作——除非您使用的是旧版本的Hive。date_format()自2016年初发布的1.1.2以来就接受了timestamp的论点。也就是说,我强烈建议您将年份包括在内:

select date_format(event_time, 'yyyy-MM') as Month, 
sum(price) as Monthly_Revenue 
from oct_data_new 
group by date_format(event_time, 'yyyy-MM') 
order by Month;

相关内容

最新更新