我在bigquery中有一个这样的表(260000行(:
vendor date item_price discount_price
x 2021-07-08 23:41:10 451,5 0
y 2021-06-14 10:22:10 41,7 0
z 2020-01-03 13:41:12 74 4
s 2020-04-12 01:14:58 88 12
....
我想要的正是按月份对这些数据进行分组,然后只找到当月排名前20的供应商的销售额总和。预期输出:
month vendor_name(top20) sum_of_vendor's_sales sum_of_vendor's_discount item_count(sold)
2020-01 x1 10857 250 150
2020-01 x2 9685 410 50
2020-01 x3 3574 140 45
....
2021 01 x20 700 15 20
2020-02 y1 7421 280 120
2020-02 y2 6500 250 40
2020-02 y3 4500 200 70
.....
2020-02 y20 900 70 30
我试过这个(这里有消息来源(。但无法获得所需的输出。
select month,
(select sum(sum) from t.top_20_vendors) as sum_of_only_top20_vendor_sales
from (
select
format_datetime('%Y%m', date) month,
approx_top_sum(vendor, item_price, 20) top_20_vendors,count(item_price) as count_of_items,sum(discount_price)
from my_table
group by month
) t
考虑以下方法
select
format_datetime('%Y%m', date) month,
vendor as vendor_name_top20,
sum(item_price) as sum_of_vendor_sales,
sum(discount_price) as sum_of_vendor_discount,
count(*) as item_count_sold
from your_table
group by vendor, month
qualify row_number() over(partition by month order by sum_of_vendor_sales desc) <= 20