Getting multiple rows after using max() and group by



I'm trying to assign a segment to each xml_id, based on the segment in which that xml_id has the most active properties.

I'm creating a table with a select, like this:

create table schema.table4 as
select
yyyy_mm_dd,
xml_id,
segment as xml_segment,
max(property_count)
from(
select
t1.yyyy_mm_dd,
t2.xml_id,
t3.segment,
count(t1.hotel_id) as property_count
from(
select
yyyy_mm_dd,
hotel_id
from
schema.table1
where
is_active = 1
and yyyy_mm_dd = "2020-10-01"
) t1
left join(
select
yyyy_mm_dd,
hotel_id,
xml_id
from
schema.table2
where
yyyy_mm_dd = "2020-10-01"
and xml_id is not null
) t2 on t2.hotel_id = t1.hotel_id and t2.yyyy_mm_dd = t1.yyyy_mm_dd
inner join
schema.table3 t3 on t3.hotel_id = t1.hotel_id
group by
1,2,3
) x
group by
1,2,3

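However, when I query this table, I can see multiple segments per xml_id. I expected only the segment with the MAX() property_count to be inserted. Why isn't that the case?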

select
xml_id, count(*)
from
schema.table4
group by
xml_id

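This returns multiple rows: some xml_ids have two rows and some have three. I need exactly one row per xml_id, and its segment should be the one with the highest property_count.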
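The duplication can be reproduced with a small sketch. It assumes Python's built-in sqlite3 module as a stand-in for the original database engine (the question does not name it), loads the sample data shown below into tables named table1, table2 and table3 (hypothetical stand-ins for schema.table1 etc.), and runs the question's query with the positional GROUP BY written out by name. Because segment is still a grouping key in the outer query, MAX() is evaluated once per segment, so every segment survives.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    # In-memory copy of the question's sample data (assumption: SQLite stands in
    # for the real engine; table1/table2/table3 replace schema.table1 etc.).
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE table1 (yyyy_mm_dd TEXT, hotel_id INTEGER, is_active INTEGER);
        CREATE TABLE table2 (yyyy_mm_dd TEXT, hotel_id INTEGER, xml_id INTEGER);
        CREATE TABLE table3 (hotel_id INTEGER, segment TEXT);
    """)
    # Hotel 7 is the only inactive hotel.
    con.executemany("INSERT INTO table1 VALUES (?, ?, ?)",
                    [("2020-10-01", h, 0 if h == 7 else 1) for h in range(1, 8)])
    con.executemany("INSERT INTO table2 VALUES (?, ?, ?)",
                    [("2020-10-01", h, x) for h, x in
                     [(1, 444), (2, 444), (3, 444), (4, 920), (5, 920), (6, 920), (7, None)]])
    con.executemany("INSERT INTO table3 VALUES (?, ?)",
                    [(1, "Home"), (2, "Core"), (3, "Core"), (4, "Core"),
                     (5, "Home"), (6, "Home"), (7, "Chain")])
    return con


# The question's query, with single-quoted date literals and named GROUP BY keys.
QUESTION_QUERY = """
SELECT yyyy_mm_dd, xml_id, segment AS xml_segment, MAX(property_count) AS max_count
FROM (
    SELECT t1.yyyy_mm_dd, t2.xml_id, t3.segment, COUNT(t1.hotel_id) AS property_count
    FROM (SELECT yyyy_mm_dd, hotel_id FROM table1
          WHERE is_active = 1 AND yyyy_mm_dd = '2020-10-01') t1
    LEFT JOIN (SELECT yyyy_mm_dd, hotel_id, xml_id FROM table2
               WHERE yyyy_mm_dd = '2020-10-01' AND xml_id IS NOT NULL) t2
           ON t2.hotel_id = t1.hotel_id AND t2.yyyy_mm_dd = t1.yyyy_mm_dd
    INNER JOIN table3 t3 ON t3.hotel_id = t1.hotel_id
    GROUP BY t1.yyyy_mm_dd, t2.xml_id, t3.segment
) x
-- segment is still a grouping key here, so MAX() is taken per segment
GROUP BY yyyy_mm_dd, xml_id, segment
ORDER BY xml_id, segment
"""


def rows_per_xml_id():
    """Run the question's query and count how many rows each xml_id gets."""
    con = build_db()
    try:
        rows = con.execute(QUESTION_QUERY).fetchall()
    finally:
        con.close()
    counts = {}
    for _, xml_id, _, _ in rows:
        counts[xml_id] = counts.get(xml_id, 0) + 1
    return rows, counts


if __name__ == "__main__":
    rows, counts = rows_per_xml_id()
    for row in rows:
        print(row)
    print(counts)
```

On the sample data it returns four rows, two for each xml_id, which is the behavior described above.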

Below is some sample data and the expected output. Each xml_id should inherit the segment in which it has the most properties.


t1:

| yyyy_mm_dd | hotel_id | is_active |
|------------|----------|-----------|
| 2020-10-01 | 1        | 1         |
| 2020-10-01 | 2        | 1         |
| 2020-10-01 | 3        | 1         |
| 2020-10-01 | 4        | 1         |
| 2020-10-01 | 5        | 1         |
| 2020-10-01 | 6        | 1         |
| 2020-10-01 | 7        | 0         |

t2:

| yyyy_mm_dd | hotel_id | xml_id |
|------------|----------|--------|
| 2020-10-01 | 1        | 444    |
| 2020-10-01 | 2        | 444    |
| 2020-10-01 | 3        | 444    |
| 2020-10-01 | 4        | 920    |
| 2020-10-01 | 5        | 920    |
| 2020-10-01 | 6        | 920    |
| 2020-10-01 | 7        | null   |

t3:

| hotel_id | segment |
|----------|---------|
| 1        | Home    |
| 2        | Core    |
| 3        | Core    |
| 4        | Core    |
| 5        | Home    |
| 6        | Home    |
| 7        | Chain   |

Expected output:

| yyyy_mm_dd | xml_id | segment |
|------------|--------|---------|
| 2020-10-01 | 444    | Core    |
| 2020-10-01 | 920    | Home    |

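Your outer query still groups by segment, so MAX() is computed separately for each (yyyy_mm_dd, xml_id, segment) group and every segment is kept. If I understand correctly, you want to count the active hotels per date, xml_id and segment. This is the base query for that count: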

select t2.yyyy_mm_dd, t2.xml_id, t3.segment, count(*) as num_actives
from t2 join
t1
on t2.hotel_id = t1.hotel_id and
t2.yyyy_mm_dd  = t1.yyyy_mm_dd join
t3
on t3.hotel_id = t2.hotel_id
where t1.is_active = 1
group by t2.yyyy_mm_dd, t2.xml_id, t3.segment;
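As a sketch, the base query can be checked against the sample tables t1, t2 and t3 from the question. This assumes Python's built-in sqlite3 module as a stand-in for the real database; the helper names build_db and active_counts are hypothetical.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    # In-memory copy of the sample tables t1, t2, t3 (assumption: SQLite stands
    # in for the original engine).
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE t1 (yyyy_mm_dd TEXT, hotel_id INTEGER, is_active INTEGER);
        CREATE TABLE t2 (yyyy_mm_dd TEXT, hotel_id INTEGER, xml_id INTEGER);
        CREATE TABLE t3 (hotel_id INTEGER, segment TEXT);
    """)
    # Hotel 7 is the only inactive hotel.
    con.executemany("INSERT INTO t1 VALUES (?, ?, ?)",
                    [("2020-10-01", h, 0 if h == 7 else 1) for h in range(1, 8)])
    con.executemany("INSERT INTO t2 VALUES (?, ?, ?)",
                    [("2020-10-01", h, x) for h, x in
                     [(1, 444), (2, 444), (3, 444), (4, 920), (5, 920), (6, 920), (7, None)]])
    con.executemany("INSERT INTO t3 VALUES (?, ?)",
                    [(1, "Home"), (2, "Core"), (3, "Core"), (4, "Core"),
                     (5, "Home"), (6, "Home"), (7, "Chain")])
    return con


# The answer's base query; ORDER BY is added only to make the output stable.
BASE_QUERY = """
SELECT t2.yyyy_mm_dd, t2.xml_id, t3.segment, COUNT(*) AS num_actives
FROM t2
JOIN t1 ON t2.hotel_id = t1.hotel_id AND t2.yyyy_mm_dd = t1.yyyy_mm_dd
JOIN t3 ON t3.hotel_id = t2.hotel_id
WHERE t1.is_active = 1
GROUP BY t2.yyyy_mm_dd, t2.xml_id, t3.segment
ORDER BY t2.xml_id, t3.segment
"""


def active_counts():
    """Return (date, xml_id, segment, num_actives) rows for the sample data."""
    con = build_db()
    try:
        return con.execute(BASE_QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in active_counts():
        print(row)
```

On the sample data this gives one row per (date, xml_id, segment): 444/Core 2, 444/Home 1, 920/Core 1 and 920/Home 2. Hotel 7 drops out because it is inactive.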

To keep only the segment with the highest count for each xml_id, you can use a window function:

select t.*
from (select t2.yyyy_mm_dd, t2.xml_id, t3.segment, count(*) as num_actives,
row_number() over (partition by t2.yyyy_mm_dd, t2.xml_id order by count(*) desc) as seqnum
from t2 join
t1
on t2.hotel_id = t1.hotel_id and
t2.yyyy_mm_dd  = t1.yyyy_mm_dd join
t3
on t3.hotel_id = t2.hotel_id
where t1.is_active = 1
group by t2.yyyy_mm_dd, t2.xml_id, t3.segment
) t
where seqnum = 1;
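A sketch of the window-function version on the same sample data, again assuming Python's sqlite3 module as a stand-in for the real database (SQLite supports window functions from version 3.25). ROW_NUMBER() numbers the segments within each (yyyy_mm_dd, xml_id) partition from the highest count down, and the outer WHERE keeps only the first one. The helper names build_db and top_segments are hypothetical.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    # In-memory copy of the sample tables t1, t2, t3 (assumption: SQLite stands
    # in for the original engine).
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE t1 (yyyy_mm_dd TEXT, hotel_id INTEGER, is_active INTEGER);
        CREATE TABLE t2 (yyyy_mm_dd TEXT, hotel_id INTEGER, xml_id INTEGER);
        CREATE TABLE t3 (hotel_id INTEGER, segment TEXT);
    """)
    # Hotel 7 is the only inactive hotel.
    con.executemany("INSERT INTO t1 VALUES (?, ?, ?)",
                    [("2020-10-01", h, 0 if h == 7 else 1) for h in range(1, 8)])
    con.executemany("INSERT INTO t2 VALUES (?, ?, ?)",
                    [("2020-10-01", h, x) for h, x in
                     [(1, 444), (2, 444), (3, 444), (4, 920), (5, 920), (6, 920), (7, None)]])
    con.executemany("INSERT INTO t3 VALUES (?, ?)",
                    [(1, "Home"), (2, "Core"), (3, "Core"), (4, "Core"),
                     (5, "Home"), (6, "Home"), (7, "Chain")])
    return con


# The answer's window-function query. The window is evaluated after GROUP BY,
# so ORDER BY COUNT(*) ranks the per-segment counts within each xml_id.
TOP_QUERY = """
SELECT t.yyyy_mm_dd, t.xml_id, t.segment, t.num_actives
FROM (SELECT t2.yyyy_mm_dd, t2.xml_id, t3.segment, COUNT(*) AS num_actives,
             ROW_NUMBER() OVER (PARTITION BY t2.yyyy_mm_dd, t2.xml_id
                                ORDER BY COUNT(*) DESC) AS seqnum
      FROM t2
      JOIN t1 ON t2.hotel_id = t1.hotel_id AND t2.yyyy_mm_dd = t1.yyyy_mm_dd
      JOIN t3 ON t3.hotel_id = t2.hotel_id
      WHERE t1.is_active = 1
      GROUP BY t2.yyyy_mm_dd, t2.xml_id, t3.segment) t
WHERE t.seqnum = 1
ORDER BY t.xml_id
"""


def top_segments():
    """Return one (date, xml_id, segment, num_actives) row per xml_id."""
    con = build_db()
    try:
        return con.execute(TOP_QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in top_segments():
        print(row)
```

It returns 444/Core and 920/Home, matching the expected output. If two segments tie on the count, ROW_NUMBER() picks one of them arbitrarily; adding a tiebreaker to the ORDER BY inside OVER (for example, segment) makes the choice deterministic.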
