Hive Multi Select


CID    F_ID       NME
1       A          QR
1       B          QB
2       A          QR
3       B          QB
4       A          QR
4       B          QB

结果:-

CID F_ID    NME
1    A      QR
1    B      QB
4    A      QR
4    B      QB

在Hive中,要获得结果的查询应该只产生CID在F_ ID-A&B、 我可以在oracle 中使用LISTAGG实现同样的效果

此查询将在单个映射缩减阶段执行:

select CID,    F_ID,       NME  from
(
select s.*, 
sum(A) over (partition by CID) A_cnt,
sum(B) over (partition by CID) B_cnt
from
(
select s.*,
case when F_ID='A' then 1 else 0 end A,
case when F_ID='B' then 1 else 0 end B
from your_table
)s
)s where A_cnt>=1 and B_cnt >=1
;

演示:

select CID,    F_ID,       NME  from
(
select s.*, 
sum(A) over (partition by CID) A_cnt,
sum(B) over (partition by CID) B_cnt
from
(
select s.*,
case when F_ID='A' then 1 else 0 end A,
case when F_ID='B' then 1 else 0 end B
from
( --replace this subquery (s) with your table  
select stack(6,
1,       'A',          'QR',
1,       'B',          'QB',
2,       'A',          'QR',
3,       'B',          'QB',
4,       'A',          'QR',
4,       'B',          'QB') as (CID,    F_ID,       NME)
) s
)s
)s where A_cnt>=1 and B_cnt >=1
;

结果:

MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 6.39 sec   HDFS Read: 13549 HDFS Write: 28 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 390 msec
OK
1       B       QB
1       A       QR
4       B       QB
4       A       QR
Time taken: 108.779 seconds, Fetched: 4 row(s)

最新更新