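I'm trying to write a query that retrieves just a few random (or just unique) sample values from each column, to give users an overview of what format the values in the table have.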
Table:
p1 | c2 | Europe |
p2 | c4 | US |
p3 | c6 | Japan |
p2 | c9 | US |
p4 | c1 | Asia |
… | … | … |
You can try:
select distinct * from table order by customer desc, geography asc limit 3;
Or you can try this (it is a bit longer):
(select distinct * from table where project='p1' limit 1)
union distinct
(select distinct * from table where project='p2' and customer not in (select distinct customer from table where project='p1') and geography not in (select distinct geography from table where project='p1') limit 1)
union distinct
(select distinct * from table where project='p3' and customer not in (select distinct customer from table where project in ('p1','p2')) and geography not in (select distinct geography from table where project in ('p1','p2')) limit 1);
If I understand correctly, you want to see three unique values from each column, and they should be independent of each other. Try this:
with project_t as (
select row_number() over (order by project) rn, project
from (select distinct project from tab limit 3) t
),
customer_t as (
select row_number() over (order by customer) rn, customer
from (select distinct customer from tab limit 3) t
),
geography_t as (
select row_number() over (order by geography) rn, geography
from (select distinct geography from tab limit 3) t
)
select p.project, c.customer, g.geography
from project_t p
join customer_t c on p.rn = c.rn
join geography_t g on c.rn = g.rn
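I have tested this on MySQL, but window functions and CTEs should also work in Netezza.
What you are trying to do is not very SQL-like, because it separates the values in each column from one another.
There is also a problem if any column does not actually have three values. I would suggest using union all and aggregation: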
select max(project), max(customer), max(geography)
from ((select project, null as customer, null as geography,
              row_number() over (order by random()) as seqnum
       from t
       group by project
      ) union all
      (select null as project, customer, null as geography,
              row_number() over (order by random()) as seqnum
       from t
       group by customer
      ) union all
      (select null as project, null as customer, geography,
              row_number() over (order by random()) as seqnum
       from t
       group by geography
      )
     ) pcg
where seqnum <= 3
group by seqnum;
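This returns 3 rows when at least one column has at least three distinct values. A column with fewer than three distinct values shows NULL in the remaining rows.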
If you want the most common values, just replace random() in the order by clause with count(*) desc.
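As a rough sketch of that substitution for a single column, assuming the table t and the geography column used in the query above (the alias cnt is only illustrative):

-- Sketch: the three most frequent geography values.
-- count(*) is computed per group, and row_number() then ranks the groups by it.
select geography,
       count(*) as cnt,
       row_number() over (order by count(*) desc) as seqnum
from t
group by geography
order by seqnum
limit 3;

Ties are ordered arbitrarily here; adding a second sort key such as geography inside the over clause would make the result repeatable.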
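After running 'generate statistics' on the table (which should be done weekly anyway, for performance reasons), two values become available: HIVAL and LOWAL. They can be found in the catalog, along with the number of distinct values in the column and the number of nulls.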
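This is not the catalog lookup itself, but the same four figures can also be computed directly from the table for one column, which may help where statistics are stale. A sketch assuming the table t and the project column from the earlier answers; the output aliases are made up for illustration:

-- Sketch only: profile one column straight from the table (scans the whole table).
select min(project)              as low_value,
       max(project)              as high_value,
       count(distinct project)   as distinct_values,
       count(*) - count(project) as null_count   -- count(project) skips nulls
from t;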
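So: if three is a hard number, I can't help. But in my shop we run a "quick data profiling" query against the catalog when necessary, continue from there with more specific SQL, and simply take care to keep the statistics up to date.
I can share the SQL query if anyone is interested.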