Retrieve 3 sample values for each column from a table

  • Keywords: retrieve, 3, sample, sql, netezza


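I am trying to write a query that retrieves just a few random (or simply unique) sample values from each column, to give users an overview of what format the values in the table have.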

Table:

project  customer  geography
p1       c2        Europe
p2       c4        US
p3       c6        Japan
p2       c9        US
p4       c1        Asia

You can try:

select distinct * from table order by customer desc, geography asc limit 3;

Or you can try this (it is a bit long):

(select distinct * from table where project='p1' limit 1)
union distinct
(select distinct * from table where project='p2'
  and customer not in (select distinct customer from table where project='p1')
  and geography not in (select distinct geography from table where project='p1') limit 1)
union distinct
(select distinct * from table where project='p3'
  and customer not in (select distinct customer from table where project in ('p1','p2'))
  and geography not in (select distinct geography from table where project in ('p1','p2')) limit 1);

If I understand correctly, you want to see three unique values from each column, and they should be independent of one another. Try this:

with project_t as (
select row_number() over (order by project) rn, project
from (select distinct project from tab limit 3) t
),
customer_t as (
select row_number() over (order by customer) rn, customer
from (select distinct customer from tab limit 3) t
),
geography_t as (
select row_number() over (order by geography) rn, geography
from (select distinct geography from tab limit 3) t
)
select p.project, c.customer, g.geography
from project_t p
join customer_t c on p.rn = c.rn
join geography_t g on c.rn = g.rn

I have tested this on MySQL, but window functions and CTEs should also work in Netezza.
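To try the CTE query above without a database server, here is a minimal sketch that runs it against the sample rows from the question. It assumes an in-memory SQLite database (3.25 or newer, for window functions) as a stand-in; it is not Netezza or MySQL, and the helper name `sample_values` is made up for illustration. The only change to the query is a final `order by` so the rows come back in a stable order.

```python
import sqlite3

# Sample rows from the question; the table name "tab" follows the query above.
ROWS = [
    ("p1", "c2", "Europe"),
    ("p2", "c4", "US"),
    ("p3", "c6", "Japan"),
    ("p2", "c9", "US"),
    ("p4", "c1", "Asia"),
]

QUERY = """
with project_t as (
  select row_number() over (order by project) rn, project
  from (select distinct project from tab limit 3) t
),
customer_t as (
  select row_number() over (order by customer) rn, customer
  from (select distinct customer from tab limit 3) t
),
geography_t as (
  select row_number() over (order by geography) rn, geography
  from (select distinct geography from tab limit 3) t
)
select p.project, c.customer, g.geography
from project_t p
join customer_t c on p.rn = c.rn
join geography_t g on c.rn = g.rn
order by p.rn
"""


def sample_values():
    """Load the sample rows into SQLite (an assumption) and run the CTE query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table tab (project text, customer text, geography text)")
    conn.executemany("insert into tab values (?, ?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in sample_values():
        print(row)
```

Which three distinct values each inner `limit 3` keeps is up to the engine, so the output is a sample rather than a fixed result. Note also that the inner joins drop rows if any column has fewer than three distinct values.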

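What you are trying to do is not very SQL-like, because it separates the values in each column from one another.

There is also a problem if any column does not actually have three values. I suggest using union all and aggregation: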

select max(project), max(customer), max(geography)
from ((select project, null as customer, null as geography,
row_number() over (order by random()) as seqnum
from t
group by project
) union all
(select null as project, customer, null as geography,
row_number() over (order by random()) as seqnum
from t
group by customer
) union all
(select null as project, null as customer, geography,
row_number() over (order by random()) as seqnum
from t
group by geography
)
) pcg
where seqnum <= 3
group by seqnum;

This returns 3 rows as long as at least one column has at least three distinct values.
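A small sketch of how the union all approach behaves when one column runs short of values. It assumes SQLite (3.25 or newer) in place of Netezza, uses made-up rows in which geography has only two distinct values, and drops the parentheses around the union branches because SQLite does not accept them there. The helper name `sample_with_gaps` is hypothetical.

```python
import sqlite3

# Hypothetical rows: geography deliberately has only two distinct values.
ROWS = [
    ("p1", "c2", "Europe"),
    ("p2", "c4", "US"),
    ("p3", "c6", "Europe"),
    ("p2", "c9", "US"),
    ("p4", "c1", "Europe"),
]

QUERY = """
select max(project), max(customer), max(geography)
from (
  select project, null as customer, null as geography,
         row_number() over (order by random()) as seqnum
  from t
  group by project
  union all
  select null as project, customer, null as geography,
         row_number() over (order by random()) as seqnum
  from t
  group by customer
  union all
  select null as project, null as customer, geography,
         row_number() over (order by random()) as seqnum
  from t
  group by geography
) pcg
where seqnum <= 3
group by seqnum
order by seqnum
"""


def sample_with_gaps():
    """Run the union all / aggregation query on SQLite (an assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (project text, customer text, geography text)")
    conn.executemany("insert into t values (?, ?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in sample_with_gaps():
        print(row)
```

Because max() ignores nulls, each output row carries one sampled value per column, and the short column simply comes back as NULL (None in Python) in the third row instead of making the row disappear.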

If you want the most common values, just replace random() in the order by clause with count(*) desc.
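For the most-common-values variant, here is a sketch under a few assumptions: SQLite (3.25 or newer) stands in for Netezza, the counts are computed in an inner query rather than directly inside the window's order by, and a tie-breaker on the value itself is added so the result is repeatable. The helper name `most_common_values` is made up for illustration.

```python
import sqlite3

# Sample rows from the question.
ROWS = [
    ("p1", "c2", "Europe"),
    ("p2", "c4", "US"),
    ("p3", "c6", "Japan"),
    ("p2", "c9", "US"),
    ("p4", "c1", "Asia"),
]

# Each branch ranks one column's values by frequency (ties broken by value).
QUERY = """
select max(project), max(customer), max(geography)
from (
  select project, null as customer, null as geography,
         row_number() over (order by n desc, project) as seqnum
  from (select project, count(*) as n from t group by project)
  union all
  select null as project, customer, null as geography,
         row_number() over (order by n desc, customer) as seqnum
  from (select customer, count(*) as n from t group by customer)
  union all
  select null as project, null as customer, geography,
         row_number() over (order by n desc, geography) as seqnum
  from (select geography, count(*) as n from t group by geography)
) pcg
where seqnum <= 3
group by seqnum
order by seqnum
"""


def most_common_values():
    """Return the three most frequent values per column, one column per field."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (project text, customer text, geography text)")
    conn.executemany("insert into t values (?, ?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in most_common_values():
        print(row)
```

With the sample rows, p2 and US each appear twice, so they lead their columns; every customer appears once, so that column falls back to alphabetical order.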

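After running 'generate statistics' on the table (which, for performance reasons, should be done weekly anyway), two values are available for each column: HIVAL and LOWAL. They can be found in the catalog, along with the number of distinct values and the number of nulls in the column.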

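So: if three is a hard number, I can't help. But in my shop we run a "quick data profiling" query against the catalog when necessary, continue from there with more specific SQL, and simply make sure the statistics are kept up to date.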

I will share the SQL query if anyone is interested.
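The catalog query itself is not shown in this thread. As a rough stand-in, the same four figures (lowest value, highest value, number of distinct values, number of nulls) can be computed directly with plain aggregates. The sketch below assumes SQLite rather than Netezza, adds one made-up row with a NULL geography, and uses a hypothetical helper `profile`; it scans the table instead of reading stored statistics, so it will be slower on large tables.

```python
import sqlite3

# Sample rows from the question plus one hypothetical row with a NULL geography.
ROWS = [
    ("p1", "c2", "Europe"),
    ("p2", "c4", "US"),
    ("p3", "c6", "Japan"),
    ("p2", "c9", "US"),
    ("p4", "c1", "Asia"),
    ("p5", "c3", None),
]


def profile(conn, table, columns):
    """Return {column: (low, high, distinct_count, null_count)}.

    Table and column names are interpolated into the SQL text, so they
    must come from trusted code, never from user input.
    """
    result = {}
    for col in columns:
        sql = (
            f"select min({col}), max({col}), count(distinct {col}), "
            f"sum(case when {col} is null then 1 else 0 end) "
            f"from {table}"
        )
        result[col] = conn.execute(sql).fetchone()
    return result


def demo():
    """Build the sample table in memory and profile its three columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table tab (project text, customer text, geography text)")
    conn.executemany("insert into tab values (?, ?, ?)", ROWS)
    stats = profile(conn, "tab", ["project", "customer", "geography"])
    conn.close()
    return stats


if __name__ == "__main__":
    for column, figures in demo().items():
        print(column, figures)
```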
