Retrieve 3 sample values for each column from a table

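I am trying to write a query that retrieves just a few random (or simply distinct) sample values from each column, to give users an overview of what format the values in the table have.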

Table:

project	customer	geography
p1	c2	Europe
p2	c4	US
p3	c6	Japan
p2	c9	US
p4	c1	Asia
…	…	…
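
To try the queries in this thread, the sample rows could be loaded with something like the following. This is only a sketch: the question does not give the DDL, so the table name `tab` (the answers variously use `table`, `tab` and `t`) and the column types are assumptions, and only the five rows shown are reproduced.

```sql
-- Hypothetical DDL: the table name and the varchar lengths are assumptions.
create table tab (
    project   varchar(10),
    customer  varchar(10),
    geography varchar(20)
);

-- The five rows shown in the question; the elided rows are not reproduced.
-- One insert per row keeps the script portable across databases.
insert into tab (project, customer, geography) values ('p1', 'c2', 'Europe');
insert into tab (project, customer, geography) values ('p2', 'c4', 'US');
insert into tab (project, customer, geography) values ('p3', 'c6', 'Japan');
insert into tab (project, customer, geography) values ('p2', 'c9', 'US');
insert into tab (project, customer, geography) values ('p4', 'c1', 'Asia');
```

Adjust the table name to match whichever answer you are testing.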

You can try:

select distinct * from table order by customer desc, geography asc limit 3;

Or you can try this (it is a bit longer):

-- Each branch is parenthesized so that its limit applies to that branch only.
(select distinct * from table where project='p1' limit 1)
union distinct
(select distinct * from table where project='p2' and customer not in (select distinct customer from table where project='p1') and geography not in (select distinct geography from table where project='p1') limit 1)
union distinct
-- Exclude customers and geographies that already occur with p1 or p2.
(select distinct * from table where project='p3' and customer not in (select distinct customer from table where project in ('p1','p2')) and geography not in (select distinct geography from table where project in ('p1','p2')) limit 1);

If I understand correctly, you want to see three unique values from each column, and they should be independent of each other. Try this:

with project_t as (
select row_number() over (order by project) rn, project
from (select distinct project from tab limit 3) t
),
customer_t as (
select row_number() over (order by customer) rn, customer
from (select distinct customer from tab limit 3) t
),
geography_t as (
select row_number() over (order by geography) rn, geography
from (select distinct geography from tab limit 3) t
)
select p.project, c.customer, g.geography
from project_t p
join customer_t c on p.rn = c.rn
join geography_t g on c.rn = g.rn

I tested this on MySQL, but window functions and CTEs should work in Netezza as well.

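What you are trying to do is not very SQL-like, because it detaches the values in each column from one another.

It also runs into trouble if any column does not actually have three values. I suggest using union all and aggregation: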

-- Number the distinct values of each column in random order,
-- then collapse the three lists into rows by sequence number.
select max(project) as project, max(customer) as customer, max(geography) as geography
from ((select project, null as customer, null as geography,
              row_number() over (order by random()) as seqnum
       from t
       group by project
      ) union all
      (select null as project, customer, null as geography,
              row_number() over (order by random()) as seqnum
       from t
       group by customer
      ) union all
      (select null as project, null as customer, geography,
              row_number() over (order by random()) as seqnum
       from t
       group by geography
      )
     ) pcg
where seqnum <= 3
group by seqnum;

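This returns 3 rows when at least one column has at least three distinct values. A column with fewer distinct values shows NULL in the remaining rows.

If you want the most common values, just replace random() in the order by clause with count(*) desc.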
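
As a sketch of that substitution for a single column, one branch of the union could look like the following. The names `t`, `project` and `seqnum` follow the query above; the `cnt` column is added here only to make the ranking visible, and ordering a window function by an aggregate is assumed to be supported by your database.

```sql
-- Rank the distinct project values by how often they occur, most frequent first.
-- Ties get an arbitrary order because row_number() always assigns distinct numbers.
select project,
       count(*) as cnt,
       row_number() over (order by count(*) desc) as seqnum
from t
group by project;
```

The same change applies to the customer and geography branches; the outer filter on seqnum <= 3 then keeps the three most frequent values of each column.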

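After running 'generate statistics' on the table (which should be done weekly anyway, for performance reasons), two values per column, HIVAL and LOWAL, can be found in the catalog, along with the number of distinct values in the column and the number of nulls.

So: if three is a hard requirement, I cannot help. In my shop, though, we run "quick data profiling" queries against the catalog when needed, continue from there with more specific SQL, and simply take care to keep the statistics up to date.

I can share the SQL queries if anyone is interested.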
