TPC-DS Query 6: Why do we need the 'where j.i_category = i.i_category' condition?

I am working through TPC-DS on Amazon Athena.

Everything worked fine up to query 5.

I ran into a problem with query 6 (shown below):

select  a.ca_state state, count(*) cnt
from customer_address a
,customer c
,store_sales s
,date_dim d
,item i
where       a.ca_address_sk = c.c_current_addr_sk
and c.c_customer_sk = s.ss_customer_sk
and s.ss_sold_date_sk = d.d_date_sk
and s.ss_item_sk = i.i_item_sk
and d.d_month_seq = 
(select distinct (d_month_seq)
from date_dim
where d_year = 2002
and d_moy = 3 )
and i.i_current_price > 1.2 * 
(select avg(j.i_current_price) 
from item j 
where j.i_category = i.i_category)
group by a.ca_state
having count(*) >= 10
order by cnt, a.ca_state 
limit 100;

It ran for more than 30 minutes and then failed with a timeout.

I tried to find out which part caused the problem, so I went through the where conditions and narrowed it down to the last one: where j.i_category = i.i_category.

I don't know why this condition is needed, so I removed it, and the query ran fine.

Can you tell me why this part is needed?

j.i_category = i.i_category is the subquery's correlation condition. If you remove it from the subquery

select avg(j.i_current_price) 
from item j 
where j.i_category = i.i_category

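the subquery becomes uncorrelated and turns into a global aggregate over the item table, which is cheap to compute because the query engine only needs to evaluate it once. It also changes the result: with the condition, each item's price is compared against the average price of its own category; without it, every item is compared against the average price of all items.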
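One way to keep the per-category comparison while computing the averages only once is to decorrelate the subquery by hand: aggregate item per category first, then join the result back. The following is a sketch rather than a tested fix; category_avg and avg_price are names introduced here for illustration, and whether this actually runs faster on Athena is an assumption that has not been measured.

```sql
-- Per-category average price, computed once for the whole query.
-- category_avg and avg_price are illustrative names, not part of TPC-DS.
with category_avg as (
    select i_category, avg(i_current_price) as avg_price
    from item
    group by i_category
)
select a.ca_state state, count(*) cnt
from customer_address a
join customer c on a.ca_address_sk = c.c_current_addr_sk
join store_sales s on c.c_customer_sk = s.ss_customer_sk
join date_dim d on s.ss_sold_date_sk = d.d_date_sk
join item i on s.ss_item_sk = i.i_item_sk
-- Replaces the correlated subquery. Items whose i_category is null find
-- no match here, just as they never satisfied the original predicate.
join category_avg ca on ca.i_category = i.i_category
where d.d_month_seq =
      (select distinct d_month_seq
       from date_dim
       where d_year = 2002
         and d_moy = 3)
  and i.i_current_price > 1.2 * ca.avg_price
group by a.ca_state
having count(*) >= 10
order by cnt, a.ca_state
limit 100;
```

It is worth comparing the output against the original query on a small scale factor before relying on the rewrite.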

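If you want a fast, high-performance query engine on AWS, I can recommend Starburst Presto (disclaimer: I am from Starburst). See https://www.concurrencylabs.com/blog/starburst-presto-vs-aws-redshift/ for a related comparison (note: this is not a comparison with Athena).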

If it doesn't have to be that fast, you can use PrestoSQL on EMR (note that the "PrestoSQL" and "Presto" components on EMR are not the same thing).
