当优化器参数更改(ON/OFF)时,HAWQ为同一sql返回不同的输出



在处理GreenPlum HAWQ 1.1.3时遇到一个奇怪的行为。想要执行一个简单的exists查询,但得到了错误的结果。

gpadmin=# select version();
      version                                                                              
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 8.2.15 (Greenplum Database 4.2.0 build 1) (HAWQ 1.1.3.0 build 4609) on x86_64-unknown-linux-gnu, compiled by GCCgcc (GCC) 4.4.2 compiled on Oct 27 2013 20:
53:09
(1 row)
gpadmin=# Set optimizer=off;
SET
gpadmin=# with temp as (
          Select 1 one, 2 two 
          union all 
          Select 3,4
          ),
          temp1 as (
          Select 3 three
          )
          Select * from temp 
          where exists (Select * from temp1 where three=one);

one | two 
-----+-----
(0 rows)
gpadmin=# 

因此,当HAWQ的优化器参数设置为OFF时,查询返回0行,理想情况下应返回temp1和temp数据集中存在的记录,即3,4

所以我尝试了同样的查询,但通过将参数设置为ON

gpadmin=# Set optimizer=on;
SET
gpadmin=# 
          with temp as (
          Select 1 one, 2 two 
          union all 
          Select 3,4
          ),
          temp1 as (
          Select 3 three
          ) 
          Select * from temp
          where exists (Select * from temp1 where three=one);

one | two 
-----+-----
   3 |   4
(1 row)

在这里,查询返回期望的结果。

我想Greenplum的工程师应该想出一个解决方案来解决优化器的这种错误行为。

此外,当在Greenplum DCA 4.2上执行上述查询时,数据库崩溃并重新启动。我不明白一个人怎么能向世界发布这样不成熟/未经测试的产品。这些只是一些简单的基本查询,在投放市场之前至少应该经过测试(阿尔法测试)。最重要的是,当他们的工程师团队提出这个问题时,他们这边的支持人员要求为上述查询中的表提供DDL。我想支持人员没有理解这个查询不包含任何表,它只包含使用with子句创建的临时数据集。

  1. "优化器"不是关于更深入的查询优化。它启用了名为ORCA的全新查询优化器,而在禁用状态下使用的Postgres优化器称为planner
  2. 此问题应提交给Pivotal Support,由工程师解决
  3. 这是由union all引起的拐角情况,并在master上进行选择,它与表配合良好

这是代码:

create table test (one int, two int);
insert into test values (1, 2), (3, 4);
create table test2 (three int);
insert into test2 values (3);
Select * from test 
where exists (Select * from test2 where three=one);

我稍微重新表述了您的查询,将temp重命名为t1,并将temp1重命名为t2,这样可以更清楚地解释问题。

你的问题有两个答案:

1) 我想您使用的是旧版本的hawq,其中查询在优化器打开的情况下工作,而在优化器关闭的情况下不工作。在最新的apachehawq中,它实际上不能与优化器一起工作。原因是当优化器打开时,它现在又回到了计划器

2) 查询失败是由于有关查询执行器的错误。

具体地说,CTE(公共表表达式)子句中的t2被视为子查询扫描,然后被物化是正确的。

然而,当t2被求值时,它不生成元组。因此,t1与t2的连接不生成元组。

我们可以在优化器=关闭的情况下运行查询时使用解释分析在查询执行统计信息中看到这一点

   ->  Materialize  (cost=0.00..0.01 rows=1 width=0)
         Rows out:  0 rows with 0.167 ms to end of 3 scans, start offset by 0.265 ms.
         ->  Limit  (cost=0.00..0.00 rows=1 width=0)
               Rows out:  0 rows with 0.003 ms to end, start offset by 0.257 ms.
               ->  Subquery Scan t2  (cost=0.00..0.01 rows=1 width=0)
                     Rows out:  0 rows with 0.002 ms to end, start offset by 0.258 ms.
                     ->  Result  (cost=0.00..0.01 rows=1 width=0)
                           One-Time Filter: 3 = $0
                           Rows out:  0 rows with 0.001 ms to end, start offset by 0.258 ms.

我们有HAWQ-884跟踪此问题,您可以参考它了解更新和详细信息。

PS:以下是有关查询执行统计信息的详细信息:

1) hawq2.0,优化器关闭(计划器)

show optimizer;
 optimizer
-----------
 off
(1 row)
WITH t1 AS ( SELECT 1 c1, 2 c2 UNION ALL SELECT 3 c1, 4 c2 ),
     t2 AS ( SELECT 3 c3 )
SELECT * FROM t1
WHERE EXISTS (SELECT * FROM t2 WHERE c3=c1);
 c1 | c2
----+----
(0 rows)
EXPLAIN ANALYZE
WITH t1 AS ( SELECT 1 c1, 2 c2 UNION ALL SELECT 3 c1, 4 c2 ),
     t2 AS ( SELECT 3 c3 )
SELECT * FROM t1
WHERE EXISTS (SELECT * FROM t2 WHERE c3=c1);
                                                                            QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.05..0.29 rows=72 width=8)
   Rows out:  Avg 0.0 rows x 0 workers.  Max/Last(/) 0/0 rows with 0.237/0.237 ms to end.
   ->  Limit  (cost=0.00..0.00 rows=1 width=0)
         Rows out:  Avg 0.0 rows x 0 workers.  Max/Last(/) 0/0 rows with 0.003/0.003 ms to end.
         ->  Subquery Scan t2  (cost=0.00..0.01 rows=6 width=0)
               Rows out:  Avg 0.0 rows x 0 workers.  Max/Last(/) 0/0 rows with 0.002/0.002 ms to end.
               ->  Result  (cost=0.00..0.01 rows=1 width=0)
                     One-Time Filter: 3 = $0
                     Rows out:  Avg 0.0 rows x 0 workers.  Max/Last(/) 0/0 rows with 0/0 ms to end.
   ->  Materialize  (cost=0.05..0.17 rows=12 width=8)
         Rows out:  Avg 1.0 rows x 1 workers.  Max/Last(/) 1/1 rows with 0.129/0.129 ms to end, start offset by 0.135/0.135 ms.
         ->  Append  (cost=0.00..0.04 rows=2 width=0)
               Rows out:  Avg 2.0 rows x 1 workers.  Max/Last(/) 2/2 rows with 0.002/0.002 ms to first row, 0.004/0.004 ms to end, start offset by 0.255/0.255 ms.
               ->  Result  (cost=0.00..0.01 rows=1 width=0)
                     Rows out:  Avg 1.0 rows x 1 workers.  Max/Last(/) 1/1 rows with 0.002/0.002 ms to end, start offset by 0.255/0.255 ms.
               ->  Result  (cost=0.00..0.01 rows=1 width=0)
                     Rows out:  Avg 1.0 rows x 1 workers.  Max/Last(/) 1/1 rows with 0/0 ms to end, start offset by 0.261/0.261 ms.
 Slice statistics:
   (slice0)    Executor memory: 61K bytes.
 Statement statistics:
   Memory used: 128000K bytes
 Settings:  default_hash_table_bucket_number=6; optimizer=off
 Optimizer status: legacy query optimizer
 Data locality statistics:
   no data locality information in this query
 Total runtime: 0.372 ms
(26 rows)

2) (orca)上带有优化器的hawq2.0

show optimizer;
 optimizer
-----------
 on
(1 row)
WITH t1 AS ( SELECT 1 c1, 2 c2 UNION ALL SELECT 3 c1, 4 c2 ),
     t2 AS ( SELECT 3 c3 )
SELECT * FROM t1
WHERE EXISTS (SELECT * FROM t2 WHERE c3=c1);
 c1 | c2
----+----
(0 rows)
EXPLAIN ANALYZE
WITH t1 AS ( SELECT 1 c1, 2 c2 UNION ALL SELECT 3 c1, 4 c2 ),
     t2 AS ( SELECT 3 c3 )
SELECT * FROM t1
WHERE EXISTS (SELECT * FROM t2 WHERE c3=c1);
                                                                            QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.05..0.29 rows=72 width=8)
   Rows out:  Avg 0.0 rows x 0 workers.  Max/Last(/) 0/0 rows with 0.273/0.273 ms to end.
   ->  Limit  (cost=0.00..0.00 rows=1 width=0)
         Rows out:  Avg 0.0 rows x 0 workers.  Max/Last(/) 0/0 rows with 0.003/0.003 ms to end.
         ->  Subquery Scan t2  (cost=0.00..0.01 rows=6 width=0)
               Rows out:  Avg 0.0 rows x 0 workers.  Max/Last(/) 0/0 rows with 0.002/0.002 ms to end.
               ->  Result  (cost=0.00..0.01 rows=1 width=0)
                     One-Time Filter: 3 = $0
                     Rows out:  Avg 0.0 rows x 0 workers.  Max/Last(/) 0/0 rows with 0.001/0.001 ms to end.
   ->  Materialize  (cost=0.05..0.17 rows=12 width=8)
         Rows out:  Avg 1.0 rows x 1 workers.  Max/Last(/) 1/1 rows with 0.151/0.151 ms to end, start offset by 0.189/0.189 ms.
         ->  Append  (cost=0.00..0.04 rows=2 width=0)
               Rows out:  Avg 2.0 rows x 1 workers.  Max/Last(/) 2/2 rows with 0.003/0.003 ms to first row, 0.004/0.004 ms to end, start offset by 0.327/0.327 ms.
               ->  Result  (cost=0.00..0.01 rows=1 width=0)
                     Rows out:  Avg 1.0 rows x 1 workers.  Max/Last(/) 1/1 rows with 0.002/0.002 ms to end, start offset by 0.327/0.327 ms.
               ->  Result  (cost=0.00..0.01 rows=1 width=0)
                     Rows out:  Avg 1.0 rows x 1 workers.  Max/Last(/) 1/1 rows with 0/0 ms to end, start offset by 0.337/0.337 ms.
 Slice statistics:
   (slice0)    Executor memory: 61K bytes.
 Statement statistics:
   Memory used: 128000K bytes
 Settings:  default_hash_table_bucket_number=6
 Optimizer status: legacy query optimizer
 Data locality statistics:
   no data locality information in this query
 Total runtime: 0.468 ms
(26 rows)

3) Hawq1.x,优化器关闭(计划器)

show optimizer;
 optimizer
-----------
 off
(1 row)
WITH t1 AS ( SELECT 1 c1, 2 c2 UNION ALL SELECT 3 c1, 4 c2 ),
     t2 AS ( SELECT 3 c3 )
SELECT * FROM t1
WHERE EXISTS (SELECT * FROM t2 WHERE c3=c1);
 c1 | c2
----+----
(0 rows)
EXPLAIN ANALYZE
WITH t1 AS ( SELECT 1 c1, 2 c2 UNION ALL SELECT 3 c1, 4 c2 ),
     t2 AS ( SELECT 3 c3 )
SELECT * FROM t1
WHERE EXISTS (SELECT * FROM t2 WHERE c3=c1);
                                            QUERY PLAN
--------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..0.08 rows=4 width=8)
   Rows out:  0 rows with 0.220 ms to end, start offset by 0.093 ms.
   ->  Append  (cost=0.00..0.04 rows=2 width=0)
         Rows out:  2 rows with 0.001 ms to first row, 0.003 ms to end, start offset by 0.262 ms.
         ->  Result  (cost=0.00..0.01 rows=1 width=0)
               Rows out:  1 rows with 0.001 ms to end, start offset by 0.262 ms.
         ->  Result  (cost=0.00..0.01 rows=1 width=0)
               Rows out:  1 rows with 0.001 ms to end, start offset by 0.264 ms.
   ->  Materialize  (cost=0.00..0.01 rows=1 width=0)
         Rows out:  0 rows with 0.167 ms to end of 3 scans, start offset by 0.265 ms.
         ->  Limit  (cost=0.00..0.00 rows=1 width=0)
               Rows out:  0 rows with 0.003 ms to end, start offset by 0.257 ms.
               ->  Subquery Scan t2  (cost=0.00..0.01 rows=1 width=0)
                     Rows out:  0 rows with 0.002 ms to end, start offset by 0.258 ms.
                     ->  Result  (cost=0.00..0.01 rows=1 width=0)
                           One-Time Filter: 3 = $0
                           Rows out:  0 rows with 0.001 ms to end, start offset by 0.258 ms.
 Slice statistics:
   (slice0)    Executor memory: 61K bytes.
 Statement statistics:
   Memory used: 128000K bytes
 Settings:  optimizer=off
 Optimizer status: legacy query optimizer
 Total runtime: 0.315 ms
(24 rows)

4) (orca)上带有优化器的hawq1.x

show optimizer;
 optimizer
-----------
 on
(1 row)
WITH t1 AS ( SELECT 1 c1, 2 c2 UNION ALL SELECT 3 c1, 4 c2 ),
     t2 AS ( SELECT 3 c3 )
SELECT * FROM t1
WHERE EXISTS (SELECT * FROM t2 WHERE c3=c1);
 c1 | c2
----+----
  3 |  4
(1 row)
EXPLAIN ANALYZE
WITH t1 AS ( SELECT 1 c1, 2 c2 UNION ALL SELECT 3 c1, 4 c2 ),
     t2 AS ( SELECT 3 c3 )
SELECT * FROM t1
WHERE EXISTS (SELECT * FROM t2 WHERE c3=c1);
                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------
 Hash EXISTS Join  (cost=0.00..0.00 rows=2 width=8)
   Hash Cond: "outer".c1 = "inner".c3
   Rows out:  1 rows with 0.835 ms to first row, 2.373 ms to end, start offset by 0.194 ms.
   Executor memory:  1K bytes.
   Work_mem used:  1K bytes. Workfile: (0 spilling, 0 reused)
   Hash chain length 1.0 avg, 1 max, using 1 of 524341 buckets.
   ->  Append  (cost=0.00..0.00 rows=2 width=8)
         Rows out:  2 rows with 0.002 ms to first row, 0.004 ms to end, start offset by 1.023 ms.
         ->  Result  (cost=0.00..0.00 rows=1 width=8)
               Rows out:  1 rows with 0.001 ms to first row, 0.002 ms to end, start offset by 1.023 ms.
               ->  Result  (cost=0.00..0.00 rows=1 width=1)
                     Rows out:  1 rows with 0 ms to end, start offset by 1.024 ms.
         ->  Result  (cost=0.00..0.00 rows=1 width=8)
               Rows out:  1 rows with 0.001 ms to end, start offset by 1.026 ms.
               ->  Result  (cost=0.00..0.00 rows=1 width=1)
                     Rows out:  1 rows with 0 ms to end, start offset by 1.026 ms.
   ->  Hash  (cost=0.00..0.00 rows=1 width=4)
         Rows in:  1 rows with 0.010 ms to end, start offset by 1.013 ms.
         ->  Result  (cost=0.00..0.00 rows=1 width=4)
               Rows out:  1 rows with 0.006 ms to first row, 0.007 ms to end, start offset by 1.013 ms.
               ->  Result  (cost=0.00..0.00 rows=1 width=1)
                     Rows out:  1 rows with 0.001 ms to end, start offset by 1.014 ms.
 Slice statistics:
   (slice0)    Executor memory: 8270K bytes.  Work_mem: 1K bytes max.
 Statement statistics:
   Memory used: 128000K bytes
 Optimizer status: PQO version 1.591
 Total runtime: 2.572 ms
(28 rows)

相关内容

  • 没有找到相关文章

最新更新