我正在使用如下所示的数据集(由于它太小而没有创建小提琴)
我有一个表tblReqs
,其基本结构如下:
| Onum | Pnum | ReqNum |
|:----:|:----:|:------:|
| NULL | P427 | RN1148 |
| NULL | P324 | RN1725 |
| NULL | P229 | RN1242 |
| O396 | NULL | RN1457 |
| O380 | NULL | RN1205 |
| O258 | NULL | RN1482 |
然后我有一个直数表,称为tblnums
,如下所示:
| nums |
|------|
| O258 |
| O370 |
| O490 |
| O314 |
| O379 |
| P341 |
| P230 |
| P280 |
| P324 |
| P395 |
我需要有条件地加入tblnums
tblReqs
.tblReqs
中的Onum
或Pnum
等于tblnums
中的nums
字段。
目标将是一个如下所示的数据集:
| nums | ReqNum |
|:----:|:------:|
| O258 | RN1482 |
| O370 | |
| O490 | |
| O314 | |
| O379 | |
| P341 | |
| P230 | |
| P280 | |
| P324 | RN1725 |
| P395 | |
我知道我可以在连接中使用OR
运算符,如下所示:
SELECT
tblnums.nums,
tblReqs.ReqNum
FROM
tblnums
LEFT JOIN tblReqs ON tblnums.nums = tblReqs.Onum OR tblnums.nums = tblReqs.Pnum
但我最近了解到COALESCE
:
SELECT
tblnums.nums,
tblReqs.ReqNum
FROM
tblnums
LEFT JOIN tblReqs ON tblnums.nums = COALESCE(tblReqs.Onum, tblReqs.Pnum)
是否有偏爱一个而不是另一个?我知道我可以在我的数据集上尝试两者,但测量时间不是一个很好的指标,因为它受到许多其他限制(数据库维护、网络带宽、CPU/RAM 功率等)的影响。此外,我无法通过 SQL 开发人员提取许多指标,因为我的 IT 团队已将其锁定。
条件连接首选这两种方法中的哪一种,OR
或COALESCE
?从运行时复杂性的角度来看,发生了什么,等等?
使用 coalesce(或 NVL)有一个好处,因为您可以利用基于函数的索引来提高查询性能。
设置:
FSITJA@db01 2019-07-18 09:21:33> create table tblreqs (onum, pnum, reqnum primary key) as
2 with t (onum, pnum, reqnum) as (
3 select NULL, 'P427', 'RN1148' from dual union all
4 select NULL, 'P324', 'RN1725' from dual union all
5 select NULL, 'P229', 'RN1242' from dual union all
6 select 'O396', NULL, 'RN1457' from dual union all
7 select 'O380', NULL, 'RN1205' from dual union all
8 select 'O258', NULL, 'RN1482' from dual
9 ) select * from t;
Table created.
FSITJA@db01 2019-07-18 09:21:33> create table tblnums (nums primary key) as
2 with t (nums)as (
3 select 'O258' from dual union all
4 select 'O370' from dual union all
5 select 'O490' from dual union all
6 select 'O314' from dual union all
7 select 'O379' from dual union all
8 select 'P341' from dual union all
9 select 'P230' from dual union all
10 select 'P280' from dual union all
11 select 'P324' from dual union all
12 select 'P395' from dual
13 ) select * from t;
Table created.
请注意,在下面的选项中,由于条件连接(在TBLREQS上,表访问存储已满),oracle无法利用索引:
FSITJA@db01 2019-07-18 09:21:33> explain plan for
2 SELECT n.nums,
3 r.ReqNum
4 FROM tblnums n
5 LEFT JOIN tblReqs r ON n.nums = r.Onum OR n.nums = r.pnum;
Explained.
FSITJA@db01 2019-07-18 09:21:33> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1571794044
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 130 | 31 (0)| 00:00:01 |
| 1 | NESTED LOOPS OUTER | | 10 | 130 | 31 (0)| 00:00:01 |
| 2 | INDEX FULL SCAN | SYS_C0047401 | 10 | 50 | 1 (0)| 00:00:01 |
| 3 | VIEW | VW_LAT_EB747914 | 1 | 8 | 3 (0)| 00:00:01 |
|* 4 | TABLE ACCESS STORAGE FULL| TBLREQS | 1 | 13 | 3 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - filter("R"."ONUM" IS NOT NULL AND "N"."NUMS"="R"."ONUM" OR "R"."PNUM" IS NOT
NULL AND "N"."NUMS"="R"."PNUM")
17 rows selected.
现在,如果我们使用 coalesce 函数创建一个基于函数的索引,oracle 会检测到该索引存在,并将使用它来提高连接性能,而无需进行全表扫描(在新创建的IDX_COALESCE_ONUM_PNUM上进行索引范围扫描):
FSITJA@db01 2019-07-18 09:21:33> create index idx_coalesce_onum_pnum on tblreqs (coalesce(Onum, Pnum));
Index created.
FSITJA@db01 2019-07-18 09:21:33> explain plan for
2 SELECT n.nums,
3 r.ReqNum
4 FROM tblnums n
5 LEFT JOIN tblReqs r ON n.nums = COALESCE(r.Onum, r.Pnum);
Explained.
FSITJA@db01 2019-07-18 09:21:33> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 605824869
---------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 160 | 2 (0)| 00:00:01 |
| 1 | NESTED LOOPS OUTER | | 10 | 160 | 2 (0)| 00:00:01 |
| 2 | INDEX FULL SCAN | SYS_C0047401 | 10 | 50 | 1 (0)| 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID BATCHED| TBLREQS | 1 | 11 | 1 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | IDX_COALESCE_ONUM_PNUM | 1 | | 0 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("N"."NUMS"="R"."SYS_NC00004$"(+))
16 rows selected.
通常,函数和or
会阻碍优化。两者应具有非常相似的性能。
我认为最好的方法可能是两个连接:
SELECT n.nums,
COALESCE(ro.ReqNum, rp.ReqNum) as reqNum
FROM tblnums n LEFT JOIN
tblReqs ro
ON n.nums = ro.Onum LEFT JOIN
tblReqs rp
ON n.nums = rp.Pnum AND
ro.Onum IS NULL; -- no match the first time