当一列与另一列相似时,优化SQL查询联接



我有两个表:

create table DivisionDocs 
(
RecordID int, 
division varchar(50), 
Market1 varchar(25), 
Market2 varchar(25), 
Market3 varchar(25), 
Market4 varchar(25), 
MarketIsValid varchar(1)
)
insert into DivisionDocs 
values (1, 'Division1', 'A1', 'B1', 'C1', 'D1', NULL)
insert into DivisionDocs 
values (2, 'Division1', 'A2', 'B2', 'C2', 'D2', NULL)
insert into DivisionDocs 
values (3, 'Division1', 'A3', 'B3', 'C3', 'D3', NULL)
insert into DivisionDocs 
values (4, 'Division1', 'A5', 'B3', 'C3', 'D3', NULL)
insert into DivisionDocs 
values (5, 'Division1', 'ALL', 'B3', 'C3', 'D3', NULL)
create table DivisionValidation 
(
division varchar(50), 
Market1 varchar(25), 
Market2 varchar(25), 
Market3 varchar(25), 
Market4 varchar(25)
)
insert into DivisionValidation  
values ('Division1', 'A1', 'B1', 'C1', 'D1')
insert into DivisionValidation 
values ('Division1', 'A2', 'B2', 'C2', 'D2')
insert into DivisionValidation 
values ('Division1', 'A3', 'B3', 'C3', 'D3')

我在DivisionDocs中有大约500万行,在DivisionValidation中有500K行。

在SQL Server中,我试图通过比较两个表来验证DivisionDocsDivisionValidation。使用联接查询很容易做到这一点。在这种情况下,RecordID的4&5无效,因为Market1不匹配:

update t1
set MarketIsValid = 'x'
from DivisionDocs t1
join DivisionValidation t2 on t1.division = t2.division
and t1.market1 = t2.market1
and t1.market2 = t2.market2
and t1.market3 = t2.market3
and t1.market4 = t2.market4

然而,有时市场#字段会包含单词ALL,这基本上意味着它应该始终验证。这基本上意味着它覆盖了整个市场。因此,我正在寻找一种将其构建到更新查询中的方法。

到目前为止,我已经尝试过:

update t1
set MarketIsValid = 'x'
from DivisionDocs t1
join Divisionvalidation t2 on t1.division = t2.division
and t2.market1 like case 
when t1.market1 = 'ALL' then '%' else t1.market1 end
and t2.market2 like case when t1.market2 = 'ALL' then '%' else t1.market2 end
and t2.market3 like case when t1.market3 = 'ALL' then '%' else t1.market3 end
and t2.market4 like case when t1.market4 = 'ALL' then '%' else t1.market4 end

我也试过:

update DivisionDocs
set market1 = '%' 
where market1 = 'ALL'
update t1
set MarketIsValid = 'x'
from DivisionDocs t1
join Divisionvalidation t2 on t1.division = t2.division
and t2.market1 like t1.market1
and t2.market2 like t1.market2
and t2.market3 like t1.market3
and t2.market4 like t1.market4

这两种方法都有效,但运行需要8-10分钟。我在所有列上都有一个非聚集索引。只是想看看是否有人想出了一种更聪明的方法来更快、更高效地运行这个查询。

试试这是否有效。如果没有,请更新您的示例数据,以包含一个"ALL"起作用的实例。

UPDATE dd
SET MarketIsValid = 'x'
FROM DivisionDocs dd
JOIN DivisionValidation dv
ON dv.division = dd.division
AND (dv.market1 = dd.market1 OR dv.Market1 = 'ALL')
AND (dv.market2 = dd.market2 OR dv.Market1 = 'ALL')
AND (dv.market3 = dd.market3 OR dv.Market1 = 'ALL')
AND (dv.market4 = dd.market4 OR dv.Market1 = 'ALL')
;

我修改了表别名以获得一些意义。。。t1和t2对我来说毫无意义。我通常使用表格的首字母缩写作为别名。

最新更新