Faster SQL query using CASE in a JOIN instead of CASE in the SELECT list?



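I have a view of community members, where each member has a primary key ID. Some members also have an old ID from another system, and some have a spouse ID. All IDs are unique.

For example: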

ID | Name         | OldID   | SpouseID  | SpouseName
1  | John.Smith   | o71     | s99       | Jenna.Smith
2  | Jane.Doe     | o72     |           | 
3  | Jessie.Jones |         |           | 

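I also have a view of activity dates, where each community member can have multiple activity dates. There are activity dates recorded against old IDs and spouse IDs as well. (Unfortunately, I can't clean up the data by converting the old IDs to new IDs.)

For example: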

ID  | ActivityDate | ActiviyType | ActivityGroup
1   | 2017-12-31   | 1           | 1
1   | 2017-12-31   | 3           | 2
1   | 2017-12-31   | 7           | 1
2   | 2017-12-31   | 1           | 1
3   | 2017-12-31   | 1           | 1
o72 | 2010-12-31   | 1           | 2
o72 | 2010-12-31   | 3           | 1
s99 | 2017-12-31   | 1           | 1
s99 | 2017-12-31   | 2           | 1

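I can select the data the way I need with the query below, where the CASE runs its subqueries three times to check the three possible IDs. It is very slow, though, because it runs the subqueries several times for every record: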

SELECT 
C.ID, 
C.Name,
C.OldID,
C.SpouseID,
C.SpouseName,
CASE 
WHEN (C.ID IN (SELECT ID FROM ActivityDates WHERE ActivityDate > '2016-12-31' AND ActiviyType = 1 AND ActivityGroup = 1)
AND NOT EXISTS (SELECT A.ID FROM ActivityDates A WHERE A.ID = C.ID AND A.ActivityDate > '2016-12-31' AND A.ActiviyType > 1 AND A.ActivityGroup > 1))
OR (C.OldID IN (SELECT ID FROM ActivityDates WHERE ActivityDate > '2016-12-31' AND ActiviyType = 1 AND ActivityGroup = 1)
AND NOT EXISTS (SELECT A.ID FROM ActivityDates A WHERE A.ID = C.OldID AND A.ActivityDate > '2016-12-31' AND A.ActiviyType > 1 AND A.ActivityGroup > 1))
OR (C.SpouseID IN (SELECT ID FROM ActivityDates WHERE ActivityDate > '2016-12-31' AND ActiviyType = 1 AND ActivityGroup = 1)
AND NOT EXISTS (SELECT A.ID FROM ActivityDates A WHERE A.ID = C.SpouseID AND A.ActivityDate > '2016-12-31' AND A.ActiviyType > 1 AND A.ActivityGroup > 1))
THEN 'Yes' 
ELSE '' 
END AS Result -- i.e. HasTheCommunityMemberOrTheirSpouseOnlyEverAttendedActivityTypeAndGroup1After2016?
FROM CommunityMembers C

So I expect the following result, which I do get, only slowly:

ID | Name         | OldID   | SpouseID  | SpouseName   | Result
1  | John.Smith   | o71     | s99       | Jenna.Smith  | 
2  | Jane.Doe     | o72     |           |              | Yes
3  | Jessie.Jones |         |           |              | Yes

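I appreciate there are better ways to do this and I'm happy to hear suggestions, although I have limited flexibility to change this system, so all I'm asking is: how can I make this faster? Ideally I would like to join the tables and apply the conditions there, although I can't work out how. For example: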

SELECT 
C.ID, C.Name,
C.OldID, C.SpouseID, C.SpouseName,
R.Result
FROM 
CommunityMembers C
JOIN 
CASE WHEN Date ... Type ... Group ... ELSE ... IN ... Not Exist ... THEN ... ActivityDates R

SELECT 
C.ID, C.Name,
C.OldID, C.SpouseID, C.SpouseName,
CASE 
WHEN R.Date ... R.Type ... R.Group ... ELSE ... THEN 'Yes' END AS Result
FROM 
CommunityMembers C
JOIN 
ActivityDates R

I suspect I need multiple joins, although I don't know how to write them.

Thanks

An index is created like this:

CREATE INDEX index_name
ON table_name (column1, column2, ...);

For more details, see this link.

You want information from the ActivityDates table per ID. So group by ID and filter for the IDs you want with HAVING:

SELECT ID 
FROM ActivityDates
WHERE ActivityDate > '2016-12-31'
GROUP BY ID
-- at least one type 1 / group 1 row, and no row with type > 1 and group > 1
HAVING COUNT(CASE WHEN ActiviyType = 1 AND ActivityGroup = 1 THEN 1 END) > 0
AND COUNT(CASE WHEN ActiviyType > 1 AND ActivityGroup > 1 THEN 1 END) = 0
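To try the grouped query, it can be run against the sample rows from the question. This is only a sketch: the table layout (VARCHAR IDs, and the column names ActiviyType and ActivityGroup as spelled in the sample table) is an assumption, and it reads the first condition as "at least one type 1 / group 1 row", i.e. COUNT(...) > 0.

```sql
-- Assumed layout, based on the sample table in the question.
CREATE TABLE ActivityDates (
    ID            VARCHAR(20),
    ActivityDate  DATE,
    ActiviyType   INT,   -- spelled as in the question's sample table
    ActivityGroup INT
);

INSERT INTO ActivityDates (ID, ActivityDate, ActiviyType, ActivityGroup) VALUES
('1',   '2017-12-31', 1, 1),
('1',   '2017-12-31', 3, 2),
('1',   '2017-12-31', 7, 1),
('2',   '2017-12-31', 1, 1),
('3',   '2017-12-31', 1, 1),
('o72', '2010-12-31', 1, 2),
('o72', '2010-12-31', 3, 1),
('s99', '2017-12-31', 1, 1),
('s99', '2017-12-31', 2, 1);

-- One row per ID that passes both conditions after 2016-12-31.
SELECT ID
FROM ActivityDates
WHERE ActivityDate > '2016-12-31'
GROUP BY ID
HAVING COUNT(CASE WHEN ActiviyType = 1 AND ActivityGroup = 1 THEN 1 END) > 0
   AND COUNT(CASE WHEN ActiviyType > 1 AND ActivityGroup > 1 THEN 1 END) = 0
ORDER BY ID;
```

On these rows the query keeps the IDs 2, 3 and s99. ID 1 is dropped because of its type 3 / group 2 row, while s99 stays because its second row (type 2, group 1) does not satisfy ActiviyType > 1 AND ActivityGroup > 1.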

You can use this with an EXISTS clause:

select
c.*, 
case when exists 
(
SELECT a.ID 
FROM ActivityDates a
WHERE a.ActivityDate > '2016-12-31'
AND a.ID in (c.id, c.oldid, c.spouseid)
GROUP BY a.ID
HAVING COUNT(CASE WHEN a.ActiviyType = 1 AND a.ActivityGroup = 1 THEN 1 END) > 0
AND COUNT(CASE WHEN a.ActiviyType > 1 AND a.ActivityGroup > 1 THEN 1 END) = 0
) then 'Yes' else '' end as result
from CommunityMembers c;

Appropriate indexes to speed this up might be

create index idx1 on ActivityDates (ID, ActivityDate, ActiviyType, ActivityGroup);
create index idx2 on ActivityDates (ActivityDate, ID, ActiviyType, ActivityGroup);

Check whether one of them is actually used and drop the other (or drop both if neither is used).

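Using an uncorrelated subquery instead (which means we have to access it several times) might perform better. That depends on whether the optimizer even comes up with a different execution plan: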

with good_ids as
(
select id 
from activitydates
where activitydate > '2016-12-31'
group by id
having count(case when activiytype = 1 and activitygroup = 1 then 1 end) > 0
and count(case when activiytype > 1 and activitygroup > 1 then 1 end) = 0
)
select
c.*,
case when id       in (select id from good_ids)
or oldid    in (select id from good_ids)
or spouseid in (select id from good_ids)
then 'Yes' else ''
end as result
from CommunityMembers c;
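Since the question asks for a join-based form, the same good_ids set could also be attached with three LEFT JOINs, one per ID column. The following is a minimal sketch, not a tested recommendation: the table layout is assumed from the sample data in the question, the count test is written as > 0 (at least one row), and the aliases g_id, g_old and g_spouse are made up for illustration. Because good_ids holds each ID at most once, the joins cannot multiply member rows. Whether this runs faster than the IN version depends on the optimizer.

```sql
-- Assumed layout, based on the sample tables in the question.
CREATE TABLE CommunityMembers (
    ID         VARCHAR(20) PRIMARY KEY,
    Name       VARCHAR(40),
    OldID      VARCHAR(20),
    SpouseID   VARCHAR(20),
    SpouseName VARCHAR(40)
);
CREATE TABLE ActivityDates (
    ID            VARCHAR(20),
    ActivityDate  DATE,
    ActiviyType   INT,   -- spelled as in the question's sample table
    ActivityGroup INT
);

INSERT INTO CommunityMembers (ID, Name, OldID, SpouseID, SpouseName) VALUES
('1', 'John.Smith',   'o71', 's99', 'Jenna.Smith'),
('2', 'Jane.Doe',     'o72', NULL,  NULL),
('3', 'Jessie.Jones', NULL,  NULL,  NULL);

INSERT INTO ActivityDates (ID, ActivityDate, ActiviyType, ActivityGroup) VALUES
('1',   '2017-12-31', 1, 1),
('1',   '2017-12-31', 3, 2),
('1',   '2017-12-31', 7, 1),
('2',   '2017-12-31', 1, 1),
('3',   '2017-12-31', 1, 1),
('o72', '2010-12-31', 1, 2),
('o72', '2010-12-31', 3, 1),
('s99', '2017-12-31', 1, 1),
('s99', '2017-12-31', 2, 1);

WITH good_ids AS (
    -- Computed once: every ID that passes the filter after 2016-12-31.
    SELECT ID
    FROM ActivityDates
    WHERE ActivityDate > '2016-12-31'
    GROUP BY ID
    HAVING COUNT(CASE WHEN ActiviyType = 1 AND ActivityGroup = 1 THEN 1 END) > 0
       AND COUNT(CASE WHEN ActiviyType > 1 AND ActivityGroup > 1 THEN 1 END) = 0
)
SELECT c.ID, c.Name, c.OldID, c.SpouseID, c.SpouseName,
       -- 'Yes' when any of the member's three IDs found a match in good_ids.
       CASE WHEN g_id.ID IS NOT NULL
              OR g_old.ID IS NOT NULL
              OR g_spouse.ID IS NOT NULL
            THEN 'Yes' ELSE '' END AS Result
FROM CommunityMembers c
LEFT JOIN good_ids g_id     ON g_id.ID     = c.ID
LEFT JOIN good_ids g_old    ON g_old.ID    = c.OldID
LEFT JOIN good_ids g_spouse ON g_spouse.ID = c.SpouseID
ORDER BY c.ID;
```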

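You should try to explain your expected output. It is hard to work out the correct business rules from an incorrect query.

Once the rules are clear, you can get the best query here. Please explain again why IDs 2 and 3 are 'Yes'; then I will rewrite my query.

The second big mistake you are about to make is creating indexes before you understand your business rules and have written a correct query.

Try this,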

declare @t table(ID varchar(20),Name varchar(40),OldID varchar(20), SpouseID  varchar(20)
, SpouseName varchar(40))
insert into @t VALUES
('1','John.Smith','o71' ,'s99','Jenna.Smith')
,('2','Jane.Doe' ,'o72',null,null)
,('3','Jessie.Jones',null,null,null)       
--select * from @t
declare @ActivityDates table(ID varchar(20), ActivityDate date
, ActiviyType int, ActivityGroup int)
insert into @ActivityDates VALUES
('1','2017-12-31',1, 1)
,('1','2017-12-31',3, 2)
,('1','2017-12-31',7, 1)
,('2','2017-12-31',1, 1)
,('3','2017-12-31',1, 1)
,('o72','2010-12-31',1, 2)
,('o72','2010-12-31',3, 1)
,('s99','2017-12-31',1, 1)
,('s99','2017-12-31',2, 1)
SELECT t.*
,case when tbl.ID is not null then 'Yes' else null end Remarks
from @t t
left JOIN
(select * from @ActivityDates ad
WHERE ad.ActivityDate > '2016-12-31' AND ad.ActiviyType = 1 AND ad.ActivityGroup = 1
-- exclude IDs that also have a row of any other type or group after 2016
AND NOT EXISTS (SELECT ad1.ID FROM @ActivityDates ad1 WHERE ad1.ID = ad.ID AND
ad1.ActivityDate > '2016-12-31' AND (ad1.ActiviyType > 1 or ad1.ActivityGroup > 1))
)tbl
on t.ID=tbl.ID
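The query above only joins on the member's own ID. If the rule is meant to cover the old ID and the spouse ID as well, one possible reading is: at least one of the member's IDs has a type 1 / group 1 row after 2016, and none of them has a row of any other type or group after 2016. The sketch below encodes that reading with EXISTS and NOT EXISTS. The rule itself is an assumption the asker would need to confirm, and ordinary tables stand in for the table variables.

```sql
-- Assumed layout, based on the sample tables in the question.
CREATE TABLE CommunityMembers (
    ID         VARCHAR(20) PRIMARY KEY,
    Name       VARCHAR(40),
    OldID      VARCHAR(20),
    SpouseID   VARCHAR(20),
    SpouseName VARCHAR(40)
);
CREATE TABLE ActivityDates (
    ID            VARCHAR(20),
    ActivityDate  DATE,
    ActiviyType   INT,   -- spelled as in the question's sample table
    ActivityGroup INT
);

INSERT INTO CommunityMembers (ID, Name, OldID, SpouseID, SpouseName) VALUES
('1', 'John.Smith',   'o71', 's99', 'Jenna.Smith'),
('2', 'Jane.Doe',     'o72', NULL,  NULL),
('3', 'Jessie.Jones', NULL,  NULL,  NULL);

INSERT INTO ActivityDates (ID, ActivityDate, ActiviyType, ActivityGroup) VALUES
('1',   '2017-12-31', 1, 1),
('1',   '2017-12-31', 3, 2),
('1',   '2017-12-31', 7, 1),
('2',   '2017-12-31', 1, 1),
('3',   '2017-12-31', 1, 1),
('o72', '2010-12-31', 1, 2),
('o72', '2010-12-31', 3, 1),
('s99', '2017-12-31', 1, 1),
('s99', '2017-12-31', 2, 1);

SELECT c.ID, c.Name, c.OldID, c.SpouseID, c.SpouseName,
       CASE
           -- some ID of this member has a type 1 / group 1 row after 2016 ...
           WHEN EXISTS (SELECT 1
                        FROM ActivityDates a
                        WHERE a.ID IN (c.ID, c.OldID, c.SpouseID)
                          AND a.ActivityDate > '2016-12-31'
                          AND a.ActiviyType = 1 AND a.ActivityGroup = 1)
           -- ... and no ID of this member has any other kind of row after 2016
            AND NOT EXISTS (SELECT 1
                            FROM ActivityDates a
                            WHERE a.ID IN (c.ID, c.OldID, c.SpouseID)
                              AND a.ActivityDate > '2016-12-31'
                              AND (a.ActiviyType > 1 OR a.ActivityGroup > 1))
           THEN 'Yes' ELSE ''
       END AS Result
FROM CommunityMembers c
ORDER BY c.ID;
```

On the sample rows this yields '' for John.Smith (IDs 1 and s99 both have other activity after 2016) and 'Yes' for Jane.Doe and Jessie.Jones, which agrees with the result table in the question.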

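Here is another pattern using "optional joins", which may or may not perform better. It does not produce exactly the same output as yours; I'm not sure what you are after there.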

SELECT A.*,
COALESCE(C1.Name, C2.Name, C3.Name) As Name
FROM  ActivityDates  A
LEFT OUTER JOIN CommunityMembers As C1
ON C1.ID = A.ID
LEFT OUTER JOIN CommunityMembers As C2
ON C2.OldID = CAST(A.ID AS VARCHAR(12))
LEFT OUTER JOIN CommunityMembers As C3
ON C3.SpouseID = CAST(A.ID AS VARCHAR(12))

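In some cases this will "double count", but if you are sure the whole set of IDs is unique you should be fine. If you only want to know whether an activity record exists, you can definitely use EXISTS to speed it up, but again, I don't follow your logic.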
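For the simpler case mentioned here, checking only whether any activity record exists for a member under any of their three IDs, an EXISTS test might look like the following. This is a sketch under the table layout assumed from the question's samples, and HasActivity is a hypothetical column name.

```sql
-- Assumed layout, based on the sample tables in the question.
CREATE TABLE CommunityMembers (
    ID         VARCHAR(20) PRIMARY KEY,
    Name       VARCHAR(40),
    OldID      VARCHAR(20),
    SpouseID   VARCHAR(20),
    SpouseName VARCHAR(40)
);
CREATE TABLE ActivityDates (
    ID            VARCHAR(20),
    ActivityDate  DATE,
    ActiviyType   INT,   -- spelled as in the question's sample table
    ActivityGroup INT
);

INSERT INTO CommunityMembers (ID, Name, OldID, SpouseID, SpouseName) VALUES
('1', 'John.Smith',   'o71', 's99', 'Jenna.Smith'),
('2', 'Jane.Doe',     'o72', NULL,  NULL),
('3', 'Jessie.Jones', NULL,  NULL,  NULL);

INSERT INTO ActivityDates (ID, ActivityDate, ActiviyType, ActivityGroup) VALUES
('1',   '2017-12-31', 1, 1),
('2',   '2017-12-31', 1, 1),
('3',   '2017-12-31', 1, 1),
('o72', '2010-12-31', 1, 2),
('s99', '2017-12-31', 1, 1);

-- EXISTS can stop at the first matching row, so each member appears once
-- and nothing is double counted.
SELECT c.ID, c.Name,
       CASE WHEN EXISTS (SELECT 1
                         FROM ActivityDates a
                         WHERE a.ID IN (c.ID, c.OldID, c.SpouseID))
            THEN 'Yes' ELSE '' END AS HasActivity
FROM CommunityMembers c
ORDER BY c.ID;
```

NULL values in OldID or SpouseID are harmless here: a comparison with NULL is never true, so those entries simply match nothing.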
