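I have a view of community members in which each member has an ID as primary key. Some members also have an old ID from another system, and some have a spouse ID. All IDs are unique.
For example: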
ID | Name | OldID | SpouseID | SpouseName
1 | John.Smith | o71 | s99 | Jenna.Smith
2 | Jane.Doe | o72 | |
3 | Jessie.Jones | | |
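I also have a view of activity dates, in which each community member can have multiple activity dates. Old IDs and spouse IDs have activity dates as well. (Unfortunately, I cannot clean up the data by converting the old IDs to new IDs.)
For example:
ID | ActivityDate | ActivityType | ActivityGroup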
1 | 2017-12-31 | 1 | 1
1 | 2017-12-31 | 3 | 2
1 | 2017-12-31 | 7 | 1
2 | 2017-12-31 | 1 | 1
3 | 2017-12-31 | 1 | 1
o72 | 2010-12-31 | 1 | 2
o72 | 2010-12-31 | 3 | 1
s99 | 2017-12-31 | 1 | 1
s99 | 2017-12-31 | 2 | 1
I can select the data the way I need with the following query, which repeats the CASE sub-selects 3 times to check the 3 possible IDs. It is very slow, though, because it runs the sub-selects several times per record:
SELECT
    C.ID,
    C.Name,
    C.OldID,
    C.SpouseID,
    C.SpouseName,
    CASE
        WHEN (C.ID IN (SELECT ID FROM ActivityDates WHERE ActivityDate > '2016-12-31' AND ActivityType = 1 AND ActivityGroup = 1)
              AND NOT EXISTS (SELECT ID FROM ActivityDates WHERE ID = C.ID AND ActivityDate > '2016-12-31' AND ActivityType > 1 AND ActivityGroup > 1))
          OR (C.OldID IN (SELECT ID FROM ActivityDates WHERE ActivityDate > '2016-12-31' AND ActivityType = 1 AND ActivityGroup = 1)
              AND NOT EXISTS (SELECT ID FROM ActivityDates WHERE ID = C.OldID AND ActivityDate > '2016-12-31' AND ActivityType > 1 AND ActivityGroup > 1))
          OR (C.SpouseID IN (SELECT ID FROM ActivityDates WHERE ActivityDate > '2016-12-31' AND ActivityType = 1 AND ActivityGroup = 1)
              AND NOT EXISTS (SELECT ID FROM ActivityDates WHERE ID = C.SpouseID AND ActivityDate > '2016-12-31' AND ActivityType > 1 AND ActivityGroup > 1))
        THEN 'Yes'
        ELSE ''
    END AS Result -- i.e. HasTheCommunityMemberOrTheirSpouseOnlyEverAttendedActivityTypeAndGroup1After2016?
FROM CommunityMembers C;
So I expect the following result, which is what I get, only slowly:
ID | Name | OldID | SpouseID | SpouseName | Result
1 | John.Smith | o71 | s99 | Jenna.Smith |
2 | Jane.Doe | o72 | | | Yes
3 | Jessie.Jones | | | | Yes
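I appreciate there are better ways to do this, and I'm happy to hear suggestions, but I have limited flexibility to change this system, so all I'm asking is: how can I make it faster? Ideally I'd like to join the tables and apply the conditions there, though I can't work out how. For example: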
SELECT
C.ID, C.Name,
C.OldID, C.SpouseID, C.SpouseName,
R.Result
FROM
CommunityMembers C
JOIN
CASE WHEN Date ... Type ... Group ... ELSE ... IN ... Not Exist ... THEN ... ActivityDates R
or
SELECT
C.ID, C.Name,
C.OldID, C.SpouseID, C.SpouseName,
CASE
WHEN R.Date ... R.Type ... R.Group ... ELSE ... THEN 'Yes' END AS Result
FROM
CommunityMembers C
JOIN
ActivityDates R
I suspect I need several joins, though I don't know how to write them.
Thanks
An index is created like this:
CREATE INDEX index_name
ON table_name (column1, column2, ...);
See this link for more details.
You want information from the ActivityDates table per ID. So group by ID and filter for the desired IDs with HAVING:
SELECT ID
FROM ActivityDates
WHERE ActivityDate > '2016-12-31'
GROUP BY ID
HAVING COUNT(CASE WHEN ActivityType = 1 AND ActivityGroup = 1 THEN 1 END) > 0
AND COUNT(CASE WHEN ActivityType > 1 AND ActivityGroup > 1 THEN 1 END) = 0
You can use this with an EXISTS clause:
select
c.*,
case when exists
(
SELECT a.ID
FROM ActivityDates a
WHERE a.ActivityDate > '2016-12-31'
AND a.ID in (c.id, c.oldid, c.spouseid)
GROUP BY a.ID
HAVING COUNT(CASE WHEN a.ActivityType = 1 AND a.ActivityGroup = 1 THEN 1 END) > 0
AND COUNT(CASE WHEN a.ActivityType > 1 AND a.ActivityGroup > 1 THEN 1 END) = 0
) then 'Yes' else '' end as result
from CommunityMembers c;
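To try the EXISTS pattern on the sample data from the question, here is a minimal sketch. It rests on assumptions that are not in the thread: SQLite (through Python's sqlite3 module) stands in for the real database, the tables are named CommunityMembers and ActivityDates with all IDs stored as text, and the second COUNT uses OR rather than AND, so that any row with type > 1 or group > 1 disqualifies an ID. That reading is the one that reproduces the expected output in the question.

```python
import sqlite3

# Sample rows copied from the question.
MEMBERS = [
    ("1", "John.Smith", "o71", "s99", "Jenna.Smith"),
    ("2", "Jane.Doe", "o72", None, None),
    ("3", "Jessie.Jones", None, None, None),
]
ACTIVITIES = [
    ("1", "2017-12-31", 1, 1), ("1", "2017-12-31", 3, 2), ("1", "2017-12-31", 7, 1),
    ("2", "2017-12-31", 1, 1), ("3", "2017-12-31", 1, 1),
    ("o72", "2010-12-31", 1, 2), ("o72", "2010-12-31", 3, 1),
    ("s99", "2017-12-31", 1, 1), ("s99", "2017-12-31", 2, 1),
]

# Correlated EXISTS over a grouped subquery.
# Assumption: "other activity" means type > 1 OR group > 1.
QUERY = """
SELECT c.ID,
       CASE WHEN EXISTS (
           SELECT a.ID
           FROM ActivityDates a
           WHERE a.ActivityDate > '2016-12-31'
             AND a.ID IN (c.ID, c.OldID, c.SpouseID)
           GROUP BY a.ID
           HAVING COUNT(CASE WHEN a.ActivityType = 1 AND a.ActivityGroup = 1 THEN 1 END) > 0
              AND COUNT(CASE WHEN a.ActivityType > 1 OR a.ActivityGroup > 1 THEN 1 END) = 0
       ) THEN 'Yes' ELSE '' END AS Result
FROM CommunityMembers c
ORDER BY c.ID
"""

def exists_results():
    """Load the sample data into an in-memory database and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE CommunityMembers "
        "(ID TEXT PRIMARY KEY, Name TEXT, OldID TEXT, SpouseID TEXT, SpouseName TEXT)"
    )
    conn.execute(
        "CREATE TABLE ActivityDates "
        "(ID TEXT, ActivityDate TEXT, ActivityType INTEGER, ActivityGroup INTEGER)"
    )
    conn.executemany("INSERT INTO CommunityMembers VALUES (?, ?, ?, ?, ?)", MEMBERS)
    conn.executemany("INSERT INTO ActivityDates VALUES (?, ?, ?, ?)", ACTIVITIES)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows

print(exists_results())  # [('1', ''), ('2', 'Yes'), ('3', 'Yes')]
```

With AND in the second COUNT, as in the query above, s99 still qualifies on this sample data (its type 2 / group 1 row does not have group > 1), so John.Smith would also be flagged 'Yes'.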
Suitable indexes to speed this up might be
create index idx1 on ActivityDates (ID, ActivityDate, ActivityType, ActivityGroup);
create index idx2 on ActivityDates (ActivityDate, ID, ActivityType, ActivityGroup);
Find out whether one of them gets used and drop the other (or drop both if neither is used).
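One way to check which index a query uses is to look at the query plan. The sketch below only illustrates the idea and assumes SQLite (through Python's sqlite3 module) as a stand-in. The helper `indexes_used` and the lookup query are hypothetical, and the real database has its own plan tool (its EXPLAIN statement or execution-plan viewer) and may choose differently.

```python
import sqlite3

def indexes_used(conn, sql, params=()):
    """Return which of the candidate indexes appear in SQLite's query plan.

    Hypothetical helper: it only searches the plan text for the index names.
    """
    plan = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    details = " ".join(row[3] for row in plan)  # 4th column holds the plan description
    return {name for name in ("idx1", "idx2") if name in details}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ActivityDates "
    "(ID TEXT, ActivityDate TEXT, ActivityType INTEGER, ActivityGroup INTEGER)"
)
# The two candidate indexes from the answer.
conn.execute("CREATE INDEX idx1 ON ActivityDates (ID, ActivityDate, ActivityType, ActivityGroup)")
conn.execute("CREATE INDEX idx2 ON ActivityDates (ActivityDate, ID, ActivityType, ActivityGroup)")

# Assumed lookup shape: one ID plus the date filter.
lookup = "SELECT ID FROM ActivityDates WHERE ID = ? AND ActivityDate > ?"
used = indexes_used(conn, lookup, ("2", "2016-12-31"))
print(used)
```

Whichever index never shows up in the plans of the queries you actually run is the candidate to drop.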
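Using a non-correlated subquery (which means we have to access it several times) may perform better. It depends on whether the optimizer even comes up with a different execution plan: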
with good_ids as
(
  -- IDs with at least one type 1 / group 1 activity after 2016-12-31
  -- and no activity with type > 1 and group > 1 after that date
  select id
  from ActivityDates
  where ActivityDate > '2016-12-31'
  group by id
  having count(case when ActivityType = 1 and ActivityGroup = 1 then 1 end) > 0
     and count(case when ActivityType > 1 and ActivityGroup > 1 then 1 end) = 0
)
select
  c.*,
  case when c.id in (select id from good_ids)
         or c.oldid in (select id from good_ids)
         or c.spouseid in (select id from good_ids)
       then 'Yes' else ''
  end as result
from CommunityMembers c;
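The question asks for a join-based form. One possible sketch is below: it unpivots the three ID columns into one (MemberID, AnyID) list with UNION ALL and joins that list to the qualifying IDs. As assumptions not taken from the thread, it runs on SQLite through Python's sqlite3 module, stores all IDs as text, names the tables CommunityMembers and ActivityDates, and treats type > 1 OR group > 1 as a disqualifying activity. The CTE name `member_ids` is made up for the illustration.

```python
import sqlite3

# Sample rows copied from the question.
MEMBERS = [
    ("1", "John.Smith", "o71", "s99", "Jenna.Smith"),
    ("2", "Jane.Doe", "o72", None, None),
    ("3", "Jessie.Jones", None, None, None),
]
ACTIVITIES = [
    ("1", "2017-12-31", 1, 1), ("1", "2017-12-31", 3, 2), ("1", "2017-12-31", 7, 1),
    ("2", "2017-12-31", 1, 1), ("3", "2017-12-31", 1, 1),
    ("o72", "2010-12-31", 1, 2), ("o72", "2010-12-31", 3, 1),
    ("s99", "2017-12-31", 1, 1), ("s99", "2017-12-31", 2, 1),
]

JOIN_QUERY = """
WITH good_ids AS (
    -- one pass over ActivityDates: IDs that qualify on their own
    SELECT ID
    FROM ActivityDates
    WHERE ActivityDate > '2016-12-31'
    GROUP BY ID
    HAVING COUNT(CASE WHEN ActivityType = 1 AND ActivityGroup = 1 THEN 1 END) > 0
       AND COUNT(CASE WHEN ActivityType > 1 OR ActivityGroup > 1 THEN 1 END) = 0
),
member_ids AS (
    -- hypothetical helper CTE: every ID that belongs to a member
    SELECT ID AS MemberID, ID AS AnyID FROM CommunityMembers
    UNION ALL
    SELECT ID, OldID FROM CommunityMembers WHERE OldID IS NOT NULL
    UNION ALL
    SELECT ID, SpouseID FROM CommunityMembers WHERE SpouseID IS NOT NULL
)
SELECT c.ID,
       CASE WHEN COUNT(g.ID) > 0 THEN 'Yes' ELSE '' END AS Result
FROM CommunityMembers c
JOIN member_ids m ON m.MemberID = c.ID
LEFT JOIN good_ids g ON g.ID = m.AnyID
GROUP BY c.ID
ORDER BY c.ID
"""

def join_results():
    """Load the sample data into an in-memory database and run the join query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE CommunityMembers "
        "(ID TEXT PRIMARY KEY, Name TEXT, OldID TEXT, SpouseID TEXT, SpouseName TEXT)"
    )
    conn.execute(
        "CREATE TABLE ActivityDates "
        "(ID TEXT, ActivityDate TEXT, ActivityType INTEGER, ActivityGroup INTEGER)"
    )
    conn.executemany("INSERT INTO CommunityMembers VALUES (?, ?, ?, ?, ?)", MEMBERS)
    conn.executemany("INSERT INTO ActivityDates VALUES (?, ?, ?, ?)", ACTIVITIES)
    rows = conn.execute(JOIN_QUERY).fetchall()
    conn.close()
    return rows

print(join_results())  # [('1', ''), ('2', 'Yes'), ('3', 'Yes')]
```

If the ID columns have different types in the real view, the UNION ALL branches would need explicit casts to a common type.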
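You should try to explain your expected output. It is hard to work out the correct business rules from an incorrect query.
That way you can get the best query here. Just try once more to explain why IDs 2 and 3 are 'Yes'. Then I will rewrite my query.
The second biggest mistake you are about to make is creating indexes without understanding your business rules and without having written the correct query.
Try this: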
declare @t table(ID varchar(20),Name varchar(40),OldID varchar(20), SpouseID varchar(20)
, SpouseName varchar(40))
insert into @t VALUES
('1','John.Smith','o71' ,'s99','Jenna.Smith')
,('2','Jane.Doe' ,'o72',null,null)
,('3','Jessie.Jones',null,null,null)
--select * from @t
declare @ActivityDates table(ID varchar(20), ActivityDate date
, ActivityType int, ActivityGroup int)
insert into @ActivityDates VALUES
('1','2017-12-31',1, 1)
,('1','2017-12-31',3, 2)
,('1','2017-12-31',7, 1)
,('2','2017-12-31',1, 1)
,('3','2017-12-31',1, 1)
,('o72','2010-12-31',1, 2)
,('o72','2010-12-31',3, 1)
,('s99','2017-12-31',1, 1)
,('s99','2017-12-31',2, 1)
-- flag members whose own ID has a type 1 / group 1 activity after 2016-12-31
-- and no other activity (type > 1 or group > 1) after that date
SELECT t.*
,case when tbl.ID is not null then 'Yes' else null end Remarks
from @t t
left JOIN
(select AD.* from @ActivityDates AD
WHERE AD.ActivityDate > '2016-12-31' AND AD.ActivityType = 1 AND AD.ActivityGroup = 1
AND NOT EXISTS (SELECT 1 FROM @ActivityDates AD1 WHERE AD1.ID = AD.ID AND
AD1.ActivityDate > '2016-12-31' AND (AD1.ActivityType > 1 or AD1.ActivityGroup > 1))
) tbl
on t.ID=tbl.ID
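Here is another pattern, using "optional joins", which may or may not perform better. It does not produce exactly the same output as yours; I'm not sure what you are after there.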
SELECT A.*,
COALESCE(C1.Name, C2.Name, C3.Name) As Name
FROM ActivityDates A
LEFT OUTER JOIN CommunityMembers As C1
ON C1.ID = A.ID
LEFT OUTER JOIN CommunityMembers As C2
ON C2.OldID = CAST(A.ID AS VARCHAR(12))
LEFT OUTER JOIN CommunityMembers As C3
ON C3.SpouseID = CAST(A.ID AS VARCHAR(12))
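In some cases this will "double count", but if you are sure that the whole set of IDs is unique, you should be fine. If you only want to know whether an activity record exists, you can definitely use EXISTS to speed things up, but again, I don't follow your logic.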