目前有一个"访问"数据集,包含日期、患者、提供者、预约级别(该患者的预约序号)和预约/提供者级别(该患者/提供者的预约序号)
需要使用它在每个月的第一天(2020年1月1日开始)进行回顾,对于每个患者,查看之前的7次就诊(没有距离限制),并获取该月与该患者关联的就诊次数最多的提供者。
如果患者就诊少于7次,仍然适用此逻辑。如果患者的就诊次数为偶数,并且导致这些提供者的数量相等,那么获胜者将是最近的一个。患者在该月期间可能没有任何预约。
例如病人A的基本数据是这样的。
ApptDate | Patient | Provider | ApptRank | ApptProvRank |
1/15/2020| A | BOB | 1 | 1 |
3/01/2020| A | BOB | 2 | 2 |
3/08/2020| A | BOB | 3 | 3 |
3/20/2020| A | BOB | 4 | 4 |
4/01/2020| A | BOB | 5 | 5 |
4/15/2020| A | SMITH | 6 | 1 |
6/07/2020| A | SMITH | 7 | 2 |
6/21/2020| A | BOB | 8 | 6 |
7/01/2020| A | JANE | 9 | 1 |
7/15/2020| A | JANE | 10 | 2 |
7/21/2020| A | JANE | 11 | 3 |
8/01/2020| A | JANE | 12 | 4 |
8/15/2020| A | JANE | 13 | 5 |
9/20/2020| A | JANE | 14 | 6 |
然后,我有一个流程来填充缺失的月份,并计算当月每个提供商的应用程序数量,看起来像这样…
Month | Patient | Provider | Appts |Cumulative Appt Rank | Cumulative Appt Provider Rank
1/1/2020 | A | BOB | 1 | 1 | 1
2/1/2020 | A | NULL | NULL | NULL | NULL
3/1/2020 | A | BOB | 3 | 4 | 4
4/1/2020 | A | BOB | 1 | 5 | 5
4/1/2020 | A | SMITH | 1 | 6 | 1
5/1/2020 | A | NULL | NULL | NULL | NULL
6/1/2020 | A | SMITH | 1 | 7 | 2
6/1/2020 | A | BOB | 1 | 8 | 6
7/1/2020 | A | JANE | 3 | 11 | 3
8/1/2020 | A | JANE | 2 | 13 | 5
9/1/2020 | A | JANE | 2 | 14 | 6
我想要的是看起来像这样的东西:(一个月一个供应商和应用程序的份额在之前的7 -不应该高于7)
Month | Patient | Provider | Appt Count (of Past 7)
2/1/2020 | A | BOB | 1 | ----> starting 2/1/2020 for the 'lookback'
3/1/2020 | A | BOB | 1 |
4/1/2020 | A | BOB | 4 |
5/1/2020 | A | BOB | 5 |
6/1/2020 | A | BOB | 5 |
7/1/2020 | A | BOB | 5 |
8/1/2020 | A | JANE | 3 |
9/1/2020 | A | JANE | 5 |
10/1/2020| A | JANE | 6 |
这将涉及多个患者,但为了简化本例,它只有一个。非常感谢对这个问题的任何帮助
将示例数据转换为DDL/DML后:
DECLARE @Visits TABLE (ID INT IDENTITY, ApptDate DATETIME, Patient NVARCHAR(10), Provider NVARCHAR(10), ApptRank INT, ApptProvRank INT);
INSERT INTO @Visits (ApptDate, Patient, Provider, ApptRank, ApptProvRank) VALUES
('1/15/2020', 'A', 'BOB ', 1 , 1),
('3/01/2020', 'A', 'BOB ', 2 , 2),
('3/08/2020', 'A', 'BOB ', 3 , 3),
('3/20/2020', 'A', 'BOB ', 4 , 4),
('4/01/2020', 'A', 'BOB ', 5 , 5),
('4/15/2020', 'A', 'SMITH ', 6 , 1),
('6/07/2020', 'A', 'SMITH ', 7 , 2),
('6/21/2020', 'A', 'BOB ', 8 , 6),
('7/01/2020', 'A', 'JANE ', 9 , 1),
('7/15/2020', 'A', 'JANE ', 10, 2),
('7/21/2020', 'A', 'JANE ', 11, 3),
('8/01/2020', 'A', 'JANE ', 12, 4),
('8/15/2020', 'A', 'JANE ', 13, 5),
('9/20/2020', 'A', 'JANE ', 14, 6),
('1/15/2021', 'B', 'BOB ', 1 , 1),
('3/01/2021', 'B', 'BOB ', 2 , 2),
('3/08/2021', 'B', 'BOB ', 3 , 3),
('3/20/2021', 'B', 'BOB ', 4 , 4),
('4/01/2021', 'B', 'BOB ', 5 , 5),
('4/15/2021', 'B', 'SMITH ', 6 , 1),
('6/07/2021', 'B', 'SMITH ', 7 , 2),
('6/21/2021', 'B', 'JANE ', 8 , 6),
('7/01/2021', 'B', 'JANE ', 9 , 1),
('7/15/2021', 'B', 'JANE ', 10, 2),
('7/21/2021', 'B', 'JANE ', 11, 3),
('8/01/2021', 'B', 'JANE ', 12, 4),
('8/15/2021', 'B', 'JANE ', 13, 5),
('9/20/2021', 'B', 'JANE ', 14, 6);
我们可以通过窗口函数查看这些数据,以确定患者在过去7次就诊(无论持续时间如何)中最受欢迎的提供者,然后连接回它以查看每次预约的7次就诊窗口中最受欢迎的提供者。
;WITH top7VisitProvider AS (
SELECT patient, a.ApptRank, a.v7Provider, a.ProviderLast7Cnt, ROW_NUMBER() OVER (PARTITION BY a.Patient, a.ApptRank ORDER BY a.ProviderLast7Cnt DESC) AS rn
FROM (
SELECT v.ID, v.ApptDate, v.Patient, v.Provider, v.ApptRank, v.ApptProvRank, vr.Provider AS v7Provider, COUNT(*) AS ProviderLast7Cnt
FROM @Visits v
LEFT OUTER JOIN @Visits vr
ON v.Patient = vr.Patient
AND vr.ApptRank BETWEEN v.ApptRank-7 AND v.ApptRank
GROUP BY v.ID, v.ApptDate, v.Patient, v.Provider, v.ApptRank, v.ApptProvRank, vr.Provider
) a
)
SELECT v.ID, v.ApptDate, v.Patient, v.Provider, v.ApptRank, v.ApptProvRank, t7vp.v7Provider
FROM @Visits v
INNER JOIN top7VisitProvider t7vp
ON v.Patient = t7vp.Patient
AND v.ApptRank = t7vp.ApptRank
AND t7vp.rn = 1
ID ApptDate Patient Provider ApptRank ApptProvRank v7Provider
--------------------------------------------------------------------------------------
1 2020-01-15 00:00:00.000 A BOB 1 1 BOB
2 2020-03-01 00:00:00.000 A BOB 2 2 BOB
3 2020-03-08 00:00:00.000 A BOB 3 3 BOB
4 2020-03-20 00:00:00.000 A BOB 4 4 BOB
5 2020-04-01 00:00:00.000 A BOB 5 5 BOB
6 2020-04-15 00:00:00.000 A SMITH 6 1 BOB
7 2020-06-07 00:00:00.000 A SMITH 7 2 BOB
8 2020-06-21 00:00:00.000 A BOB 8 6 BOB
9 2020-07-01 00:00:00.000 A JANE 9 1 BOB
10 2020-07-15 00:00:00.000 A JANE 10 2 BOB
11 2020-07-21 00:00:00.000 A JANE 11 3 BOB
12 2020-08-01 00:00:00.000 A JANE 12 4 JANE
13 2020-08-15 00:00:00.000 A JANE 13 5 JANE
14 2020-09-20 00:00:00.000 A JANE 14 6 JANE
15 2021-01-15 00:00:00.000 B BOB 1 1 BOB
16 2021-03-01 00:00:00.000 B BOB 2 2 BOB
17 2021-03-08 00:00:00.000 B BOB 3 3 BOB
18 2021-03-20 00:00:00.000 B BOB 4 4 BOB
19 2021-04-01 00:00:00.000 B BOB 5 5 BOB
20 2021-04-15 00:00:00.000 B SMITH 6 1 BOB
21 2021-06-07 00:00:00.000 B SMITH 7 2 BOB
22 2021-06-21 00:00:00.000 B JANE 8 6 BOB
23 2021-07-01 00:00:00.000 B JANE 9 1 BOB
24 2021-07-15 00:00:00.000 B JANE 10 2 BOB
25 2021-07-21 00:00:00.000 B JANE 11 3 JANE
26 2021-08-01 00:00:00.000 B JANE 12 4 JANE
27 2021-08-15 00:00:00.000 B JANE 13 5 JANE
28 2021-09-20 00:00:00.000 B JANE 14 6 JANE
让我们从创建一些示例表开始。我选择创建一个包含"引用年月"的表,另外还有一列表示最古老的历史日期。这个历史列不是必需的,但是我把它放在这里是为了简化查询。
DROP TABLE IF EXISTS bu_datalake.ReferenceDate;
CREATE table bu_datalake.ReferenceDate
(
MonthlyReferenceDate Date,
MaxMonthlyReferenceDateHistory Date
);
INSERT
INTO bu_datalake.ReferenceDate
VALUES ('2022-01-01',ADD_MONTHS('2022-01-01',-7)),
('2022-02-01',ADD_MONTHS('2022-02-01',-7)),
('2022-03-01',ADD_MONTHS('2022-03-01',-7)),
('2022-04-01',ADD_MONTHS('2022-04-01',-7)),
('2022-05-01',ADD_MONTHS('2022-05-01',-7)),
('2022-06-01',ADD_MONTHS('2022-06-01',-7)),
('2022-07-01',ADD_MONTHS('2022-07-01',-7)),
('2022-08-01',ADD_MONTHS('2022-08-01',-7)),
('2022-09-01',ADD_MONTHS('2022-09-01',-7)),
('2022-10-01',ADD_MONTHS('2022-10-01',-7)),
('2022-11-01',ADD_MONTHS('2022-11-01',-7)),
('2022-12-01',ADD_MONTHS('2022-12-01',-7));
DROP TABLE IF EXISTS bu_datalake.PatientProvider;
CREATE table bu_datalake.PatientProvider
(
ApptDate Date,
Patient String,
Provider String
);
INSERT
INTO bu_datalake.PatientProvider
VALUES
('2022-01-15','A','BOB'),
('2022-03-01','A','BOB'),
('2022-03-08','A','BOB'),
('2022-03-20','A','BOB'),
('2022-04-01','A','BOB'),
('2022-04-15','A','SMITH'),
('2022-06-07','A','SMITH'),
('2022-06-21','A','BOB'),
('2022-07-01','A','JANE'),
('2022-07-15','A','JANE'),
('2022-07-21','A','JANE'),
('2022-08-01','A','JANE'),
('2022-08-15','A','JANE'),
('2022-09-20','A','JANE')
下面是创建的表的内容:
SELECT * FROM bu_datalake.ReferenceDate;
MonthlyReferenceDate MaxMonthlyReferenceDateHistory
----------------------------------------------------
2022-01-01 2021-06-01
2022-02-01 2021-07-01
2022-03-01 2021-08-01
2022-04-01 2021-09-01
2022-05-01 2021-10-01
2022-06-01 2021-11-01
2022-07-01 2021-12-01
2022-08-01 2022-01-01
2022-09-01 2022-02-01
2022-10-01 2022-03-01
2022-11-01 2022-04-01
2022-12-01 2022-05-01
SELECT * FROM bu_datalake.PatientProvider;
ApptDate Patient Provider
-----------------------------------------------------
2022-01-15 A BOB
2022-03-01 A BOB
2022-03-08 A BOB
2022-03-20 A BOB
2022-04-01 A BOB
2022-04-15 A SMITH
2022-06-07 A SMITH
2022-06-21 A BOB
2022-07-01 A JANE
2022-07-15 A JANE
2022-07-21 A JANE
2022-08-01 A JANE
2022-08-15 A JANE
2022-09-20 A JANE
让我们按参考月合并考勤,按患者和提供者分组。我们正在创建一个列name MostRecentApptDate因为我们可能需要它有相同数量的患者-提供者出席月:
SELECT TRUNC(ApptDate,'MM') As ApptMonthlyDate,
Patient,
Provider,
MAX(ApptDate) MostRecentApptDate,
COUNT(*) PatientProviderOccurences
FROM bu_datalake.PatientProvider
GROUP BY TRUNC(ApptDate,'MM'),Patient,Provider
ORDER BY TRUNC(ApptDate,'MM'),Patient,Provider
ApptMonthlyDate Patient Provider MostRecentApptDate PatientProviderOccurences
2022-01-01 A BOB 2022-01-15 1
2022-03-01 A BOB 2022-03-20 3
2022-04-01 A BOB 2022-04-01 1
2022-04-01 A SMITH 2022-04-15 1
2022-06-01 A BOB 2022-06-21 1
2022-06-01 A SMITH 2022-06-07 1
2022-07-01 A JANE 2022-07-21 3
现在,我要在这个查询和引用月份表之间建立一个连接。这样做的目的是-对于"每个参考月"-我们可以确定哪些记录我必须考虑在我的"排名分类";(下一步)。因为这个查询可以显示重复的记录(因为一个考勤可以在不同的每月排名分类")我将只显示这些结果的一部分(查询应该返回大约61条记录):
SELECT REF.MonthlyReferenceDate,
PAT.ApptMonthlyDate,
REF.MaxMonthlyReferenceDateHistory,
PAT.MostRecentApptDate,
PAT.Patient,
PAT.Provider,
PAT.PatientProviderOccurences
FROM bu_datalake.ReferenceDate REF
JOIN (SELECT TRUNC(ApptDate,'MM') As ApptMonthlyDate,
Patient,
Provider,
MAX(ApptDate) MostRecentApptDate,
COUNT(*) PatientProviderOccurences
FROM bu_datalake.PatientProvider
GROUP BY TRUNC(ApptDate,'MM'),Patient,Provider) PAT
WHERE PAT.ApptMonthlyDate BETWEEN REF.MaxMonthlyReferenceDateHistory AND REF.MonthlyReferenceDate
ORDER BY REF.MonthlyReferenceDate,PAT.Patient,PAT.Provider
MonthlyReferenceDate ApptMonthlyDate MaxMonthlyReferenceDateHistory MostRecentApptDate Patient Provider PatientProviderOccurences
2022-01-01 2022-01-01 2021-06-01 2022-01-15 A BOB 1
2022-02-01 2022-01-01 2021-07-01 2022-01-15 A BOB 1
2022-03-01 2022-03-01 2021-08-01 2022-03-20 A BOB 3
然后,我将在这个查询中添加一个额外的列,用于对结果进行分类/排序,这样我就可以根据每月参考日期确定哪些是出席人数最多的。这里我们将使用一个名为"ROW_NUMBER">的窗口窗口函数。还有许多其他函数,如RANK,DENSE_RANK和我建议你学习这个主题,因为它对解决这样的问题非常有用。同样,我将只提供部分查询结果:
SELECT REP.MonthlyReferenceDate,
REP.Patient,
REP.Provider,
REP.PatientProviderOccurences,
ROW_NUMBER() OVER(PARTITION BY REP.MonthlyReferenceDate ORDER BY REP.PatientProviderOccurences DESC,REP.MostRecentApptDate DESC) AS Rank
FROM (SELECT REF.MonthlyReferenceDate,
PAT.ApptMonthlyDate,
REF.MaxMonthlyReferenceDateHistory,
PAT.MostRecentApptDate,
PAT.Patient,
PAT.Provider,
PAT.PatientProviderOccurences
FROM bu_datalake.ReferenceDate REF
JOIN (SELECT TRUNC(ApptDate,'MM') As ApptMonthlyDate,
Patient,
Provider,
MAX(ApptDate) MostRecentApptDate,
COUNT(*) PatientProviderOccurences
FROM bu_datalake.PatientProvider
GROUP BY TRUNC(ApptDate,'MM'),Patient,Provider) PAT
WHERE PAT.ApptMonthlyDate BETWEEN REF.MaxMonthlyReferenceDateHistory AND REF.MonthlyReferenceDate
ORDER BY REF.MonthlyReferenceDate,PAT.Patient,PAT.Provider) REP;
MonthlyReferenceDate Patient Provider PatientProviderOccurences Rank
2022-01-01 A BOB 1 1
2022-02-01 A BOB 1 1
2022-03-01 A BOB 3 1
最后,我将应用FILTER (Rank=1)只显示我需要的结果:
SELECT MonthlyReferenceDate,
Patient,
Provider,
PatientProviderOccurences
FROM
(
SELECT REP.MonthlyReferenceDate,
REP.Patient,
REP.Provider,
REP.PatientProviderOccurences,
ROW_NUMBER() OVER(PARTITION BY REP.MonthlyReferenceDate ORDER BY REP.PatientProviderOccurences DESC,REP.MostRecentApptDate DESC) AS Rank
FROM (SELECT REF.MonthlyReferenceDate,
PAT.ApptMonthlyDate,
REF.MaxMonthlyReferenceDateHistory,
PAT.MostRecentApptDate,
PAT.Patient,
PAT.Provider,
PAT.PatientProviderOccurences
FROM bu_datalake.ReferenceDate REF
JOIN (SELECT TRUNC(ApptDate,'MM') As ApptMonthlyDate,
Patient,
Provider,
MAX(ApptDate) MostRecentApptDate,
COUNT(*) PatientProviderOccurences
FROM bu_datalake.PatientProvider
GROUP BY TRUNC(ApptDate,'MM'),Patient,Provider) PAT
WHERE PAT.ApptMonthlyDate BETWEEN REF.MaxMonthlyReferenceDateHistory AND REF.MonthlyReferenceDate
ORDER BY REF.MonthlyReferenceDate,PAT.Patient,PAT.Provider) REP
)
WHERE RANK =1
ORDER BY MonthlyReferenceDate
MonthlyReferenceDate Patient Provider PatientProviderOccurences
2022-01-01 A BOB 1
2022-02-01 A BOB 1
2022-03-01 A BOB 3
2022-04-01 A BOB 3
2022-05-01 A BOB 3
2022-06-01 A BOB 3
2022-07-01 A JANE 3
2022-08-01 A JANE 3
2022-09-01 A JANE 3
2022-10-01 A JANE 3
2022-11-01 A JANE 3
2022-12-01 A JANE 3
我希望这些查询可以帮助你解决你的问题。根据您的数据库,可能需要应用不同的函数。