我有这个表(表_2(,它基本上提供了患者信息和他们在特定日期访问的科室-
Patient_ID | 部门 | 服务日期出生日期 | 计数 |
---|---|---|---|
1234 | 心脏 | 1/21>6/18/78 | 5 |
1234 | 心脏 | 8/20/21>6/18/78 | 5 |
1234 | 心脏 | 10/28/21>6/18/78 | 5 |
1234 | 正交 | 6/18/78 | <2>|
1234 | 正交 | 7/21>6/18/78 | 2 |
1234 | 心脏 | 8/20/21>6/18/78 | 5 |
1234 | 心脏 | 4/19/21>6/18/78 | 5 |
1234 | 妇科 | 4/1/21>6/18/78 | 1 |
78645 | 神经 | 5/1/217/18/87 | <2>//tr>|
78645 | 神经 | 7/7/217/18/87 | 2|
78645 | ent | 7/217/18/87 | |
32423 | 胃 | 11/7/21 | 3/12/57<1>
在第二级聚合中也使用FIRST_VALUE()
窗口函数:
SELECT *, FIRST_VALUE(department) OVER (PARTITION BY patient_id ORDER BY count DESC) AS max
FROM (
SELECT *, COUNT(*) OVER (PARTITION BY patient_id, department) AS count
FROM table_1
) t
请参阅演示
count(department) OVER (PARTITION BY patient_id, department)
将为结果中的每一行计算,只需计算同一患者的记录数(具体地说,科室不为空(&部门作为这一行。这不是你想要的。
还将为结果中的每一行计算max(department) OVER (PARTITION BY patient_id) AS max
,只需找到与此行相同的患者的最大科室名称。这是该患者访问的所有科室按字母顺序排序的最后一个科室(这就是将MAX((函数应用于字符值的方式(,但这不是您想要的。
我们想要一个子查询来返回每个患者访问次数最多的科室,这里有一个例子:
select
tbl1.*
, MostVisited.department as MostVisitedDepartment
from
table_1 tbl1
inner join
(-- Take only the most visited department for each patient
select *
from (-- rank departments by number of times
-- visited for each patient; highest visited gets 1
-- the result of a tie (the patient visited two+ departments the
-- same number of times) depends on how the rdbms executes it.
select
VisitCnt.*
, row_number() over (partition by VisitCnt.patient_id
order by VisitCnt.TimesVisited desc) as Seq
from (-- Count each department visited by each patient
select
patient_id, department, count(*) as TimesVisited
from table_1
group by patient_id, department
) VisitCnt
) VisitCntRnked
where VisitCntRnked.Seq=1
) MostVisited
on MostVisited.Patient_ID=tbl1.Patient_ID
至于"最优":这取决于你的意思:表大小、索引、表上的统计信息、你的特定数据库产品,以及,如果你想对表的整个(或大部分(总体运行此查询,还是只对一小部分子集运行此查询。我们通常希望查询优化器在执行查询时考虑到这些因素。有很多例外,但优化查询需要比这个问题中可用的信息多得多的信息。