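I have a list of occupations to which different versions of an occupation record will be added over time. This means there will be multiple records with the same OCCUPATION_CD, and I want to filter my dataset so that it only shows the latest version of each occupation, as determined by MAJOR_VER_DATE.
I will be looking up the unique OCCUPATION_ID of the latest version of an occupation, so that I can then find all child records related to that version of the occupation.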
Occupations dataset
OCCUPATION_ID,OCCUPATION_CD,ANZSIC_ID,OCCUPATION_DIVISION_CODE,OCCUPATION_SUBDIVISION_CODE,OCCUPATION_CLASS,OCCUPATION_CD_DSC,MAJOR_VER,MAJOR_VER_DATE,MINOR_VER,MINOR_VER_DATE
1,0219D,A0219,A,"A_Agriculture, Forestry and Fishing",,Agricultural / Farm Fencing Contractors,1,12/06/2022,1,12/06/2022
2,5110C,G5110,G,G_Food Retailing,R,Tobacconists,1,12/06/2022,1,12/06/2022
3,5710D,H5710,H,H_Accomodation & Licensed Clubs,,Motel,1,12/06/2022,1,12/06/2022
4,8440A,N8440,N,N_Education,,Other Education not elsewhere classified (Theoretical training only),1,12/06/2022,1,12/06/2022
5,9523,Q9523,Q,Q_Other,,Photographic Studio,1,12/06/2022,1,12/06/2022
6,9524D,Q9524,Q,Q_Other,,Cemetery Operation,1,12/06/2022,1,12/06/2022
7,9525A,Q9525,Q,Q_Other,,Gardening Services,1,12/06/2022,1,12/06/2022
8,9525C,Q9525,Q,Q_Other,,Swimming Pool Cleaning & Maintenance,1,12/06/2022,1,12/06/2022
9,9529B,Q9529,Q,Q_Other,,Pet Grooming / Washing Service,1,12/06/2022,1,12/06/2022
10,9529H,Q9529,Q,Q_Other,,Chauffeur Service,1,12/06/2022,1,12/06/2022
11,7241,,J,,,Music Publishing,1,12/06/2022,1,12/06/2022
12,0219D,A0219,A,"A_Agriculture, Forestry and Fishing",,Fence Installation,2,11/07/2022,1,11/07/2022
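The result I want is that running SELECT OCCUPATION_ID FROM [data_set] WHERE OCCUPATION_CD = '0219D' resolves to OCCUPATION_ID 12 rather than 1, because the record with ID 12 has a MAJOR_VER_DATE that is more recent than that of the record with ID 1.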
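What I have tried so far is to filter data_set so that it only shows the latest versions, using the query below. I chose the unique OCCUPATION_ID, rather than MAJOR_VER_DATE, as my way of identifying the latest version, because it feels more reliable.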
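What I then need to do is make this result the table that I query by OCCUPATION_CD to resolve the OCCUPATION_ID. How would you do that? It feels like a three-layered problem and I can't get my head around it. Thanks.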
SELECT latest_versions.*
FROM `OCCUPATIONS` latest_versions JOIN
(
SELECT OCCUPATION_CD, MAX(OCCUPATION_ID) as latest_version
FROM OCCUPATIONS
GROUP BY OCCUPATION_CD
) occ_filtered
ON latest_versions.OCCUPATION_CD = occ_filtered.OCCUPATION_CD
AND OCCUPATION_ID = latest_version
with data as (
select *, row_number()
over (partition by occupation_cd order by major_ver_date desc) as rn
from occupations
)
select * from data where rn = 1;
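To see the window-function query above pick ID 12 for '0219D', here is a minimal sketch that runs it against an in-memory SQLite database from Python. It makes several assumptions that are not in the question: only three of the sample rows are loaded, the table has a reduced set of columns, and MAJOR_VER_DATE is stored as ISO-8601 text (YYYY-MM-DD) so that sorting it is chronological. A secondary sort on occupation_id is added as a tie-breaker for versions that share a date. Window functions need SQLite 3.25 or later.

```python
import sqlite3

# Hypothetical in-memory copy of a few rows from the question's dataset.
# Assumption: major_ver_date is ISO-8601 text (YYYY-MM-DD); text in
# DD/MM/YYYY form would not sort chronologically.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE occupations ("
    "occupation_id INTEGER PRIMARY KEY, occupation_cd TEXT, "
    "occupation_cd_dsc TEXT, major_ver INTEGER, major_ver_date TEXT)"
)
conn.executemany(
    "INSERT INTO occupations VALUES (?, ?, ?, ?, ?)",
    [
        (1, "0219D", "Agricultural / Farm Fencing Contractors", 1, "2022-06-12"),
        (2, "5110C", "Tobacconists", 1, "2022-06-12"),
        (12, "0219D", "Fence Installation", 2, "2022-07-11"),
    ],
)

# Number the rows of each occupation_cd from newest to oldest, keep row 1.
LATEST_SQL = """
WITH data AS (
    SELECT *, ROW_NUMBER() OVER (
        PARTITION BY occupation_cd
        ORDER BY major_ver_date DESC, occupation_id DESC
    ) AS rn
    FROM occupations
)
SELECT occupation_id, occupation_cd
FROM data
WHERE rn = 1
ORDER BY occupation_id
"""

latest = conn.execute(LATEST_SQL).fetchall()
print(latest)  # [(2, '5110C'), (12, '0219D')]
```

If the real column holds DD/MM/YYYY strings, it would need converting to a proper date type (or an ISO string) before ordering, otherwise the "latest" row may be wrong.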
Or, following your approach:
SELECT * FROM OCCUPATIONS WHERE OCCUPATION_ID IN (
    SELECT OCCUPATION_ID
    FROM `OCCUPATIONS` latest_versions JOIN (
        -- latest MAJOR_VER_DATE for each occupation code
        SELECT OCCUPATION_CD, MAX(MAJOR_VER_DATE) as latest_version
        FROM OCCUPATIONS
        GROUP BY OCCUPATION_CD
    ) occ_filtered
    ON latest_versions.OCCUPATION_CD = occ_filtered.OCCUPATION_CD
    AND MAJOR_VER_DATE = latest_version
)
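The remaining layer of the question, turning the filtered result into something that can be queried by OCCUPATION_CD, can be handled by wrapping the filter in a view. The following is only a sketch under stated assumptions: the view name latest_occupations, the helper resolve_occupation_id, and the occupation_children table are all hypothetical (the question does not show the child table's schema), dates are assumed to be ISO-8601 text, and SQLite is used purely so that the example runs on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE occupations (
    occupation_id INTEGER PRIMARY KEY,
    occupation_cd TEXT,
    major_ver_date TEXT  -- assumption: ISO-8601 text, so MAX() is chronological
);
INSERT INTO occupations VALUES
    (1, '0219D', '2022-06-12'),
    (2, '5110C', '2022-06-12'),
    (12, '0219D', '2022-07-11');

-- Hypothetical child table; the real schema is not shown in the question.
CREATE TABLE occupation_children (
    child_id INTEGER PRIMARY KEY,
    occupation_id INTEGER,
    note TEXT
);
INSERT INTO occupation_children VALUES
    (100, 1, 'belongs to version 1'),
    (101, 12, 'belongs to version 2');

-- Layers 1 and 2: latest date per code, joined back to get the full row.
CREATE VIEW latest_occupations AS
SELECT o.*
FROM occupations o
JOIN (
    SELECT occupation_cd, MAX(major_ver_date) AS latest_version
    FROM occupations
    GROUP BY occupation_cd
) f ON o.occupation_cd = f.occupation_cd
   AND o.major_ver_date = f.latest_version;
""")


def resolve_occupation_id(code):
    """Layer 3: look up the latest OCCUPATION_ID for one OCCUPATION_CD."""
    row = conn.execute(
        "SELECT occupation_id FROM latest_occupations WHERE occupation_cd = ?",
        (code,),
    ).fetchone()
    return row[0] if row else None


occupation_id = resolve_occupation_id("0219D")
children = conn.execute(
    "SELECT child_id, note FROM occupation_children WHERE occupation_id = ?",
    (occupation_id,),
).fetchall()
print(occupation_id, children)  # 12 [(101, 'belongs to version 2')]
```

A view keeps the "latest version" rule in one place, so every query that resolves a code to an ID inherits it. If two versions of a code can share the same MAJOR_VER_DATE, the view returns both rows and a tie-breaker would be needed.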
SELECT * from
occupations A JOIN (
    -- latest major_ver_date for each occupation code
    SELECT occupation_cd, max(major_ver_date) max_date
    FROM occupations B
    GROUP by occupation_cd
) C on A.occupation_cd=C.occupation_cd and A.major_ver_date=C.max_date
-- if several rows share the latest date, keep the one with the highest ID
WHERE A.occupation_id = (
    SELECT top 1 occupation_id
    FROM occupations D
    WHERE D.occupation_cd=A.occupation_cd and D.major_ver_date=A.major_ver_date
    ORDER BY 1 DESC
)
Specific SQL implementations may allow simpler queries.
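As one illustration of an implementation-specific shortcut, SQLite documents that when a query uses a single MAX() aggregate, the non-aggregated ("bare") columns take their values from the row that holds the maximum. The sketch below relies on that SQLite-only behaviour and on the assumption that MAJOR_VER_DATE is stored as ISO-8601 text; it is not portable, and other engines may reject the query or return values from an arbitrary row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE occupations (
    occupation_id INTEGER PRIMARY KEY,
    occupation_cd TEXT,
    major_ver_date TEXT  -- assumption: ISO-8601 text
);
INSERT INTO occupations VALUES
    (1, '0219D', '2022-06-12'),
    (2, '5110C', '2022-06-12'),
    (12, '0219D', '2022-07-11');
""")

# SQLite-specific: occupation_id is a bare column, so it is taken from the
# same row that supplies MAX(major_ver_date) within each group.
rows = conn.execute(
    "SELECT occupation_id, occupation_cd, MAX(major_ver_date) "
    "FROM occupations GROUP BY occupation_cd ORDER BY occupation_cd"
).fetchall()
print(rows)  # [(12, '0219D', '2022-07-11'), (2, '5110C', '2022-06-12')]
```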