假设相同的数据有 2 个来源。根据源优先级选择记录的最佳方法是什么?例如
Date Value Source
D1 V1 S1
D1 V2 S2
D2 V3 S1
D3 V4 S2
假设 S1 优于 S2,查询应给出:
D1 V1 S1
D2 V3 S1
D3 V4 S2
目标数据库是MS SQL Server。谢谢。
一种方法是从 S1 中选择所有内容,然后只从 S2 中选择不匹配的记录:
select t.*
from t
where t.source = 'S1' or
not exists (select 1 from t t2 where t2.source = 'S2');
SELECT *
FROM
( SELECT *
,ROW_NUMBER() OVER (PARTITION BY [Date] ORDER BY [Source] ASC) rn
FROM TableName
)a
WHERE Rn = 1
考虑到 S1 比 S2 好,S2 比 S3 好等等.....
使用ROW_NUMBER()
函数,您可以获得每个日期值的最佳可用源。
select * from t t1
where not exists (
select 1 from t t2
where t2.Source > t1.Source
and t2.Value = t1.Value
and t2.Date = t1.Date
)
如果 Source 的优先级逻辑更复杂,则可能需要创建自己的自定义比较函数,在这种情况下,查询可能如下所示:
select * from t t1
where not exists (
select 1 from t t2
where customFunc(t2.Source) > customFunc(t1.Source)
and t2.Value = t1.Value
and t2.Date = t1.Date
)
我可以通过使用一个CASE
表达式来解决这个问题,该表达式的顺序反映了来源的重要性顺序:
SELECT t1.*
FROM yourTable t1
INNER JOIN
(
SELECT Date,
MIN(CASE WHEN Source = 'S1' THEN 1
WHEN Source = 'S2' THEN 2 END) AS SourceRank
FROM yourTable
GROUP BY Date
) t2
ON t1.Date = t2.Date AND
t2.SourceRank = 1
使用CASE
表达式的好处是允许对两个或多个源进行任何排序,而无需假定重要性的顺序。
如果您的 DBMS 支持窗口函数,您可以像这样找到每个日期的最佳优先级:
SELECT DATE,
value,
source
FROM
(SELECT t.*,
row_number() over (partition BY DATE order by source) rn
FROM TABLE t
) t
WHERE rn = 1;
如果不是,并且日期列和源列的组合是唯一的,则可以使用:
SELECT t1.*
FROM TABLE t1
INNER JOIN
( SELECT DATE, MIN(source) source FROM TABLE GROUP BY DATE
) t2
ON t1.date = t2.date
AND t1.source = t2.source;
假设有一个单独的源优先级表
create table #sp(
Source varchar(5) primary key,
Priority int
);
insert #sp(Source,Priority)
values
('S1',2),
('S2',1);
可以这样使用
select top(1) with ties t.*
from myTable t
join #sp p on t.Source = p.Source
order by row_number() over (partition by t.Date order by p.priority);