Conditional statement with a join



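I have two datasets, df1 and df2. I want to join them and then apply a condition: if the join produces multiple rows for the same value in the host column, keep only one of those rows (to avoid duplicates). I am joining df1 and df2 ON df1.version = df2.name AND df1.date = df2.date.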

Condition: purpose should be 'hi' or 'cat'.

df1

version host    date                                      
pat     a16     12/1/2019   
fam     a16     12/1/2019   
emp     a16     12/1/2019   
dan     a16     12/1/2019 

df2

name    purpose  date
pat     hi       12/1/2019
fam     cat      12/1/2019
hello   dog      12/1/2019
dan     bird     12/1/2019

Here is the join result:

version host    date                 name    purpose       date                      
pat     a16     12/1/2019            pat     hi            12/1/2019
fam     a16     12/1/2019            fam     cat           12/1/2019

Desired result (only one row per host):

version host    date                 name    purpose       date                      
pat     a16     12/1/2019            pat     hi            12/1/2019

select    df1.version, df1.host, df1.date, df2.name, df2.purpose, df2.date
from      df1
left join df2
on        df1.version = df2.name  AND
df1.date    = df2.date
where     df2.purpose = 'hi' OR df2.purpose = 'cat'
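To reproduce the join result above, here is a minimal sketch that loads the sample rows into an in-memory SQLite database through Python's built-in sqlite3 module (an assumption; the question doesn't say which database is used) and runs the query from the question. Dates are stored as plain text. It also shows why the LEFT JOIN makes no difference here: the WHERE filter on df2.purpose discards the rows whose df2 columns are NULL, so the result is the same as an INNER JOIN.

```python
import sqlite3

# Sample data from the question, loaded into an in-memory SQLite database.
# Dates are stored as plain text (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE df1 (version TEXT, host TEXT, date TEXT);
CREATE TABLE df2 (name TEXT, purpose TEXT, date TEXT);
INSERT INTO df1 VALUES ('pat','a16','12/1/2019'), ('fam','a16','12/1/2019'),
                       ('emp','a16','12/1/2019'), ('dan','a16','12/1/2019');
INSERT INTO df2 VALUES ('pat','hi','12/1/2019'), ('fam','cat','12/1/2019'),
                       ('hello','dog','12/1/2019'), ('dan','bird','12/1/2019');
""")

# The query from the question. Filtering on df2.purpose in WHERE removes the
# NULL-extended rows a LEFT JOIN would keep, so it acts like an INNER JOIN.
# ORDER BY is added only to make the printed order stable.
rows = conn.execute("""
SELECT df1.version, df1.host, df1.date, df2.name, df2.purpose, df2.date
FROM df1
LEFT JOIN df2
  ON df1.version = df2.name AND df1.date = df2.date
WHERE df2.purpose = 'hi' OR df2.purpose = 'cat'
ORDER BY df1.version DESC
""").fetchall()

for row in rows:
    print(row)
# ('pat', 'a16', '12/1/2019', 'pat', 'hi', '12/1/2019')
# ('fam', 'a16', '12/1/2019', 'fam', 'cat', '12/1/2019')
```

Both rows share host a16, which is exactly the duplication the question wants to remove.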

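I think I need to implement an IF ... THEN statement in SQL. The statement above only performs the join; it does not remove the duplicate rows for the same host. Any suggestions are welcome.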

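You can use the ROW_NUMBER() function to number the rows within each host and keep only the first one, which removes the duplicates. However, when you order only by date, the first row can have purpose hi or cat, because both rows have the same date. If you want one of them to be picked first, you have to adjust the ORDER BY accordingly.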
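One way to "adjust the ORDER BY accordingly" is to add a CASE expression as a tie-breaker after the date. The sketch below assumes 'hi' should win over 'cat' (the question doesn't say which is preferred) and runs in SQLite via Python's sqlite3 module, which needs SQLite 3.25 or later for window functions. The ranking logic itself is plain SQL and should carry over to other databases that support ROW_NUMBER().

```python
import sqlite3

# Same sample data as in the question; dates stored as text (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE df1 (version TEXT, host TEXT, date TEXT);
CREATE TABLE df2 (name TEXT, purpose TEXT, date TEXT);
INSERT INTO df1 VALUES ('pat','a16','12/1/2019'), ('fam','a16','12/1/2019'),
                       ('emp','a16','12/1/2019'), ('dan','a16','12/1/2019');
INSERT INTO df2 VALUES ('pat','hi','12/1/2019'), ('fam','cat','12/1/2019'),
                       ('hello','dog','12/1/2019'), ('dan','bird','12/1/2019');
""")

# Assumed preference: within each host, rank 'hi' rows ahead of 'cat' rows.
# df1.date stays first in the ORDER BY as in the answer; every sample row has
# the same date, so the CASE expression is what breaks the tie. With real data,
# store dates in a proper date type so they sort chronologically.
query = """
WITH cte_df AS (
  SELECT df1.version, df1.host, df1.date, df2.name, df2.purpose,
         df2.date AS df2date,
         ROW_NUMBER() OVER (
           PARTITION BY df1.host
           ORDER BY df1.date,
                    CASE df2.purpose WHEN 'hi' THEN 0 ELSE 1 END
         ) AS rnk
  FROM df1
  INNER JOIN df2
    ON df1.version = df2.name AND df1.date = df2.date
  WHERE df2.purpose IN ('hi', 'cat')
)
SELECT version, host, purpose FROM cte_df WHERE rnk = 1
"""
result = conn.execute(query).fetchall()
print(result)  # [('pat', 'a16', 'hi')]
```

Swapping the CASE branches (for example `WHEN 'cat' THEN 0`) would make the fam/cat row win instead.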

;with cte_df as
(
select df1.version, df1.host,df1.date, df2.name, df2.purpose, df2.date as df2date
,row_number() over(partition by df1.host order by df1.date) as rnk
FROM df1
inner join df2 
on df1.version = df2.name and df1.date = df2.date
where df2.purpose in ('hi','cat')
)
SELECT * from cte_df WHERE rnk =1
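The CTE above can be tried as-is against the sample data. This sketch runs it in SQLite through Python's sqlite3 module (an assumption; it needs SQLite 3.25+ for ROW_NUMBER()). The leading `;` is dropped: writing `;with` is a T-SQL habit that terminates any previous statement before a CTE, and SQLite doesn't need it. Because the ORDER BY uses only df1.date and both matching rows share that date, which of pat or fam comes back is not guaranteed.

```python
import sqlite3

# Sample data from the question; dates stored as text (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE df1 (version TEXT, host TEXT, date TEXT);
CREATE TABLE df2 (name TEXT, purpose TEXT, date TEXT);
INSERT INTO df1 VALUES ('pat','a16','12/1/2019'), ('fam','a16','12/1/2019'),
                       ('emp','a16','12/1/2019'), ('dan','a16','12/1/2019');
INSERT INTO df2 VALUES ('pat','hi','12/1/2019'), ('fam','cat','12/1/2019'),
                       ('hello','dog','12/1/2019'), ('dan','bird','12/1/2019');
""")

# The answer's CTE, minus the T-SQL-style leading semicolon.
rows = conn.execute("""
WITH cte_df AS
(
  SELECT df1.version, df1.host, df1.date, df2.name, df2.purpose,
         df2.date AS df2date,
         ROW_NUMBER() OVER (PARTITION BY df1.host ORDER BY df1.date) AS rnk
  FROM df1
  INNER JOIN df2
    ON df1.version = df2.name AND df1.date = df2.date
  WHERE df2.purpose IN ('hi', 'cat')
)
SELECT * FROM cte_df WHERE rnk = 1
""").fetchall()

# One row per host; the rnk column is included because of SELECT *.
print(rows)
```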

Without a CTE, using a derived table:

SELECT * 
FROM
(
select df1.version, df1.host,df1.date, df2.name, df2.purpose, df2.date as df2date
,row_number() over(partition by df1.host order by df1.date) as rnk
FROM df1
inner join df2 
on df1.version = df2.name and df1.date = df2.date
where df2.purpose in ('hi','cat')
) AS T
WHERE rnk = 1
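The CTE and the derived table are two spellings of the same query, which a quick check in SQLite via Python's sqlite3 module can illustrate (the database choice is an assumption). To make the comparison meaningful, this sketch adds df1.version as a hypothetical tie-breaker in the ORDER BY, so both forms deterministically pick the same row, and lists the outer columns explicitly so the helper rnk column is not returned.

```python
import sqlite3

# Sample data from the question; dates stored as text (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE df1 (version TEXT, host TEXT, date TEXT);
CREATE TABLE df2 (name TEXT, purpose TEXT, date TEXT);
INSERT INTO df1 VALUES ('pat','a16','12/1/2019'), ('fam','a16','12/1/2019'),
                       ('emp','a16','12/1/2019'), ('dan','a16','12/1/2019');
INSERT INTO df2 VALUES ('pat','hi','12/1/2019'), ('fam','cat','12/1/2019'),
                       ('hello','dog','12/1/2019'), ('dan','bird','12/1/2019');
""")

# Shared inner query; df1.version is a hypothetical tie-breaker for determinism.
inner = """
  SELECT df1.version, df1.host, df1.date, df2.name, df2.purpose,
         df2.date AS df2date,
         ROW_NUMBER() OVER (PARTITION BY df1.host
                            ORDER BY df1.date, df1.version) AS rnk
  FROM df1
  INNER JOIN df2 ON df1.version = df2.name AND df1.date = df2.date
  WHERE df2.purpose IN ('hi', 'cat')
"""

# CTE form vs. derived-table form of the same filter.
cte_rows = conn.execute(
    f"WITH cte_df AS ({inner}) SELECT version, host, purpose FROM cte_df WHERE rnk = 1"
).fetchall()
dt_rows = conn.execute(
    f"SELECT version, host, purpose FROM ({inner}) AS T WHERE rnk = 1"
).fetchall()

print(cte_rows == dt_rows, dt_rows)  # True [('fam', 'a16', 'cat')]
```

With this tie-breaker 'fam' sorts before 'pat', so the cat row is kept; use a CASE expression on purpose instead if a specific purpose should win.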
