SQL查询以查找最接近的日期(小于或等于)并相应地匹配数据



将Branch1表中的记录与Email1表中记录映射,并在Email1表格中显示访问值,如下所述:

要求:

  1. branch1表中"campaign_name"0001_RegOnly和"day_id"2021年11月7日的"访问量"值应与email1表中的记录匹配,该记录具有相同的campaignname,并且发送日期尽可能接近(小于或等于(branch1中的day_id。因此,电子邮件表中有3条记录具有相同的campaign_name(0001_RegOnly(,日期分别为11月1日、5日和6日。因此,访问量值应插入电子邮件表中的11月5日记录

a(如果电子邮件表中有相同活动名称(0001_RegOnly(的记录,比如11月8日而不是11月6日,则应将该值分配给具有相同活动名称且日期为11月5日的记录。而不是11月8日,因为11月8号大于11月7日

b( 如果电子邮件表中有相同活动名称(0001_RegOnly(的记录,比如说11月7日以及11月1日、5日和6日,那么该值应该分配给具有相同活动名称的记录,该记录的日期为11月7号,因为它是最接近的日期,在这种情况下相等(branch1.day_id为7,email1.send_date也为7(

条件:

  1. branch1.campaign_name=电子邮件1.campaign名称
  2. branch1.day_id>=电子邮件1.send_date

表格:

EMAIL1表格

Branch1表

预期输出

我使用过的SQL查询:

with CTE as
(
select  campaign_name, send_date, lead (send_date) over (order by send_date) as nextdate
from email1
group by campaign_name, send_date
)
select a.campaign_name, a.send_date, sum(b.visits) as visits
from CTE a  left join branch1 b 
on b.day_id>a.send_date and b.day_id<a.nextdate
and a.campaign_name = b.campaign_name
group by a.send_date, a.campaign_name
order by send_date
;

实际输出:SQL查询输出

这将为您提供所需信息。

--get all emails that meet <= the send date 
with all_emails_lt_branch_date as (
select b.campaign_name, 
b.day_id,
b.visits,
e.send_date 
case when e.send_date <= b.day_id then true else false end as possible 
from branch1 b 
left join email1 e  on e.campaign_name = b.campaign_name
where possible = true 
)
--keep only the latest / max email date for each day_id 
select * , 
row_number() over (partition by campaign_name, day_id order by  send_date) as row_number
from all_emails_lt_branch_date
qualify row_number = 1

有两个CTE作为伪数据表

WITH data_email(campaign_name, send_date, visits) AS (
SELECT column1, column2::date, null FROM VALUES
('0001_RegOnly', '2021-11-01'),
('0001_RegOnly', '2021-11-05'),
('0001_RegOnly', '2021-11-06'),
('002_RegOnly', '2021-09-28'),
('002_RegOnly', '2021-10-30'),
('002_RegOnly', '2021-11-01'),
('002_RegOnly', '2021-11-03')
), data_branch(campaign_name, day_id, visits) AS (
SELECT column1, column2::date, column3 FROM VALUES
('0001_RegOnly', '2021-11-04', 5),
('0001_RegOnly', '2021-11-07', 1),
('002_RegOnly', '2021-10-15', 6),
('002_RegOnly', '2021-11-02', 8)
)

然后SQL使用该数据:

WITH email_cte AS (
SELECT e.*, 
lead(e.send_date,1,to_date('2100-01-01')) over (partition by e.campaign_name order by e.send_date) as next_sent
FROM data_email AS e
)
SELECT e.campaign_name, 
e.send_date,
b.visits
FROM  email_cte AS e
LEFT JOIN data_branch AS b
ON b.campaign_name = e.campaign_name 
AND b.day_id between e.send_date and next_sent 
AND b.day_id < e.next_sent
--QUALIFY row_number() over (partition by e.campaign_name, e.send_date order by b.day_id ) = 1
ORDER BY 1,2;

我们得到了想要的结果。

访问[/td>6[/td>[/td>
CAMPAIGN_NAME发送日期
0001_RegOnly2021-11-015
0001_RegOnly2021-11-05
0001_RegOnly2021-11-061
002_RegOnly2021-09-28
002_RegOnly2021-10-30
002_RegOnly
002_RegOnly2021-11-03