将Branch1表中的记录与Email1表中记录映射,并在Email1表格中显示访问值,如下所述:
要求:
- branch1表中"campaign_name"0001_RegOnly和"day_id"2021年11月7日的"访问量"值应与email1表中的记录匹配,该记录具有相同的campaignname,并且发送日期尽可能接近(小于或等于(branch1中的day_id。因此,电子邮件表中有3条记录具有相同的campaign_name(0001_RegOnly(,日期分别为11月1日、5日和6日。因此,访问量值应插入电子邮件表中的11月5日记录
a(如果电子邮件表中有相同活动名称(0001_RegOnly(的记录,比如11月8日而不是11月6日,则应将该值分配给具有相同活动名称且日期为11月5日的记录。而不是11月8日,因为11月8号大于11月7日
b( 如果电子邮件表中有相同活动名称(0001_RegOnly(的记录,比如说11月7日以及11月1日、5日和6日,那么该值应该分配给具有相同活动名称的记录,该记录的日期为11月7号,因为它是最接近的日期,在这种情况下相等(branch1.day_id为7,email1.send_date也为7(
条件:
- branch1.campaign_name=电子邮件1.campaign名称
- branch1.day_id>=电子邮件1.send_date
表格:
EMAIL1表格
Branch1表
预期输出
我使用过的SQL查询:
with CTE as
(
select campaign_name, send_date, lead (send_date) over (order by send_date) as nextdate
from email1
group by campaign_name, send_date
)
select a.campaign_name, a.send_date, sum(b.visits) as visits
from CTE a left join branch1 b
on b.day_id>a.send_date and b.day_id<a.nextdate
and a.campaign_name = b.campaign_name
group by a.send_date, a.campaign_name
order by send_date
;
实际输出:SQL查询输出
这将为您提供所需信息。
--get all emails that meet <= the send date
with all_emails_lt_branch_date as (
select b.campaign_name,
b.day_id,
b.visits,
e.send_date
case when e.send_date <= b.day_id then true else false end as possible
from branch1 b
left join email1 e on e.campaign_name = b.campaign_name
where possible = true
)
--keep only the latest / max email date for each day_id
select * ,
row_number() over (partition by campaign_name, day_id order by send_date) as row_number
from all_emails_lt_branch_date
qualify row_number = 1
有两个CTE作为伪数据表
WITH data_email(campaign_name, send_date, visits) AS (
SELECT column1, column2::date, null FROM VALUES
('0001_RegOnly', '2021-11-01'),
('0001_RegOnly', '2021-11-05'),
('0001_RegOnly', '2021-11-06'),
('002_RegOnly', '2021-09-28'),
('002_RegOnly', '2021-10-30'),
('002_RegOnly', '2021-11-01'),
('002_RegOnly', '2021-11-03')
), data_branch(campaign_name, day_id, visits) AS (
SELECT column1, column2::date, column3 FROM VALUES
('0001_RegOnly', '2021-11-04', 5),
('0001_RegOnly', '2021-11-07', 1),
('002_RegOnly', '2021-10-15', 6),
('002_RegOnly', '2021-11-02', 8)
)
然后SQL使用该数据:
WITH email_cte AS (
SELECT e.*,
lead(e.send_date,1,to_date('2100-01-01')) over (partition by e.campaign_name order by e.send_date) as next_sent
FROM data_email AS e
)
SELECT e.campaign_name,
e.send_date,
b.visits
FROM email_cte AS e
LEFT JOIN data_branch AS b
ON b.campaign_name = e.campaign_name
AND b.day_id between e.send_date and next_sent
AND b.day_id < e.next_sent
--QUALIFY row_number() over (partition by e.campaign_name, e.send_date order by b.day_id ) = 1
ORDER BY 1,2;
我们得到了想要的结果。
CAMPAIGN_NAME | 发送日期 | 访问|
---|---|---|
0001_RegOnly | 2021-11-01 | 5 |
0001_RegOnly | 2021-11-05 | [/td>|
0001_RegOnly | 2021-11-06 | 1 |
002_RegOnly | 2021-09-28 | 6|
002_RegOnly | 2021-10-30 | [/td>|
002_RegOnly | ||
002_RegOnly | 2021-11-03 | [/td>