我有一个庞大的表格,其中包含所有客户的详细信息,例如他们的电话号码和电子邮件地址。该表可以为每个客户有多行,表示他们在某个时候更改了电话号码、电子邮件地址或其他个人资料信息。每一行都有一个day列,表示配置文件更改发生的日期(即当天配置文件的状态(。
我想提取每个客户个人资料中的每个电话号码和电子邮件地址,并添加一个布尔标志(称为"实时"(,指示当前哪个电话号码和邮件与他们的个人资料相关联(即我们在表中为该客户保存的最新记录(。
这是我目前的查询:
SELECT DISTINCT
customer_id,
phone_number,
email AS email_address,
CASE
WHEN day = (
SELECT
MAX(temp.day)
FROM
customer AS temp
WHERE
customer.customer_id = temp.customer_id
)
THEN true
ELSE false
END AS live
FROM
customer
考虑到数百万不同客户有数十亿行,我认为这是非常低效的。我如何改进这个查询以实现我想要的(也许是窗口函数?(,或者用完全不同的方式更好地实现我想要?
您可以使用row_number()
设置标志:
SELECT
customer_id,
phone_number,
email AS email_address,
(ROW_NUMBER() over(partition by customer_id ORDER BY day DESC) = 1) as is_live
FROM customer
我不确定Presto是否将条件理解为布尔值——如果不是,你可以这样做:
CASE WHEN ROW_NUMBER() over(partition by customer_id ORDER BY day DESC) = 1
THEN true
ELSE false
END as is_live
如果记录是唯一的,您可以使用:
SELECT customer_id, phone_number, email AS email_address,
(CASE WHEN RANK() OVER (PARTITION BY customer_id ORDER BY day DESC) = 1
THEN true ELSE false
END) as is_live
FROM customer c;
Presto支持布尔,因此不需要CASE
表达式。但我保留了它,因为你的代码中有它。
这是您最初的相关子查询,被翻译成Group Max,结果与RANK版本相同:
CASE
WHEN day = MAX(temp.day) over (partition by customer_id)
THEN true
ELSE false
END AS live
要显示每个客户在其个人资料中拥有的每个电话号码和电子邮件地址,意思是这两种资源的每个组合只显示一次请使用以下查询。
这里是样品日期
create table tab as
select 1 customer_id, 'A' phone_number, '1@1' email, DATE'2020-01-01' day from dual union all
select 1 customer_id, 'B' phone_number, '1@1' email, DATE'2020-02-01' day from dual union all
select 1 customer_id, 'A' phone_number, '1@1' email, DATE'2020-03-01' day from dual union all
select 2 customer_id, 'C' phone_number, '1@1' email, DATE'2020-01-01' day from dual union all
select 2 customer_id, 'C' phone_number, '1@1' email, DATE'2020-04-01' day from dual
;
查询
with live as (
select CUSTOMER_ID, PHONE_NUMBER, EMAIL, DAY ,
case when row_number() over (partition by CUSTOMER_ID order by day DESC) = 1 then 1 else 0 end is_live
from tab)
select CUSTOMER_ID, PHONE_NUMBER, EMAIL,
case when max(is_live) = 1 then 'true' else 'false' end is_life
from live
group by CUSTOMER_ID, PHONE_NUMBER, EMAIL
产生
CUSTOMER_ID P EMA IS_LI
----------- - --- -----
1 B 1@x false
1 A 1@x true
2 C 2@x true
子查询标识最后一条(活动(记录,GROUP BY
查询只生成资源的唯一组合(与解决方案中的SELECT DISTINCT
类似,但更好——如果相同的资源位于活动行和非活动行中,则会失败(。