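I have a complicated production SQL question. It is actually PrestoDB on Hadoop, but it conforms to common SQL.
I have to get a bunch of metrics from a table that looks something like this (sorry if the table is mangled):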
+--------+--------------+------------------+
| device | install_date | customer_account |
+--------+--------------+------------------+
| dev 1 | 1-Jun | 123 |
| dev 1 | 4-Jun | 456 |
| dev 1 | 10-Jun | 789 |
| dev 2 | 20-Jun | 50 |
| dev 2 | 25-Jun | 60 |
+--------+--------------+------------------+
I need something like this:
+--------+------------------+-------------------------+
| device | max_install_date | previous_account_number |
+--------+------------------+-------------------------+
| dev 1 | 10-Jun | 456 |
| dev 2 | 25-Jun | 50 |
+--------+------------------+-------------------------+
I can write two separate queries to get the max install date and the previous account number, like this:
select device, max(install_date) as max_install_date
from (select [a whole bunch of stuff], dense_rank() over(partition by device order by [something_else]) rnk
from some_table a
)
group by device
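But how do I combine them into a single query that returns one row per device? I have rankings, statements, CASE statements, and a join. They all work individually, but I am struggling to understand how to combine them.
I need to understand how to build large queries.
P.S. What good books on advanced SQL for data analysis would you recommend? I have seen plenty on Amazon, but none of them tell me how to build large queries like this. I'm not a DBA. I'm a data person.
Thanks.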
You can use a correlated subquery approach:
select t.*
from some_table t
where install_date = (select max(install_date) from some_table t1 where t1.device = t.device);
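This assumes install_date is stored in a sensible date format, meaning a real date type or a string that sorts chronologically. Compared as plain strings, '10-Jun' sorts before '4-Jun', so max() would pick the wrong row.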
I think you want:
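select t.*
from (select t.*,
             -- latest install date per device, repeated on every row of that device
             max(install_date) over (partition by device) as max_install_date,
             -- account from the previous install of the same device (null for the first install)
             lag(customer_account) over (partition by device order by install_date) as prev_customer_account
      from t
     ) t
-- keep only the most recent install row for each device
where install_date = max_install_date;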
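To see how the window-function approach above applies to the sample data in the question, here is a minimal self-contained sketch. It is only an illustration under some assumptions: the CTE names `installs` and `enriched` are made up, the year 2020 is invented so the dates can be real DATE values, and the inline VALUES list stands in for the real table. Splitting the query into named CTEs is also one way to build a large query in layers, since each step can be run and checked on its own.

```sql
-- Hypothetical sample data; the year 2020 is an assumption, the question only gives day and month.
WITH installs AS (
    SELECT *
    FROM (VALUES
        ('dev 1', DATE '2020-06-01', 123),
        ('dev 1', DATE '2020-06-04', 456),
        ('dev 1', DATE '2020-06-10', 789),
        ('dev 2', DATE '2020-06-20', 50),
        ('dev 2', DATE '2020-06-25', 60)
    ) AS v (device, install_date, customer_account)
),
-- Step 1: attach the per-device max date and the previous account to every row.
enriched AS (
    SELECT device,
           install_date,
           max(install_date) OVER (PARTITION BY device) AS max_install_date,
           lag(customer_account) OVER (PARTITION BY device ORDER BY install_date) AS previous_account_number
    FROM installs
)
-- Step 2: keep only the latest install row per device.
SELECT device, max_install_date, previous_account_number
FROM enriched
WHERE install_date = max_install_date
ORDER BY device;
```

With this sample data the query returns one row per device: dev 1 with 2020-06-10 and previous account 456, and dev 2 with 2020-06-25 and previous account 50. If two installs of the same device share the latest date, both rows pass the filter, so a tie-breaker (for example row_number() ordered by install_date desc) would be needed in that case.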