我正在处理这样的企业数据。
| load_number | id | time | slot_time | region | network |
|-------------|-----------|----------|-----------|--------|---------|
| 1692 | 641131146 | 00:20:00 | 00:20:00 | FX-4 | SBOB |
| 1692 | 641131146 | 00:20:00 | 00:20:30 | FX-4 | SBOB |
| 1442 | 570732257 | 00:20:00 | 00:20:00 | FX-4 | SBOB |
| 1442 | 570732257 | 00:20:00 | 00:20:30 | FX-4 | SBOB |
| 1692 | 641131147 | 00:55:00 | 00:55:00 | FX-4 | SBOB |
| 1692 | 641131147 | 00:55:00 | 00:55:30 | FX-4 | SBOB |
| 1442 | 570732258 | 00:55:00 | 00:55:00 | FX-4 | SBOB |
| 1442 | 570732258 | 00:55:00 | 00:55:30 | FX-4 | SBOB |
问题是,该公司有不良的数据做法,更改/重用ID,但只更新load_number
字段。
我如何构造sql查询来提取最新加载的数据,如下所示:
| load_number | id | time | slot_time | region | network |
|-------------|-----------|----------|-----------|--------|---------|
| 1692 | 641131146 | 00:20:00 | 00:20:00 | FX-4 | SBOB |
| 1692 | 641131146 | 00:20:00 | 00:20:30 | FX-4 | SBOB |
| 1692 | 641131147 | 00:55:00 | 00:55:00 | FX-4 | SBOB |
| 1692 | 641131147 | 00:55:00 | 00:55:30 | FX-4 | SBOB |
基本上,除了id
和load_number
之外,每个字段都将匹配。因此,假设除了这两个字段之外,每个字段都匹配,我可以通过取load_number
较高的行来删除"重复项"吗。
我在想CCD_ 5在CCD_,非常感谢您的帮助!
试试这样的
with max_load_numbers_by_id AS (
SELECT et.id, MAX(et.load_number) AS max_load_number
FROM enterprise_table et
GROUP BY et.id
)
SELECT et.*
FROM enterprise_table et
JOIN max_load_numbers_by_id mlnbi
ON et.id = mlnbi.id
AND et.max_load_number = mlnbi.load_number
您只需使用distinct on
:
select distinct on (time, slot_time, region, network) t.*
from mytable t
order by time, slot_time, region, network, load_number desc
DB Fiddle上的演示:
load_number|id|time|slot_time|region|network----------:|--------:|:--------|:--------|:-----|:------1692|641131146|00:20:00|00:20:00 |FX-4|SBOB1692|641131146|00:20:00|00:20:30|FX-4|SBOB1692|641131147|00:55:00|00:55:00 |FX-4|SBOB1692|641131147|00:55:00|00:55:30|FX-4|SBOB
您可以使用窗口函数rank
或dense_rank
来选择最近的load_number。这是演示。
select
load_number,
id,
time,
slot_time,
region,
network
from
(
select
*,
dense_rank() over(order by load_number desc) as rn
from myTable
) subq
where rn = 1;
输出:
| load_number | id | time | slot_time | region | network |
| ----------- | --------- | -------- | --------- | ------ | ------- |
| 1692 | 641131146 | 00:20:00 | 00:20:00 | FX-4 | SBOB |
| 1692 | 641131146 | 00:20:00 | 00:20:30 | FX-4 | SBOB |
| 1692 | 641131147 | 00:55:00 | 00:55:00 | FX-4 | SBOB |
| 1692 | 641131147 | 00:55:00 | 00:55:30 | FX-4 | SBOB |