通过加载最新的数据来删除重复的值



我正在处理这样的企业数据。

| load_number | id        | time     | slot_time | region | network |
|-------------|-----------|----------|-----------|--------|---------|
| 1692        | 641131146 | 00:20:00 | 00:20:00  | FX-4   | SBOB    |
| 1692        | 641131146 | 00:20:00 | 00:20:30  | FX-4   | SBOB    |
| 1442        | 570732257 | 00:20:00 | 00:20:00  | FX-4   | SBOB    |
| 1442        | 570732257 | 00:20:00 | 00:20:30  | FX-4   | SBOB    |
| 1692        | 641131147 | 00:55:00 | 00:55:00  | FX-4   | SBOB    |
| 1692        | 641131147 | 00:55:00 | 00:55:30  | FX-4   | SBOB    |
| 1442        | 570732258 | 00:55:00 | 00:55:00  | FX-4   | SBOB    |
| 1442        | 570732258 | 00:55:00 | 00:55:30  | FX-4   | SBOB    |

问题是,该公司有不良的数据做法,更改/重用ID,但只更新load_number字段。

我如何构造sql查询来提取最新加载的数据,如下所示:

| load_number | id        | time     | slot_time | region | network |
|-------------|-----------|----------|-----------|--------|---------|
| 1692        | 641131146 | 00:20:00 | 00:20:00  | FX-4   | SBOB    |
| 1692        | 641131146 | 00:20:00 | 00:20:30  | FX-4   | SBOB    |
| 1692        | 641131147 | 00:55:00 | 00:55:00  | FX-4   | SBOB    |
| 1692        | 641131147 | 00:55:00 | 00:55:30  | FX-4   | SBOB    |

基本上,除了idload_number之外,每个字段都将匹配。因此,假设除了这两个字段之外,每个字段都匹配,我可以通过取load_number较高的行来删除"重复项"吗。

我在想CCD_ 5在CCD_,非常感谢您的帮助!

试试这样的

with max_load_numbers_by_id AS (
SELECT et.id, MAX(et.load_number) AS max_load_number
FROM enterprise_table et
GROUP BY et.id 
)
SELECT et.*
FROM enterprise_table et
JOIN max_load_numbers_by_id mlnbi
ON et.id = mlnbi.id
AND et.max_load_number = mlnbi.load_number

您只需使用distinct on:

select distinct on (time, slot_time, region, network) t.*
from mytable t
order by time, slot_time, region, network, load_number desc

DB Fiddle上的演示

load_number|id|time|slot_time|region|network----------:|--------:|:--------|:--------|:-----|:------1692|641131146|00:20:00|00:20:00 |FX-4|SBOB1692|641131146|00:20:00|00:20:30|FX-4|SBOB1692|641131147|00:55:00|00:55:00 |FX-4|SBOB1692|641131147|00:55:00|00:55:30|FX-4|SBOB

您可以使用窗口函数rankdense_rank来选择最近的load_number。这是演示。

select
load_number, 
id, 
time, 
slot_time, 
region, 
network
from
(
select
*,
dense_rank() over(order by load_number desc) as rn
from myTable
) subq
where rn = 1;

输出:

| load_number | id        | time     | slot_time | region | network |
| ----------- | --------- | -------- | --------- | ------ | ------- |
| 1692        | 641131146 | 00:20:00 | 00:20:00  | FX-4   | SBOB    |
| 1692        | 641131146 | 00:20:00 | 00:20:30  | FX-4   | SBOB    |
| 1692        | 641131147 | 00:55:00 | 00:55:00  | FX-4   | SBOB    |
| 1692        | 641131147 | 00:55:00 | 00:55:30  | FX-4   | SBOB    |

最新更新