sql中按范围划分的集群编号



我的数据如下所示:

Mike   5
Mike   100
Mike   101
Mike   106
Mike   95
Mike   1000
Mike   1001
Mike   1010
Jen    2006
Jen    2001
Jen    2010
Jen    3000
Jen    10

我想用20的绝对值对数字进行聚类,并在每个聚类中留下最小的一个。

结果如下:

Mike   5
Mike   95
Mike   1000
Jen    2006
Jen    3000
Jen    10

有办法做到这一点吗?关于CCD_ 2,但是如果集群跨越间隔则没有意义,例如,如果我将范围设置为1-20, 21-40, 41-60但如果我的数据有:

Mike   35
Mike   39
Mike   41
Mike   45

它将被分成两个集群

Mike   35
Mike   41

我想要什么:

Mike   35

谢谢!

如果我理解正确,您需要;最小的";每个名称的值以"0"开头;集群";。该集群依次包含值为20的相同名称的所有行。然后对其余集群重复此操作。

这表明了一个递归CTE:

with recursive tn as (
select t.*, row_number() over (partition by name order by val) as seqnum
from t
),
cte as (
select name, val, seqnum, val as cluster_val, 1 as cluster_num
from tn
where seqnum = 1
union all
select cte.name, tn.val, tn.seqnum,
(case when tn.val < cte.cluster_val + 20 then cte.cluster_val else tn.val end) as cluster_val,
(case when tn.val < cte.cluster_val + 20 then cte.cluster_num + 1 else 1 end) as cluster_num
from cte join
tn
on tn.name = cte.name and tn.seqnum = cte.seqnum + 1
)
select *
from cte
where cluster_num = 1
order by name, val;

这里有一个db<gt;不停摆弄

最新更新