我的数据如下所示:
Mike 5
Mike 100
Mike 101
Mike 106
Mike 95
Mike 1000
Mike 1001
Mike 1010
Jen 2006
Jen 2001
Jen 2010
Jen 3000
Jen 10
我想用20
的绝对值对数字进行聚类,并在每个聚类中留下最小的一个。
结果如下:
Mike 5
Mike 95
Mike 1000
Jen 2006
Jen 3000
Jen 10
有办法做到这一点吗?关于CCD_ 2,但是如果集群跨越间隔则没有意义,例如,如果我将范围设置为1-20, 21-40, 41-60
但如果我的数据有:
Mike 35
Mike 39
Mike 41
Mike 45
它将被分成两个集群
Mike 35
Mike 41
我想要什么:
Mike 35
谢谢!
如果我理解正确,您需要;最小的";每个名称的值以"0"开头;集群";。该集群依次包含值为20的相同名称的所有行。然后对其余集群重复此操作。
这表明了一个递归CTE:
with recursive tn as (
select t.*, row_number() over (partition by name order by val) as seqnum
from t
),
cte as (
select name, val, seqnum, val as cluster_val, 1 as cluster_num
from tn
where seqnum = 1
union all
select cte.name, tn.val, tn.seqnum,
(case when tn.val < cte.cluster_val + 20 then cte.cluster_val else tn.val end) as cluster_val,
(case when tn.val < cte.cluster_val + 20 then cte.cluster_num + 1 else 1 end) as cluster_num
from cte join
tn
on tn.name = cte.name and tn.seqnum = cte.seqnum + 1
)
select *
from cte
where cluster_num = 1
order by name, val;
这里有一个db<gt;不停摆弄