我在Postgres中有一个表,它有一个类似于的hstore列
"surface => concrete, ramp => yes, incline => up, highway => footway"
"surface => asphalt, lit => no, source => survey, incline => 12.6%, highway => footway"
"bicycle => yes, surface => asphalt, highway => footway, segregated => no"
现在我想分析一下它们之间的关键。因此,我想知道整个数据集的密钥成对组合的频率。
结果应该是这样的:
| Key 1 | Key 2 | Count |
|---------------------|------------------|------------------|
| surface | source | 1 |
| surface | highway | 3 |
| surface | incline | 2 |
| highway | bicycle | 1 |
.....
使用函数akeys()
将hstore
列的所有键作为一个数组,使用generate_subscripts()
生成所有不同的数组元素对,并将结果按对分组:
select akeys[i] as key1, akeys[j] as key2, count(*)
from my_table
cross join akeys(hstore_col)
cross join generate_subscripts(akeys, 1) as i
cross join generate_subscripts(akeys, 1) as j
where akeys[i] > akeys[j]
group by 1, 2
order by 1 desc, 2 desc
在db<gt;fiddle。
使用hstore的each()
将其展开,然后连接并聚合:
with expand as (
select id, k, v
from htab
cross join lateral each(hcol) as c(k, v)
), pairings as (
select t1.id, t1.k as key1, t2.k as key2
from expand t1
join expand t2
on t2.id = t1.id
and t2.k < t1.k
)
select key1, key2, count(*)
from pairings
group by key1, key2
order by key1 desc, key2 desc;
小提琴