Given a table like this:
Key | Value
----+------
A | 1
A | 2
B | 1
B | 2
C | 1
C | 2
how can I use SQL to generate the product of all (key, value) pairs?
This is the desired output:
Group | Key | Value
------+-----+------
1 | A | 1
1 | B | 1
1 | C | 1
2 | A | 1
2 | B | 1
2 | C | 2
3 | A | 1
3 | B | 2
3 | C | 1
4 | A | 1
4 | B | 2
4 | C | 2
5 | A | 2
5 | B | 2
5 | C | 2
6 | A | 2
6 | B | 1
6 | C | 2
7 | A | 2
7 | B | 2
7 | C | 1
8 | A | 2
8 | B | 1
8 | C | 1
For illustration, in Python I could use:
In [1]: from itertools import product
In [2]: list(product([('A',1), ('A', 2)], [('B', 1), ('B', 2)], [('C', 1), ('C', 2)]))
Out[2]:
[(('A', 1), ('B', 1), ('C', 1)),
(('A', 1), ('B', 1), ('C', 2)),
(('A', 1), ('B', 2), ('C', 1)),
(('A', 1), ('B', 2), ('C', 2)),
(('A', 2), ('B', 1), ('C', 1)),
(('A', 2), ('B', 1), ('C', 2)),
(('A', 2), ('B', 2), ('C', 1)),
(('A', 2), ('B', 2), ('C', 2))]
In my actual application, the number of (key, value) pairs is not fixed in advance.
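To make the requirement concrete when the keys are not known up front, here is a minimal Python sketch. It assumes the table has been loaded as a list of (key, value) tuples; the names `rows` and `grouped_product` are hypothetical and only serve the illustration.

```python
from itertools import product

# Rows of the example table; in practice these would come from a query.
rows = [("A", 1), ("A", 2), ("B", 1), ("B", 2), ("C", 1), ("C", 2)]


def grouped_product(rows):
    """Return (group, key, value) rows for the product of the per-key pair lists."""
    # Bucket the pairs by key so the number of keys is not hard-coded.
    by_key = {}
    for key, value in rows:
        by_key.setdefault(key, []).append((key, value))

    result = []
    # product(*lists) picks exactly one pair per key; enumerate assigns the group id.
    per_key_lists = [by_key[k] for k in sorted(by_key)]
    for group, combo in enumerate(product(*per_key_lists), start=1):
        for key, value in combo:
            result.append((group, key, value))
    return result


output = grouped_product(rows)
```

The number of groups equals the product of the per-key value counts, here 2 * 2 * 2 = 8, so `output` holds 24 rows. The group numbering follows the order of `itertools.product` and need not match the numbering in the table above.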
There is no simple way to do what you are asking, but here is one possible approach:
WITH t(key, value) AS (
VALUES
('A', 1),
('A', 2),
('B', 1),
('B', 2),
('C', 1),
('C', 2)
),
grouped AS (
SELECT
combinations(array_agg(ROW(key, value)), 3) combinations,
count(distinct key) key_count
FROM t
),
combinations AS (
SELECT row_number() OVER () AS "group", combination
FROM grouped CROSS JOIN UNNEST(combinations) AS u(combination)
WHERE cardinality(array_distinct(transform(combination, e -> e[1]))) = key_count
),
unnested AS (
SELECT "group", key, value
FROM combinations, UNNEST(combination) AS u(key, value)
ORDER BY "group", "key"
)
SELECT * FROM unnested
=>
group | key | value
-------+-----+-------
1 | A | 1
1 | B | 1
1 | C | 1
2 | A | 1
2 | B | 2
2 | C | 1
3 | A | 2
3 | B | 1
3 | C | 1
4 | A | 2
4 | B | 2
4 | C | 1
5 | A | 1
5 | B | 1
5 | C | 2
6 | A | 1
6 | B | 2
6 | C | 2
7 | A | 2
7 | B | 1
7 | C | 2
8 | A | 2
8 | B | 2
8 | C | 2
(24 rows)
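Here is how it works:
- Build an array of the entries (i.e., ROW(key, value)) and count the total number of distinct keys.
- Use the combinations function to generate all possible three-way combinations of that array's elements (the 3 in the query matches the number of distinct keys in this example). This produces some "invalid" combinations: ones that contain the same key more than once.
- Filter the generated combinations to exclude any that do not contain all keys. Use the row_number window function to assign a group id to each remaining combination.
- Unnest the individual key/value pairs in each combination array to produce the desired output.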
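The steps above can be mirrored in Python as a rough sketch. This is an illustration of the combine-then-filter idea, not a translation of the query: the function name `combos_covering_all_keys` is hypothetical, and unlike the SQL, which hard-codes 3, the combination size here is taken from the number of distinct keys.

```python
from itertools import combinations

rows = [("A", 1), ("A", 2), ("B", 1), ("B", 2), ("C", 1), ("C", 2)]


def combos_covering_all_keys(rows):
    """Generate key_count-way combinations and keep those that cover every key."""
    # Step 1: count the distinct keys.
    key_count = len({key for key, _ in rows})

    # Step 2: all key_count-way combinations, including "invalid" ones
    # that contain the same key more than once.
    candidates = combinations(rows, key_count)

    # Step 3: keep only the combinations whose distinct keys cover all keys.
    valid = [c for c in candidates if len({key for key, _ in c}) == key_count]

    # Step 4: "unnest" each combination into (group, key, value) rows.
    return [
        (group, key, value)
        for group, combo in enumerate(valid, start=1)
        for key, value in sorted(combo)
    ]


result = combos_covering_all_keys(rows)
```

With six pairs and three keys, this examines 20 candidate combinations and keeps 8 of them. The candidate count grows quickly with the table size, which is the main cost of this approach.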
You can use a recursive CTE for this, but it is simpler to keep the values in a single column:
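with recursive kv as (
    -- number the distinct keys 1..n
    select key, value, dense_rank() over (order by key) as seqnum
    from t
),
cte(kv, seqnum) as (
    -- seed: one single-element array per value of the first key
    select array[row(key, value)] as kv, seqnum
    from kv
    where seqnum = 1
    union all
    -- step: extend each partial array with every pair of the next key
    select cte.kv || row(k.key, k.value), k.seqnum
    from cte join
         kv k
         on k.seqnum = cte.seqnum + 1
)
-- keep only the arrays that cover every key
select cte.*, row_number() over (order by seqnum) as "group"
from cte
where cte.seqnum = (select max(seqnum) from kv);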
If you like, you can unnest the array to get separate rows.
Unfortunately, I don't have Presto/Trino on hand to check the syntax.
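Since the query above is untested, the same idea can be checked in Python. This is a minimal sketch of the seed-and-extend logic behind the recursive CTE, assuming the table is a list of (key, value) tuples; `extend_by_key` is a hypothetical name.

```python
rows = [("A", 1), ("A", 2), ("B", 1), ("B", 2), ("C", 1), ("C", 2)]


def extend_by_key(rows):
    """Build one list of pairs per combination by extending key by key."""
    # Equivalent of dense_rank() over (order by key): distinct keys in sorted order.
    keys = sorted({key for key, _ in rows})
    if not keys:
        return []

    # Seed: one single-element list per pair of the first key.
    partial = [[(k, v)] for k, v in rows if k == keys[0]]

    # Step: extend every partial list with each pair of the next key.
    for key in keys[1:]:
        partial = [combo + [(k, v)] for combo in partial for k, v in rows if k == key]
    return partial


arrays = extend_by_key(rows)

# "Unnest" the lists into separate (group, key, value) rows.
unnested = [
    (group, key, value)
    for group, combo in enumerate(arrays, start=1)
    for key, value in combo
]
```

Each pass over `keys[1:]` plays the role of one recursion step, so only complete combinations remain at the end and no filtering of invalid candidates is needed.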