Cartesian product over an unbounded number of groups in PrestoSQL



Given a table like this:

Key            | Value
---------------+------
A              | 1
A              | 2
B              | 1
B              | 2
C              | 1
C              | 2

How can I use SQL to generate the product of all (key, value) pairs across keys?

This is the desired output:

Group   |Key           | Value
--------+--------------+------
1       |A             | 1
1       |B             | 1
1       |C             | 1
2       |A             | 1
2       |B             | 1
2       |C             | 2
3       |A             | 1
3       |B             | 2
3       |C             | 1
4       |A             | 1
4       |B             | 2
4       |C             | 2
5       |A             | 2
5       |B             | 2
5       |C             | 2
6       |A             | 2
6       |B             | 1
6       |C             | 2
7       |A             | 2
7       |B             | 2
7       |C             | 1
8       |A             | 2
8       |B             | 1
8       |C             | 1

For illustration, in Python I could use:

In [1]: from itertools import product
In [2]: list(product([('A',1), ('A', 2)], [('B', 1), ('B', 2)], [('C', 1), ('C', 2)]))
Out[2]:
[(('A', 1), ('B', 1), ('C', 1)),
(('A', 1), ('B', 1), ('C', 2)),
(('A', 1), ('B', 2), ('C', 1)),
(('A', 1), ('B', 2), ('C', 2)),
(('A', 2), ('B', 1), ('C', 1)),
(('A', 2), ('B', 1), ('C', 2)),
(('A', 2), ('B', 2), ('C', 1)),
(('A', 2), ('B', 2), ('C', 2))]

In my real application, the number of (key, value) pairs is not fixed in advance.
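To make the "not fixed in advance" requirement concrete, here is a minimal Python sketch that derives the per-key lists from the rows themselves instead of spelling them out. The function name `grouped_product` is hypothetical and used only for illustration; the group numbering follows the order in which `itertools.product` yields results, which need not match the order in the desired output above.

```python
from itertools import groupby, product
from operator import itemgetter


def grouped_product(rows):
    """Return (group, key, value) rows for the product across all keys."""
    rows = sorted(rows)  # groupby only merges adjacent rows, so sort by key first
    # One list of (key, value) pairs per distinct key, however many keys there are
    per_key = [list(pairs) for _, pairs in groupby(rows, key=itemgetter(0))]
    out = []
    for group, combo in enumerate(product(*per_key), start=1):
        for key, value in combo:
            out.append((group, key, value))
    return out


rows = [("A", 1), ("A", 2), ("B", 1), ("B", 2), ("C", 1), ("C", 2)]
result = grouped_product(rows)
```

For the sample table this gives 8 groups of 3 rows each, and adding a fourth key to `rows` requires no change to the function.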

There is no simple way to do what you are asking, but here is one possible approach:

WITH t(key, value) AS (
    VALUES
        ('A', 1),
        ('A', 2),
        ('B', 1),
        ('B', 2),
        ('C', 1),
        ('C', 2)
),
grouped AS (
    -- collect every (key, value) entry into one array and take all
    -- 3-element combinations (3 = number of distinct keys in this sample)
    SELECT
        combinations(array_agg(ROW(key, value)), 3) combinations,
        count(distinct key) key_count
    FROM t
),
combinations AS (
    -- keep only the combinations that contain every key, and number them
    SELECT row_number() OVER () AS "group", combination
    FROM grouped CROSS JOIN UNNEST(combinations) AS u(combination)
    WHERE cardinality(array_distinct(transform(combination, e -> e[1]))) = key_count
),
unnested AS (
    -- expand each combination back into one row per (key, value) pair
    SELECT "group", key, value
    FROM combinations, UNNEST(combination) AS u(key, value)
    ORDER BY "group", "key"
)
SELECT * FROM unnested

=>

group | key | value
-------+-----+-------
1 | A   |     1
1 | B   |     1
1 | C   |     1
2 | A   |     1
2 | B   |     2
2 | C   |     1
3 | A   |     2
3 | B   |     1
3 | C   |     1
4 | A   |     2
4 | B   |     2
4 | C   |     1
5 | A   |     1
5 | B   |     1
5 | C   |     2
6 | A   |     1
6 | B   |     2
6 | C   |     2
7 | A   |     2
7 | B   |     1
7 | C   |     2
8 | A   |     2
8 | B   |     2
8 | C   |     2
(24 rows)

Here is how it works:

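  1. Build an array of entries (i.e., ROW(key, value)) and count the total number of distinct keys.
  2. Use the combinations function to generate all possible three-element combinations of that array's elements. This also produces some "invalid" combinations: ones that contain the same key more than once.
  3. Filter the generated combinations to exclude any that do not contain all keys. Use the row_number window function to assign a group id to each remaining combination.
  4. Unnest the individual key/value pairs of each combination array to produce the desired output.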
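The same four steps can be mirrored in Python to see why the filter is needed. This is only a sketch of the logic, not of Presto's behavior: `itertools.combinations` stands in for the SQL `combinations` function, and the order in which it yields subsets (and therefore the group ids) may differ from the query output above.

```python
from itertools import combinations

rows = [("A", 1), ("A", 2), ("B", 1), ("B", 2), ("C", 1), ("C", 2)]

# Step 1: the entries are already a list; count the distinct keys
key_count = len({key for key, _ in rows})

# Step 2: every key_count-element subset of the entries, valid or not
candidates = list(combinations(rows, key_count))

# Step 3: keep only the subsets in which every key appears
valid = [c for c in candidates if len({key for key, _ in c}) == key_count]

# Step 4: unnest into (group, key, value) rows
unnested = [(g, k, v) for g, combo in enumerate(valid, start=1) for k, v in combo]
```

For the sample data, step 2 yields 20 candidate subsets and step 3 keeps 8 of them, so most of the generated combinations are discarded. That overhead grows quickly as the number of entries increases.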

You can use a recursive CTE for this, but it is simpler to keep the values together in a single column:

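with recursive kv as (
    -- number the distinct keys 1, 2, 3, ...
    select key, value, dense_rank() over (order by key) as seqnum
    from t
),
cte(kv, seqnum) as (
    -- anchor: one single-element array per entry of the first key
    select array[row(key, value)] as kv, seqnum
    from kv
    where seqnum = 1
    union all
    -- step: append each entry of the next key to every partial array
    select cte.kv || row(kv.key, kv.value), kv.seqnum
    from cte join
         kv
         on kv.seqnum = cte.seqnum + 1
)
select cte.*, row_number() over (order by seqnum) as "group"
from cte
-- keep only the arrays that hold one entry for every key
where cte.seqnum = (select max(seqnum) from kv);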

If you like, you can unnest the array to get separate rows.

Unfortunately, I don't have Presto/Trino on hand to check the syntax.
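Since the SQL is unverified, the logic of the recursive approach can at least be checked independently of the dialect. The following Python sketch is an assumption-laden analogue rather than a translation: a loop plays the role of the recursive step, and only the most recent level is kept, which corresponds to selecting just the arrays that contain one entry per key.

```python
rows = [("A", 1), ("A", 2), ("B", 1), ("B", 2), ("C", 1), ("C", 2)]

# Analogue of dense_rank() over (order by key): map each key to 1, 2, 3, ...
rank = {key: i for i, key in enumerate(sorted({k for k, _ in rows}), start=1)}
kv = [(k, v, rank[k]) for k, v in rows]
max_rank = max(rank.values())

# Anchor member: one single-element list per entry of the first key
level = [[(k, v)] for k, v, r in kv if r == 1]

# Recursive member: extend every partial list with each entry of the next key
for seqnum in range(2, max_rank + 1):
    level = [partial + [(k, v)] for partial in level for k, v, r in kv if r == seqnum]

# Unnest the complete lists into (group, key, value) rows
unnested = [(g, k, v) for g, combo in enumerate(level, start=1) for k, v in combo]
```

Unlike the combinations-based approach, this builds only valid combinations, so nothing has to be filtered out afterwards, and the number of keys is read from the data.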
