我正在寻找一种算法来找到最佳的维度组合,以实现理想的结果。
以以下内容为例:
| A | B | C | y |
|--------|--------|-------|-----|
| dog | house1 | green | 30 |
| dog | house1 | blue | 15 |
| cat | house1 | green | 20 |
| cat | house2 | red | 5 |
| turtle | house3 | green | 50 |
A, B, C为测量尺寸。Y为测量结果。
如果我想获得y>= 50的所有维度组合,那么结果将是:
turtle, house3, green
turtle, any, green
turtle, house3, any
turtle, any, any
any, house3, green
any, house3, any
any, any, green
any, house1, green
any, house1, any
也许这是一个简单的问题,但我试着用O(n)来计算一个最优解,但我没有找到。
从包含(any, any, ..., any), 0
的工作队列开始。这个队列的元素将是一对,由一个组合和左边的一些元素组成,这些元素不能从any
中更改(稍后会更有意义)。直到工作队列为空,从其中删除一个元素并计算相应的总和。如果它不符合阈值,则丢弃它。否则,将其报告为被搜索的组合之一。对于每个可以更改的any
,对于该列中的每个值,将当前的any
替换为该值的组合排队,索引锁定所有先前的any
值。
考虑到输出敏感界,这在最优的多项式因子范围内(通常,可以有指数级的许多组合)。
在Python 3:
def overthreshold(data, threshold):
queue = [(('any',) * len(data[0][0]), 0)]
for combination, begin in queue:
if sum(row[1] for row in data
if all(x in {'any', y}
for x, y in zip(combination, row[0]))) < threshold:
continue
yield combination
for i in range(begin, len(combination)):
if combination[i] == 'any':
queue.extend((combination[:i] + (x,) + combination[i+1:], i + 1)
for x in {row[0][i] for row in data})
def demo():
data = [
(('dog', 'house1', 'green'), 30),
(('dog', 'house1', 'blue'), 15),
(('cat', 'house1', 'green'), 20),
(('cat', 'house2', 'red'), 5),
(('turtle', 'house3', 'green'), 50),
]
for combination in overthreshold(data, 50):
print(combination)
回到这里,8年后使用ClickHouse回答这个问题:
WITH table AS (
SELECT 'dog' AS a, 'house1' AS b, 'green' AS c, 30 AS y
UNION ALL
SELECT 'dog' AS a, 'house1' AS b, 'blue' AS c, 15 AS y
UNION ALL
SELECT 'cat' AS a, 'house1' AS b, 'green' AS c, 20 AS y
UNION ALL
SELECT 'cat' AS a, 'house2' AS b, 'red' AS c, 5 AS y
UNION ALL
SELECT 'turtle' AS a, 'house3' AS b, 'green' AS c, 50 AS y
)
SELECT a, b, c, sum(y) y FROM table GROUP BY CUBE(a, b, c)
HAVING y >= 50
FORMAT PrettyCompactMonoBlock;
┌─a──────┬─b──────┬─c─────┬───y─┐
│ turtle │ house3 │ green │ 50 │
│ turtle │ house3 │ │ 50 │
│ turtle │ │ green │ 50 │
│ turtle │ │ │ 50 │
│ │ house3 │ green │ 50 │
│ │ house1 │ green │ 50 │
│ │ house3 │ │ 50 │
│ │ house1 │ │ 65 │
│ │ │ green │ 100 │
│ │ │ │ 120 │
└────────┴────────┴───────┴─────┘