寻找最佳维数组合的算法



我正在寻找一种算法来找到最佳的维度组合,以实现理想的结果。

以以下内容为例:

|   A    |    B   |   C   |  y  |
|--------|--------|-------|-----|
| dog    | house1 | green | 30  |
| dog    | house1 | blue  | 15  |
| cat    | house1 | green | 20  |
| cat    | house2 | red   |  5  |
| turtle | house3 | green | 50  |

A, B, C为测量尺寸。Y为测量结果。

如果我想获得y>= 50的所有维度组合,那么结果将是:

turtle, house3, green
turtle, any,    green
turtle, house3, any
turtle, any,    any
any,    house3, green
any,    house3, any
any,    any,    green
any,    house1, green
any,    house1, any

也许这是一个简单的问题,但我试着用O(n)来计算一个最优解,但我没有找到。

从包含(any, any, ..., any), 0的工作队列开始。这个队列的元素将是一对,由一个组合和左边的一些元素组成,这些元素不能从any中更改(稍后会更有意义)。直到工作队列为空,从其中删除一个元素并计算相应的总和。如果它不符合阈值,则丢弃它。否则,将其报告为被搜索的组合之一。对于每个可以更改的any,对于该列中的每个值,将当前的any替换为该值的组合排队,索引锁定所有先前的any值。

考虑到输出敏感界,这在最优的多项式因子范围内(通常,可以有指数级的许多组合)。

在Python 3:

def overthreshold(data, threshold):
    queue = [(('any',) * len(data[0][0]), 0)]
    for combination, begin in queue:
        if sum(row[1] for row in data
               if all(x in {'any', y}
                      for x, y in zip(combination, row[0]))) < threshold:
            continue
        yield combination
        for i in range(begin, len(combination)):
            if combination[i] == 'any':
                queue.extend((combination[:i] + (x,) + combination[i+1:], i + 1)
                             for x in {row[0][i] for row in data})

def demo():
    data = [
        (('dog',    'house1', 'green'), 30),
        (('dog',    'house1', 'blue'),  15),
        (('cat',    'house1', 'green'), 20),
        (('cat',    'house2', 'red'),    5),
        (('turtle', 'house3', 'green'), 50),
    ]
    for combination in overthreshold(data, 50):
        print(combination)

回到这里,8年后使用ClickHouse回答这个问题:

WITH table AS (
    SELECT 'dog' AS a, 'house1' AS b, 'green' AS c, 30 AS y
    UNION ALL
    SELECT 'dog' AS a, 'house1' AS b, 'blue' AS c, 15 AS y
    UNION ALL
    SELECT 'cat' AS a, 'house1' AS b, 'green' AS c, 20 AS y
    UNION ALL
    SELECT 'cat' AS a, 'house2' AS b, 'red' AS c,  5 AS y
    UNION ALL
    SELECT 'turtle' AS a, 'house3' AS b, 'green' AS c, 50 AS y
)
SELECT a, b, c, sum(y) y FROM table GROUP BY CUBE(a, b, c)
HAVING y >= 50
FORMAT PrettyCompactMonoBlock;
┌─a──────┬─b──────┬─c─────┬───y─┐
│ turtle │ house3 │ green │  50 │
│ turtle │ house3 │       │  50 │
│ turtle │        │ green │  50 │
│ turtle │        │       │  50 │
│        │ house3 │ green │  50 │
│        │ house1 │ green │  50 │
│        │ house3 │       │  50 │
│        │ house1 │       │  65 │
│        │        │ green │ 100 │
│        │        │       │ 120 │
└────────┴────────┴───────┴─────┘

相关内容

  • 没有找到相关文章

最新更新