使用二进制树从事件列表中记录成功/失败



我有一个事件 ['one', 'two', 'three']的列表。这些事件可以通过或失败。我想构建一个决策树以记录可能的结果 - 也就是说:

                    <root>
                      /
      <one_pass>              <one_fail>
          /                      /
<two_pass>  <two_fail>  <two_pass>  <two_fail>
...

从多个出色的答案中,我可以看到递归是可以与之相处的方式,似乎与二进制树结合了。我正在努力挣扎的是在每个级别上建立通行/失败的循环...是for循环还是我使用递归?

我的代码起点来自我在此处找到的答案,在下面复制,以方便参考。(我想在以后的阶段将值存储在每个节点中,然后计算这些总和,因此我在这里开始):

class Node:
    def __init__(self, data, children=None):
        if children is None:
            children = []
        self.data = data
        self.children = children
    def __str__(self):
        return str(self.data)
    __repr__ = __str__
def get_all_paths(node, path=None):
    paths = []
    if path is None:
        path = []
    path.append(node)
    if node.children:
        for child in node.children:
            paths.extend(get_all_paths(child, path[:]))
    else:
            paths.append(path)
    return paths

我已经手动构建了以下内容,以开始研究如何创建完整的结构,这就是我卡住的地方。

tree = Node("start", [
        Node("one_pass", [
            Node("two_pass"), 
            Node("two_fail")
        ]),
        Node("one_fail", [
            Node("two_pass"),
            Node("two_fail")
        ])
    ])

通过尝试此操作,我陷入了for循环级别。这让我认为我不知道这里使用的方法。如果循环在每次通行证上创建了两个节点 - 基本上是在一个迭代中创建左和右子节点 - 如何将它们链接到上一个节点?我需要一种类似以下伪代码的方法吗?

for event in events:
    insert_node(my_tree, event, "pass")
    insert_node(my_tree, event, "fail")

nb我有15个级别的树,如果这会影响任何东西。

二进制结构如此有用的原因之一是,它们很容易存储在内存和索引中,只有一些数学技巧。

如果您的所有条件都提前知道,则很容易将您的['pass', 'fail', 'pass', 'fail' etc..]列表视为二进制号码([1, 0, 1, 0]: 10),将其用作索引中的索引,预先分配给列表结果(2 n 其中n是条件的数量)。

如果您需要在决策的每个阶段存储值(即:存储['pass','fail']['pass', 'fail', 'fail']的值),这是更多的参与,但仍然不难。我们知道,任何给定数量的条件都会产生2 n 可能的结果,因此我们还可以知道有多少结果在N-1条件下以及N-2等。总共有2个 N -1所有较短条件的条件列出了与n条件有关的2 n 条件。通过添加我们的二进制号码,正如我们之前发现的2 n -1,我们可以为每个可能的条件列表获得一个唯一的索引。

如果您的条件很长,很容易看到可能的结果列表呈指数增长。如果您从不打算访问所有可能的结果,则使用数字键而不是列表来存储所有可能的结果可能是有益的。

给定您的示例:

                        <root> (0)
                    /               
          <one_pass> (1)           <one_fail> (2)
             /                          /     
    <two_pass>(3) <two_fail>(4) <two_pass>(5) <two_fail>(6)
      /            /             /              /    
   (7)  (8)      (9)    (10)     (11)   (12)    (13)    (14)
 /     /      /      /      /      /      /      /   
15, 16, 17, 18| 19, 20, 21, 22| 23, 24, 25, 26| 27, 28, 29, 30

从这里很容易提出一些功能将['pass', 'fail', 'pass', 'fail']列表转换为索引:

def list_to_number(L):
    acc = 0
    for item in L:
        acc *= 2 #shift number left as we shift in values from the right
        if item == 'fail': #assumes pass == 0 fail == 1
            acc += 1
    return acc

def idx(L): 
    return 2**len(L) - 1 + list_to_number(L)

完整示例:

#  Assume we will never have more than 4 events:
#  With 4 events we have 2**4 outcomes.
#  In total there are 2**5-1 possible outcomes including intermediate steps.
#  We will preallocate a list filled with `None` to length 2**5-1 so we have 
#    enough space for all possible outcomes.
tree_data = [None]*(2**5-1)
#insert value 42 given [event_1_pass, event_2_pass, event_3_fail]
tree_data[idx(['pass', 'pass', 'fail'])] = 42
#insert value 56 given [event_1_pass, event_2_pass, event_3_pass, event_4_pass]
tree_data[idx(['pass', 'pass', 'pass', 'pass'])] = 56
#retrieve value given [event_1_pass, event_2_pass, event_3_fail]
print(tree_data[idx(['pass', 'pass', 'fail'])])
# prints: 42
#retrieve value given [event_1_fail, event_2_pass, event_3_fail]
print(tree_data[idx(['fail', 'pass', 'fail'])])
# prints: None because we never stored anything there.

您可以使用以下代码(非恢复)生成树:

def generate_nodes(parent, data):  # adds data_pass and data_fail child nodes to parent
    passed = Node(data + '_pass')
    failed = Node(data + '_fail')
    parent.children.append(passed)
    parent.children.append(failed)
    return [passed, failed]  # returns new nodes

events = ['one', 'two', 'three']
root = Node('root')  # root of our tree
current_level_events = [root]  # nodes of currently processed level
for data in events:  # go through all events
    tmp = []
    for event in current_level_events:
        tmp += generate_nodes(event, data)  # generate events 
    current_level_events = tmp

递归实现:

events = ['one', 'two', 'three']
def generate_nodes(parent, index):
    if index >= len(events):  # we finished tree generation, no more events found
        return
    data = events[index]
    passed = Node(data + '_pass')
    failed = Node(data + '_fail')
    parent.children.append(passed)
    parent.children.append(failed)
    # generate children for passed and failed nodes
    generate_nodes(passed, index + 1)
    generate_nodes(failed, index + 1)

events = ['one', 'two', 'three']
root = Node('root')  # root of our tree
generate_nodes(root, 0)  # starts tree generation

最新更新