通过 Trie 搜索时出现问题



我编写了一个实现Trie数据结构的代码,其中它接受字符串列表和字符串计数。

lst = [['james',9],['chloe',20],['chlara',30]]

字符串是名称,后跟它的整数值是计数。构造 trie 后,它会提示用户输入前缀,

userinput = ch

这样,代码将返回字符串 chlara,因为与同样具有前缀 ch 的 chloe 相比,它的计数更高。我已经设法构建了 Trie,但我在搜索功能方面遇到了困难。

class Node:
def __init__(self):
self.children = [None] * 26
self.end = False
self.frequency = 0
self.goindex = 0
self.index = 0
class Trie:
def __init__(self):
self.root = Node()
def ord_char(self,key):
ord_rep = ord(key) - ord('a')
return ord_rep
def add_word(self,lst):
word = lst[0]    #word
freq = lst[1]    #frequency of string
word_len = len(word)    
current = self.root    #start from root node
for i in range(word_len):
position = self.ord_char(word[i])
if current.children[position] is None:
current.children[position] = Node()
current = current.children[position]
if current.frequency > freq:
continue
else:
current.frequency = freq
current.index = position
current.end = True  #end of string

def main():
trie = Trie()
for i in list2:
trie.add_word(i)
user = input("Enter a prefix: ")
print(trie.prefix_search(user))
if __name__ == "__main__":
main()

我被返回不完整的字符串"chla",我很确定这是由于我的搜索功能效率低下且无法正常工作。

更新

我现在面临的问题是,如果我的前缀是"异常",我被返回为"畸变"而不是"异常">

你从来没有正确地遍历你的尝试。您有两个嵌套的for循环,因此仅从前缀遍历另外两个节点(字符(。

我假设你想返回一个结果,即使有多个字符串具有匹配的后缀和匹配计数。

使用while循环,并继续遵循最高count值,直到到达没有更多子节点的节点,其值等于当前节点的计数。请验证该节点的end为 True,因为这将指示您的单词添加代码中存在错误:

def prefix_search(self, prefix):
# traverse the prefix
current = self.root
for c in prefix:
current = current.children[self.ord_char(c)]
if current is None:
return None  # None is a much better return value than -1
while current is not None:
for child in current.children:
if child is None:
continue
if child.count == current.count:
# found first child with same score, continue from here
prefix += chr(child.index + 97)
current = child
break
else:
# no children with equal score, this is the longest match
assert current.end, "Data structure inconsistent"
return prefix

演示:

>>> trie.prefix_search('ch')
'chlara'
>>> trie.prefix_search('j')
'james'

以及其中一些边缘情况:

>>> trie.add_word(('chlarissa', 9))  # longer word, lower count!
>>> trie.prefix_search('ch')
'chlara'
>>> trie.add_word(('janet', 6))  # same length and score as james, won't be found
>>> trie.prefix_search('j')
'james'

如果数据结构中存在错误;这会故意设置错误的计数:

>>> trie.root.children[9].children[0].count = 7  # 'ja', 7, so higher, but not an end node
>>> trie.prefix_search('j')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<string>", line 59, in prefix_search
AssertionError: Data structure inconsistent

最新更新