Using a trie to build a directory structure in Python



I have a list of filenames:

filenames = ["111", "112", "1341", "2213", "2131", "22222", "11111"]

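These should be organized into a directory structure in which no directory holds more than a maximum number of files, say 2. To do this, I build a prefix tree (trie, code below) stored in a dictionary: prefixes are the keys, and a prefix maps to 'end' once the number of files in its subtree no longer exceeds the maximum.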

trie = make_trie(filenames, max_freq=2)

trie
{'1': {'1': {'1': 'end', '2': 'end'}, '3': 'end'},'2': {'1': 'end', '2': 'end'}}

Then, for each filename, I do a lookup in the trie (code below) and build the path accordingly:

for f in filenames:
    print("Filename: ", f, "\tPath:", get_path(f, trie))
Filename:  111  Path: 1/1/1/
Filename:  112  Path: 1/1/2/
Filename:  1341         Path: 1/3/
Filename:  2213         Path: 2/2/
Filename:  2131         Path: 2/1/
Filename:  22222        Path: 2/2/
Filename:  11111        Path: 1/1/1/

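This works well, but with my naive implementations of the trie (make_trie) and the lookup (get_path) it becomes prohibitively slow. My guess is that I should use an efficient existing trie implementation, such as pytrie or datrie, but I don't really know how to build a trie with a threshold of 2 on the number of suffixes, so I'm a bit stuck on how to use these packages. For example: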

import string
import datrie
tr = datrie.Trie(string.digits)  # make trie with digits
for f in filenames:
    tr[f] = "some value"  # insert into trie, but what should be the values??
tr.prefixes('111211321')  # I can look up prefixes now, but then what?

How can I use an existing fast trie implementation to build my directory structure?

My naive implementation:

def make_trie(words, max_freq):
    root = dict()
    for word in words:
        current_dict = root
        for i in range(len(word)):
            letter = word[i]
            current_prefix = word[:i+1]
            # Count how many words share this prefix (rescans the whole list).
            prefix_freq = sum(list(map(lambda x: x[:i+1]==current_prefix, words)))
            if prefix_freq > max_freq:
                # Too many files under this prefix: descend one level deeper.
                current_dict = current_dict.setdefault(letter, {})
            else:
                # Few enough files: this prefix becomes a leaf directory.
                current_dict = current_dict.setdefault(letter, "end")
                break
    return root

def get_path(image_id, trie):
    result = ""
    current_dict = trie
    for i in range(len(image_id)):
        letter = image_id[i]
        if letter in current_dict:
            result += letter + "/"
            # Stop at the first leaf directory on the way down.
            if current_dict[letter] == "end":
                break
            current_dict = current_dict[letter]
    return result
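
One likely source of the slowdown is that make_trie rescans the whole list for every prefix of every word. A minimal sketch of an alternative, assuming the same nested-dict layout is wanted: count every prefix once with collections.Counter, then build the trie from those counts. The name make_trie_counted is hypothetical and not part of the original code.

```python
from collections import Counter


def make_trie_counted(words, max_freq):
    """Build the same nested-dict trie as make_trie, counting prefixes once."""
    # Count how many words share each prefix, in a single pass over all prefixes.
    prefix_freq = Counter(
        word[:i + 1] for word in words for i in range(len(word))
    )
    root = {}
    for word in words:
        node = root
        for i, letter in enumerate(word):
            if prefix_freq[word[:i + 1]] > max_freq:
                # Too many files under this prefix: descend one level deeper.
                node = node.setdefault(letter, {})
            else:
                # Few enough files: mark this prefix as a leaf directory.
                node.setdefault(letter, "end")
                break
    return root


filenames = ["111", "112", "1341", "2213", "2131", "22222", "11111"]
trie = make_trie_counted(filenames, max_freq=2)
print(trie)
```

Because the result has the same shape as the dictionary shown earlier, get_path should work on it unchanged. This does not use pytrie or datrie; it only removes the repeated rescans.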

This can work, using os.makedirs:

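import os
def create_dir_structure(filenames):
    for filename in filenames:
        # One directory level per character, e.g. '1111' -> '1/1/1/1'.
        os.makedirs(
            '/'.join(e for e in str(filename)),
            exist_ok=True,  # do not fail if the directory already exists
        )

create_dir_structure(
    ['1111', '1123']
)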

If you would like to see any different behavior, let me know in the comments.
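
If the directories should follow the threshold-limited paths from the question rather than one level per character, os.makedirs can be applied to those paths instead. The sketch below is an illustration under assumptions: the paths dictionary is copied from the output printed in the question, create_dirs is a hypothetical helper, and a temporary directory is used so nothing is written into the working directory.

```python
import os
import tempfile

# Directory for each filename, as printed by get_path in the question.
paths = {
    "111": "1/1/1/",
    "112": "1/1/2/",
    "1341": "1/3/",
    "2213": "2/2/",
    "2131": "2/1/",
    "22222": "2/2/",
    "11111": "1/1/1/",
}


def create_dirs(paths, base_dir):
    """Create each distinct directory under base_dir and return them sorted."""
    created = set()
    for rel in paths.values():
        # Split on '/' so the path is joined with the platform's separator.
        target = os.path.join(base_dir, *rel.strip("/").split("/"))
        os.makedirs(target, exist_ok=True)  # shared paths are created once
        created.add(target)
    return sorted(created)


with tempfile.TemporaryDirectory() as base:
    dirs = create_dirs(paths, base)
    n_dirs = len(dirs)
    all_exist = all(os.path.isdir(d) for d in dirs)

print(n_dirs, all_exist)
```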
