BFS,想要找到节点之间的最长路径,减少了findchildren-方法



我已经打开了另一个线程,正是这个主题,但我认为我发布了太多的代码,我不知道我的问题在哪里,现在我认为我有一个更好的想法,但仍然需要帮助。我们得到的是一个包含三个字母单词的文本文件,只有三个字母单词。我还有一个Word(节点)和队列类。我的findchildren-方法应该为一个单词,找到这个单词的所有子词,比如我输入"fan",然后我应该得到类似["kan","man"....等]的东西。代码现在看起来像这样:

def findchildren(mangd,parent): 
    children=set()
    lparent=list(parent)
    mangd.remove(parent)
    for word in mangd:
        letters=list(word)
        count=0
        i=0
        for a in letters:
            if a==lparent[i]:
                count+=1
                i+=1
            else:
                i+=1
            if count==2:
                if word not in children:
                    children.add(word)
            if i>2:
                break
    return children

上面的代码,对于findchildren目前工作得很好,但是,当我将它用于我的其他方法(实现bfs-search)时,一切都将花费太长时间,因此,我想将所有的子元素收集到一个包含子元素列表的字典中。我现在感觉这个任务超出了我的能力范围,但这有可能吗?我尝试创建这样的东西:

def findchildren2(mangd):
    children=[]
    for word in mangd:
        lparent=list(word)
        mangd.remove(word)
        letters=list(word)
        count=0
        i=0
        for a in letters:
            if a==lparent[i]:
                count+=1
                i+=1
            else:
                i+=1
            if count==2:
                if word not in children:
                    children.append(word)
            if i>2:
                break
    return children

我想我的最后一次尝试是简单的垃圾,我得到错误消息"设置更改大小使用迭代"。

def findchildren3(mangd,parent):
    children=defaultdict(list)
    lparent=list(parent)
    mangd.remove(parent)
    for word in mangd:
        letters=list(word)
        count=0
        i=0
        for a in letters:
            if a==lparent[i]:
                count+=1
                i+=1
            else:
                i+=1
            if count==2:
                children[0].append(word)
            if i>2:
                break
    return children

有更有效的方法来做到这一点(下面是O(n^2),所以不是很好),但这里有一个简单的算法让你开始:

import itertools
from collections import defaultdict
words = ['abc', 'def', 'adf', 'adc', 'acf', 'dec']
bigrams = {k: {''.join(x) for x in itertools.permutations(k, 2)} for k in words}
result = defaultdict(list)
for k, v in bigrams.iteritems():
    for word in words:
        if k == word:
            continue
        if len(bigrams[k] & bigrams[word]):
            result[k].append(word)
print result

生产:

defaultdict(<type 'list'>, {'abc': ['adc', 'acf'], 'acf': ['abc', 'adf', 'adc'], 'adf': ['def', 'adc', 'acf'], 'adc': ['abc', 'adf', 'acf', 'dec'], 'dec': ['def', 'adc'], 'def': ['adf', 'dec']})

下面是一个更有效的版本,并附有一些注释:

import itertools
from collections import defaultdict
words = ['abc', 'def', 'adf', 'adc', 'acf', 'dec']
# Build a map of {word: {bigrams}} i.e. {'abc': {'ab', 'ba', 'bc', 'cb', 'ac', 'ca'}}
bigramMap = {k: {''.join(x) for x in itertools.permutations(k, 2)} for k in words}
# 'Invert' the map so it is {bigram: {words}} i.e. {'ab': {'abc', 'bad'}, 'bc': {...}}
wordMap = defaultdict(set)
for word, bigramSet in bigramMap.iteritems():
    for bigram in bigramSet:
        wordMap[bigram].add(word)
# Create a final map of {word: {words}} i.e. {'abc': {'abc', 'bad'}, 'bad': {'abc', 'bad'}}
result = defaultdict(set)
for k, v in wordMap.iteritems():
    for word in v:
        result[word] |= v ^ {word}
# Display all 'childen' of each word from the original list
for word in words:
    print "The 'children' of word {} are {}".format(word, result[word])

生产:

The 'children' of word abc are set(['acf', 'adc'])
The 'children' of word def are set(['adf', 'dec'])
The 'children' of word adf are set(['adc', 'def', 'acf'])
The 'children' of word adc are set(['adf', 'abc', 'dec', 'acf'])
The 'children' of word acf are set(['adf', 'abc', 'adc'])
The 'children' of word dec are set(['adc', 'def'])

解决方案(可悲的是O(n^2))在Python 3中更新的要求(在这里运行):

from collections import defaultdict

words = ['fan', 'ban', 'fbn', 'ana', 'and', 'ann']
def isChildOf(a, b):
    return sum(map(lambda xy: xy[0] == xy[1], zip(a, b))) >= 2
result = defaultdict(set)
for word in words:
    result[word] = {x for x in words if isChildOf(word, x) and x != word}
# Display all 'childen' of each word from the original list
for word in words:
    print("The children of word {0} are {1}".format(word, result[word]))

生产:

The 'children' of word fan are set(['ban', 'fbn'])
The 'children' of word ban are set(['fan'])
The 'children' of word fbn are set(['fan'])
The 'children' of word ana are set(['and', 'ann'])
The 'children' of word and are set(['ann', 'ana'])
The 'children' of word ann are set(['and', 'ana'])

这里的算法非常简单,但不是很有效,但让我试着分解它。

isChildOf函数接受两个单词作为输入,并执行以下操作:

  1. zipa &b一起,这里两者都被视为迭代对象,每个字符是迭代中的一个"项"。例如,如果a'fan', b'ban',那么zip('fan', 'ban')会产生以下配对列表[('f', 'b'), ('a', 'a'), ('n', 'n')]

  2. 接下来,它使用map函数将lambda函数(匿名函数的奇特名称)应用于第一步生成的列表中的每个项。该函数只接受一对输入元素(即'f' &'b'),如果匹配则返回True,否则返回False。对于我们的示例,这将导致[False, True, True],因为第一对字符不匹配,但其余两对都匹配。

  3. 最后,函数在步骤2生成的列表上运行sum函数。在Python中,True的计算结果是1, False的计算结果是0,所以列表的和是2。然后简单地返回该数字是否大于或等于2

for word in words循环只是将每个输入单词与所有其他单词进行比较,并保留isChildOf求值为True的单词,注意不要添加单词本身。

我希望这是清楚的!

最新更新