Python-使用等价关系对字符串列表进行分区



我有一个字母字符串[str1,str2,...]的列表,我需要使用等价关系R将其划分为等价类,其中str1 R str2(在关系表示法中)如果str2可以通过一系列有效的单字母变化从str1获得,其中"valid"表示它产生一个有效的字母词,例如cat --> car有效,但cat --> 'cax无效。如果输入列表是['cat','ace','car','zip','ape','pip'],那么代码应该返回[['cat','car'],['ace','ape'],['zip','pip']]

我有一个初始的工作版本,但是,它产生了一些包含重复的"类"。

我不认为有任何Python包允许我定义这样的等价关系,但即使这样,最好的方法是什么?

应该适用于不同长度的字符串。显然,订购很重要。

def is_one_letter_different(s1, s2):
    if len(s1) != len(s2):
        return False
    diff_count = 0;
    for char1, char2 in zip(s1, s2):
        if char1 != char2:
            diff_count += 1
    return diff_count == 1
def group(candidates):
    groups = []
    for candidate in candidates:
        for group in groups:
            for word in group:
                if is_one_letter_different(word, candidate):
                    group.append(candidate)
                    break
            if candidate in group:
                break
        else:
            groups.append([candidate])
    return groups
print group(['bread','breed', 'bream', 'tread', 'treat', 'short', 'shorn', 'shirt', 'shore', 'store','eagle','mired', 'sired', 'hired'])

输出:

[['bread', 'breed', 'bream', 'tread', 'treat'], ['short', 'shorn', 'shirt', 'shore', 'store'], ['eagle'], ['mired', 'sired', 'hired']]

编辑:更新以遵循其他测试用例。我不确定输出是否正确-请验证。并在下次为我们提供良好的测试用例。

我会这样做:构建一个无向图,其中每个单词都是一个节点,每条边表示它们之间的关系成立。然后,您可以识别图中每个断开连接的"岛",每个岛代表一个等价类。

from collections import defaultdict
def exactly_one(iter):
    count = 0
    for x in iter:
        if x:
            count += 1
            if count > 1: 
                break
    return count == 1
def are_one_letter_apart(a,b):
    if len(a) != len(b): return False
    return exactly_one(a_char != b_char for a_char, b_char in zip(a,b))
def pairs(seq):
    for i in range(len(seq)):
        for j in range(i+1, len(seq)):
            yield (seq[i], seq[j])
def search(graph, node):
    seen = set()
    to_visit = set()
    to_visit.add(node)
    while to_visit:
        cur = to_visit.pop()
        if cur in seen: continue
        for neighbor in graph[cur]:
            if neighbor not in seen:
                to_visit.add(neighbor)
        seen.add(cur)
    return seen
def get_islands(graph):
    seen = set()
    islands = []
    for item in graph.iterkeys():
        if item in seen: continue
        group = search(graph, item)
        seen = seen | group
        islands.append(group)
    return islands
def create_classes(seq, f):
    graph = defaultdict(list)
    for a,b in pairs(seq):
        if f(a,b):
            graph[a].append(b)
            graph[b].append(a)
    #one last pass to pick up items with no relations to anything else
    for item in seq:
        if item not in graph:
            graph[item].append(item)
    return [list(group) for group in get_islands(graph)]
seq = ['cat','ace','car','zip','ape','pip']
print create_classes(seq, are_one_letter_apart)

结果:

[['ace', 'ape'], ['pip', 'zip'], ['car', 'cat']]

最新更新