我有一个字母字符串[str1,str2,...]
的列表,我需要使用等价关系R将其划分为等价类,其中str1 R str2
(在关系表示法中)如果str2
可以通过一系列有效的单字母变化从str1
获得,其中"valid"表示它产生一个有效的字母词,例如cat --> car
有效,但cat --> 'cax
无效。如果输入列表是['cat','ace','car','zip','ape','pip']
,那么代码应该返回[['cat','car'],['ace','ape'],['zip','pip']]
。
我有一个初始的工作版本,但是,它产生了一些包含重复的"类"。
我不认为有任何Python包允许我定义这样的等价关系,但即使这样,最好的方法是什么?
应该适用于不同长度的字符串。显然,订购很重要。
def is_one_letter_different(s1, s2):
if len(s1) != len(s2):
return False
diff_count = 0;
for char1, char2 in zip(s1, s2):
if char1 != char2:
diff_count += 1
return diff_count == 1
def group(candidates):
groups = []
for candidate in candidates:
for group in groups:
for word in group:
if is_one_letter_different(word, candidate):
group.append(candidate)
break
if candidate in group:
break
else:
groups.append([candidate])
return groups
print group(['bread','breed', 'bream', 'tread', 'treat', 'short', 'shorn', 'shirt', 'shore', 'store','eagle','mired', 'sired', 'hired'])
输出:
[['bread', 'breed', 'bream', 'tread', 'treat'], ['short', 'shorn', 'shirt', 'shore', 'store'], ['eagle'], ['mired', 'sired', 'hired']]
编辑:更新以遵循其他测试用例。我不确定输出是否正确-请验证。并在下次为我们提供良好的测试用例。
我会这样做:构建一个无向图,其中每个单词都是一个节点,每条边表示它们之间的关系成立。然后,您可以识别图中每个断开连接的"岛",每个岛代表一个等价类。
from collections import defaultdict
def exactly_one(iter):
count = 0
for x in iter:
if x:
count += 1
if count > 1:
break
return count == 1
def are_one_letter_apart(a,b):
if len(a) != len(b): return False
return exactly_one(a_char != b_char for a_char, b_char in zip(a,b))
def pairs(seq):
for i in range(len(seq)):
for j in range(i+1, len(seq)):
yield (seq[i], seq[j])
def search(graph, node):
seen = set()
to_visit = set()
to_visit.add(node)
while to_visit:
cur = to_visit.pop()
if cur in seen: continue
for neighbor in graph[cur]:
if neighbor not in seen:
to_visit.add(neighbor)
seen.add(cur)
return seen
def get_islands(graph):
seen = set()
islands = []
for item in graph.iterkeys():
if item in seen: continue
group = search(graph, item)
seen = seen | group
islands.append(group)
return islands
def create_classes(seq, f):
graph = defaultdict(list)
for a,b in pairs(seq):
if f(a,b):
graph[a].append(b)
graph[b].append(a)
#one last pass to pick up items with no relations to anything else
for item in seq:
if item not in graph:
graph[item].append(item)
return [list(group) for group in get_islands(graph)]
seq = ['cat','ace','car','zip','ape','pip']
print create_classes(seq, are_one_letter_apart)
结果:
[['ace', 'ape'], ['pip', 'zip'], ['car', 'cat']]