Filter a list of strings by characters at the same positions



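I'm trying to write a simple function that takes three inputs: a list of words, a list of guessed letters, and a pattern. The pattern is a word with some of its letters hidden by underscores (for example, the word apple and the pattern "_pp_e"). For context, this is part of the game Hangman, where you try to guess a word and this function provides a hint. I want the function to return a filtered list of words from the input: the words must not contain any letter from the guessed-letter list, and they must have the same letters in the same positions as the given pattern. I tried to do this with three loops.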

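  1. The first loop keeps only the words that have the same length as the pattern.
  2. The second loop checks each word against the pattern. If a word contains one of the pattern's letters, but not in the same position, I filter it out.
  3. The last loop checks that the remaining words contain none of the letters in the given guess list.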

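I tried to make this work without much success, and I'd be glad to get some help. Any tips for shortening the code (without using third-party libraries) would also be much appreciated. Thanks in advance!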

Example: pattern: "d…", guess list: ['b', 'c'], and a word list containing all English words. Output list: ['delegating', 'derogation', 'dishwasher']

Here is the code, for more context:

def filter_words_list(words, pattern, wrong_guess_lst):
    lst_return = []
    lst_return_2 = []
    lst_return_3 = []
    new_word = ''
    for i in range(len(words)):
        if len(words[i]) == len(pattern):
            lst_return.append(words[i])
    pattern = list(pattern)
    for i in range(len(lst_return)):
        count = 0
        word_to_check = list(lst_return[i])
        for j in range(len(pattern)):
            if pattern[j] == word_to_check[j] or (pattern[j] == '_' and
                                                  (not (word_to_check[j] in
                                                        pattern))):
                count += 1
        if count == len(pattern):
            lst_return_2.append(new_word.join(word_to_check))
    for i in range(len(lst_return_2)):
        word_to_check = lst_return_2[i]
        for j in range(len(wrong_guess_lst)):
            if word_to_check.find(wrong_guess_lst[j]) == -1:
                lst_return_3.append(word_to_check)
    return lst_return_3

If regular expressions are in your "toolbox", they handle this well. (The re module is in the standard library.)

In a regular expression, . matches any single character. So we replace every _ with . and add "^" and "$" to anchor the regular expression to the whole string.

import re


def filter_words(words, pattern, wrong_guesses):
    re_pattern = re.compile("^" + re.escape(pattern).replace("_", ".") + "$")

    # get words that
    #   (a) are the correct length
    #   (b) aren't in the wrong guesses
    #   (c) match the pattern
    return [
        word
        for word in words
        if (
            len(word) == len(pattern) and
            word not in wrong_guesses and
            re_pattern.match(word)
        )
    ]


all_words = [
    "cat",
    "dog",
    "mouse",
    "horse",
    "cow",
]
print(filter_words(all_words, "c_t", []))
print(filter_words(all_words, "c__", []))
print(filter_words(all_words, "c__", ["cat"]))

This prints:

['cat']
['cat', 'cow']
['cow']
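
Note that the function above treats wrong_guesses as whole words to exclude, whereas the question asks to exclude words containing wrongly guessed letters. A minimal sketch of that variant follows, assuming every guess is a single character. The name filter_words_by_letters is made up for this example, and re.fullmatch is used in place of the explicit "^" and "$" anchors, which also makes a separate length check unnecessary.

```python
import re


def filter_words_by_letters(words, pattern, wrong_letters):
    # "_" becomes ".", which matches exactly one character;
    # every other character of the pattern is matched literally.
    regex = re.compile(re.escape(pattern).replace("_", "."))
    wrong = set(wrong_letters)  # assumption: single-character guesses
    return [
        word
        for word in words
        # fullmatch requires the whole word to match, so the length is
        # checked implicitly; isdisjoint rejects words that contain any
        # wrongly guessed letter.
        if regex.fullmatch(word) and wrong.isdisjoint(word)
    ]


all_words = ["cat", "dog", "mouse", "horse", "cow"]
print(filter_words_by_letters(all_words, "c__", ["t"]))    # ['cow']
print(filter_words_by_letters(all_words, "_o___", ["m"]))  # ['horse']
```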

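If you'd rather not use regular expressions, you can convert the pattern into a dict that maps each defined position to the character that should be found there: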

def filter_words_without_regex(words, pattern, wrong_guesses):
    # map each defined position in the pattern to the letter expected there
    letter_map = {i: letter for i, letter in enumerate(pattern) if letter != "_"}
    # get words that
    #   (a) are the correct length
    #   (b) aren't in the wrong guesses
    #   (c) have the correct letters in the correct positions
    return [
        word
        for word in words
        if (
            len(word) == len(pattern) and
            word not in wrong_guesses and
            all(word[i] == ch for i, ch in letter_map.items())
        )
    ]

The results are the same.

This may not be the most efficient approach, but it should work:

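def filter_words_list(words, pattern, wrong_guess_lst):
    # drop every word that contains a wrongly guessed letter
    fewer_words = [w for w in words if not any([wgl in w for wgl in wrong_guess_lst])]
    # keep only words with the same length as the pattern
    equal_len_words = [w for w in fewer_words if len(w) == len(pattern)]
    # positions of the revealed (non-'_') letters in the pattern
    pattern_indices = [idl for idl, ltr in enumerate(pattern) if ltr != '_']
    # for each word, the positions holding any of the revealed letters
    word_indices = [[idl for idl, ltr in enumerate(w) if ((ltr in pattern) and (ltr != '_'))] for w in equal_len_words]
    # all() is required here: a bare generator expression is always truthy
    out = [w for wid, w in zip(word_indices, equal_len_words) if ((wid == pattern_indices) and all(w[pid] == pattern[pid] for pid in pattern_indices))]
    return out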

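The idea is to first remove every word that contains a letter from wrong_guess_lst. Then remove every word whose length differs from the pattern's (this condition could also be merged into the first one). Next, build a mask for the pattern and for each remaining word, marking the positions of the non-'_' letters. For a word to be a candidate, the masks must be identical and the letters at those positions must match as well.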
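
The mask comparison can be looked at in isolation. The following is only a sketch of the idea, not the code above: letter_mask and is_candidate are hypothetical helper names, and the words are assumed to have already been filtered to the pattern's length.

```python
def letter_mask(text, letters):
    # positions in `text` that hold one of the revealed letters
    return [i for i, ch in enumerate(text) if ch in letters]


def is_candidate(word, pattern):
    # assumption: len(word) == len(pattern) was checked beforehand
    revealed = set(pattern) - {"_"}
    pattern_mask = letter_mask(pattern, revealed)
    # the revealed letters must occupy exactly the same positions ...
    if letter_mask(word, revealed) != pattern_mask:
        return False
    # ... and each of those positions must hold the same letter
    return all(word[i] == pattern[i] for i in pattern_mask)


print(is_candidate("apple", "_pp_e"))  # True
print(is_candidate("ample", "_pp_e"))  # False: masks differ ([2, 4] vs [1, 2, 4])
print(is_candidate("tac", "c_t"))      # False: same mask [0, 2], different letters
```

The last call shows why comparing masks alone is not enough: "tac" has revealed letters at the same positions as "c_t", but they are swapped.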

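Note that I replaced many of the for loops in your code with list comprehensions. List comprehensions are a very useful construct and help a lot, especially when you don't want to use other libraries.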

Edit: I can't really tell you where your code went wrong, because it's a bit too long for me.
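
One thing that does stand out in the question's code is the last loop: it appends a word once for every wrong letter the word does not contain. A word with none of the wrong letters is therefore added several times, a word containing only some of them still gets through, and an empty guess list yields an empty result. The sketch below reproduces that loop next to one possible fix; both function names are made up for the comparison.

```python
def third_loop_as_written(candidates, wrong_guess_lst):
    # same logic as the last loop in the question
    kept = []
    for word in candidates:
        for letter in wrong_guess_lst:
            if word.find(letter) == -1:
                kept.append(word)
    return kept


def third_loop_fixed(candidates, wrong_guess_lst):
    # keep a word once, and only if none of the wrong letters occurs in it
    return [
        word
        for word in candidates
        if not any(letter in word for letter in wrong_guess_lst)
    ]


candidates = ["dog", "cab", "bat"]
wrong = ["b", "c"]
print(third_loop_as_written(candidates, wrong))  # ['dog', 'dog', 'bat']
print(third_loop_fixed(candidates, wrong))       # ['dog']
```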

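Here the regex rule is built explicitly; in particular, there is no need to check the length of the word. To do this, the groupby function from the standard library's itertools package is used: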

'_ b _ _ _' --regex--> r'^.{1}b.{3}$'

Here is how to filter a dictionary by a guess string:

import itertools as it
import re

# sample dictionary
dictionary = "a ability able about above accept according account across act action activity actually add address"
dictionary = dictionary.split()

guess = '_ b _ _ _'
guess = guess.replace(' ', '')  # remove white spaces

# construction of the regex rule
regex = r'^'
for _, i in it.groupby(guess, key=lambda x: x == '_'):
    if '_' in (l := list(i)):
        regex += ''.join(f'.{{{len(l)}}}')  # escape the curly brackets
    else:
        regex += ''.join(l)
regex += '$'

# processing the regex rule
pattern = re.compile(regex)

# filter the dictionary by the rule
l = [word for word in dictionary if pattern.match(word)]
print(l)

Output

['about', 'above']
