Optimizing a Wordle bot in Python: searching for words that contain a, b, and c?



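I've been writing a Wordle bot and wanted to see how it fares against all 13,000 words. The problem is that I'm running it through a for loop, and it's very inefficient. After 30 minutes it has only reached about 5%. I could wait it out, but that would end up taking 10+ hours. There has to be a more efficient way. I'm new to Python, so any suggestions would be greatly appreciated.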

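The code here is what narrows down the word list after each guess. Is there a way to search for words that contain "a", "b", and "c" at once, instead of running three separate times? Right now, contains, nocontains, and isletter each run every time I need to search for a new letter. Searching for them all together would greatly reduce the time.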

#Find the words that only match the criteria
def contains(letter, place):
    list.clear()
    for x in words:
        if x not in removed:
            if letter in x:
                if letter == x[place]:
                    removed.append(x)
                else:
                    list.append(x)
            else:
                removed.append(x)

def nocontains(letter):
    list.clear()
    for x in words:
        if x not in removed:
            if letter not in x:
                list.append(x)
            else:
                removed.append(x)

def isletter(letter, place):
    list.clear()
    for x in words:
        if x not in removed:
            if letter == x[place]:
                list.append(x)
            else:
                removed.append(x)

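The performance problems can be greatly reduced by using sets. Any time you want to test membership repeatedly (even just a few times), e.g. if x not in removed, you want to try to make a set. A list has to check every element to find x, which is bad when the list has thousands of elements. With a Python set, if x not in removed takes about the same small, roughly constant amount of time whether removed has 100 elements or 100,000.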
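As a rough illustration of the difference, here is a minimal sketch. The data, the sizes, and the time_lookups helper are all made up for this example, and the actual timings will depend on your machine:

```python
import timeit

# Hypothetical data: 100,000 fake "words", stored both as a list and as a set.
removed_list = [f"w{i:05d}" for i in range(100_000)]
removed_set = set(removed_list)

# Worst case for the list: the item we look for is at the very end.
target = removed_list[-1]

def time_lookups(container, repeats=50):
    """Return the seconds taken to evaluate `target in container` `repeats` times."""
    return timeit.timeit(lambda: target in container, number=repeats)

list_seconds = time_lookups(removed_list)
set_seconds = time_lookups(removed_set)

print(f"list: {list_seconds:.5f}s  set: {set_seconds:.5f}s")
```

The list lookup has to walk through all 100,000 entries each time, while the set lookup hashes the string and jumps more or less straight to it.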

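Beyond this, you'll run into problems from using mutable global variables everywhere, such as list (which needs to be renamed, since it shadows the built-in list type) and removed. There's no benefit to doing this, and there are several downsides, such as code that is harder to reason about or optimize. One benefit of Python is that you can pass large containers or objects to functions without any extra time or space cost: calling f(huge_list) is just as fast and uses just as much memory as calling f(tiny_list), as if you were passing by reference in other languages. So don't hesitate to use containers as function arguments or return types.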
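A small sketch of what "no extra cost" means here. The count_words function and the two lists are hypothetical names for this example only; the point is that the function receives the caller's object itself, not a copy:

```python
def count_words(words):
    """Return the number of words and the identity of the object received."""
    # `words` is the very same object the caller passed in; nothing is copied.
    return len(words), id(words)

tiny_list = ["crane"]
huge_list = ["crane"] * 1_000_000

tiny_count, tiny_id = count_words(tiny_list)
huge_count, huge_id = count_words(huge_list)

print(tiny_id == id(tiny_list))  # True
print(huge_id == id(huge_list))  # True
```

Because only a reference is handed over, the call itself costs the same for both lists; only the work done inside the function depends on the size.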

Altogether, if you drop "list" and "removed" and instead store a set of possible words, your code could be refactored like this:

all_words = []  # Huge word list to read in from text file
current_possible_words = set(all_words)

def contains_only_elsewhere(possible_words, letter, place):
    """Given letter and place, keep only the words in possible_words
    that contain letter, but not at place"""
    to_remove = {word for word in possible_words
                 if letter not in word or word[place] == letter}
    return possible_words - to_remove

def must_not_contain(possible_words, letter):
    """Given a letter, remove from possible_words all words containing letter"""
    to_remove = {word for word in possible_words
                 if letter in word}
    return possible_words - to_remove

def exact_letter_match(possible_words, letter, place):
    """Given a letter and place, remove from possible_words
    all words not containing letter at place"""
    to_remove = {word for word in possible_words
                 if word[place] != letter}
    return possible_words - to_remove

The calling code will differ; for example:

current_possible_words = exact_letter_match(current_possible_words, 'a', 2)

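Further optimizations are possible (and much easier now): storing only the indices of words rather than the strings; precomputing, for each letter, the set of all words containing that letter; and so on.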
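For instance, the per-letter precomputation might look something like the sketch below. The names build_letter_index and words_with_all_letters, and the tiny word list, are assumptions made up for illustration. It also speaks to the original question of finding words containing "a", "b", and "c" in one step:

```python
from collections import defaultdict

def build_letter_index(all_words):
    """Map each letter to the set of words that contain it (built once)."""
    index = defaultdict(set)
    for word in all_words:
        for letter in set(word):  # set() so repeated letters are handled once
            index[letter].add(word)
    return index

def words_with_all_letters(index, letters):
    """Return the words that contain every letter in `letters`."""
    sets = [index.get(letter, set()) for letter in letters]
    if not sets:
        return set()  # no letters given, so nothing to intersect
    return set.intersection(*sets)

# Hypothetical miniature word list, standing in for the real one.
all_words = ["cabin", "black", "crane", "youth", "abbey"]
letter_index = build_letter_index(all_words)

print(sorted(words_with_all_letters(letter_index, "abc")))  # ['black', 'cabin']
```

The index is built in a single pass over the word list; after that, each query is a set intersection rather than another scan of every word.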

I just wrote a Wordle bot that runs in about a second, including the web scraping to get the list of 5-letter words.

import urllib.request
from bs4 import BeautifulSoup

def getwords():
    # Scrape the list of 5-letter words from the page
    source = "https://www.thefreedictionary.com/5-letter-words.htm"
    filehandle = urllib.request.urlopen(source)
    soup = BeautifulSoup(filehandle.read(), "html.parser")
    wordslis = soup.findAll("li", {"data-f": "15"})
    words = []
    for k in wordslis:
        words.append(k.getText())
    return words

words = getwords()

def hasLetterAtPosition(letter,position,word):
    return letter==word[position]

def hasLetterNotAtPosition(letter,position,word):
    # The letter appears somewhere other than the given position
    return letter in word[:position]+word[position+1:]

def doesNotHaveLetter(letter,word):
    return not letter in word

# Constraints gathered so far, as (position, letter) pairs or plain letters
lettersPositioned = [(0,"y")]
lettersMispositioned = [(0,"h")]
lettersNotHad = ["p"]

# Walk the list once, deleting each word as soon as it fails any check
idx = 0
while idx<len(words):
    eliminated = False
    for criteria in lettersPositioned:
        if not hasLetterAtPosition(criteria[1],criteria[0],words[idx]):
            del words[idx]
            eliminated = True
            break
    if eliminated:
        continue
    for criteria in lettersMispositioned:
        if not hasLetterNotAtPosition(criteria[1],criteria[0],words[idx]):
            del words[idx]
            eliminated = True
            break
    if eliminated:
        continue
    for letter in lettersNotHad:
        if not doesNotHaveLetter(letter,words[idx]):
            del words[idx]
            eliminated = True
            break
    if eliminated:
        continue
    idx+=1
print(words) # ["youth"]

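The reason yours is slow is that you make a lot of calls to check whether a word has already been removed, on top of scanning every word for each check and some redundant logical conditions.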
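To show the idea of checking every constraint in one pass with no "already removed" bookkeeping, here is a compact sketch. The matches function and the candidates list are hypothetical; the constraints use (position, letter) pairs, and "mispositioned" is read as "the letter appears somewhere other than that position":

```python
def matches(word, positioned, mispositioned, not_had):
    """Return True if `word` satisfies every constraint."""
    return (
        # Letters known to be at an exact position
        all(word[pos] == letter for pos, letter in positioned)
        # Letters that must appear somewhere other than the given position
        and all(letter in word[:pos] + word[pos + 1:]
                for pos, letter in mispositioned)
        # Letters known to be absent
        and all(letter not in word for letter in not_had)
    )

# Hypothetical candidates standing in for the full word list.
candidates = ["youth", "happy", "yacht", "hotel", "young"]

remaining = [word for word in candidates
             if matches(word, [(0, "y")], [(0, "h")], ["p"])]

print(remaining)  # ['youth', 'yacht']
```

Each word is looked at once, and a word that fails any check is simply left out of the new list, so there is never a need to ask whether it was removed earlier.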

Edit: here is a version of the word-fetching function that retrieves more words.

def getwords():
    source = "https://wordfind-com.translate.goog/length/5-letter-words/?_x_tr_sl=es&_x_tr_tl=en&_x_tr_hl=en&_x_tr_pto=wapp"
    filehandle = urllib.request.urlopen(source)
    soup = BeautifulSoup(filehandle.read(), "html.parser")
    wordslis = soup.findAll("a", {"rel": "nofollow"})
    words = []
    for k in wordslis:
        words.append(k.getText())
    return words
