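I want to search a .txt file for a list of words and print every line of the text that contains any word from that list.
I first split the raw_input (called userInput) with .split() to get a list of words. I then filter that list against a second list of blacklisted words, which gives me a final, filtered word list. Those are the words I want to search for in the text file.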
exWords = ['Who', 'How', 'What', 'How many', 'How much', 'am', 'is', 'are', '?', '!']
while True:
    userInput = raw_input("> ")
    uqWords = userInput.split()
    fqWords = [word for word in uqWords if not any(bad in word for bad in exWords)]
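After splitting userInput with .split() into uqWords, I filter out every word that matches an entry in exWords and call the result fqWords. Now I want to search Database.txt for any of the words in fqWords and print the matching lines.
To be specific, my full code is: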
import time
import random
Error = ["Sorry, I don't understand.", "I don't get it"]
exWords = ['Who', 'How', 'What', 'How many', 'How much', 'am', 'is', 'are', '?', '!']
R = "Rel > "
while True:
    userInput = raw_input("> ")
    uqWords = userInput.split()
    fqWords = [word for word in uqWords if not any(bad in word for bad in exWords)]
    DB = open("Database.txt")
    for line in DB:
        if fqWords in line:
            print (R + line[:-1])
    CDB = open("CodeDB.txt")
    for code in CDB:
        if fqWords in code:
            print (R + code[:-1])
            break
    if fqWords not in (code and line):
        randomError = random.choice(Error)
        print (R + (randomError))
Try using this function:
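def search_for_lines(filename, words_list):
    words_found = 0  # number of lines that matched
    with open(filename) as db_file:
        for line_no, line in enumerate(db_file):
            if any(word in line for word in words_list):
                # A single formatted string prints the same under Python 2 and 3
                print('{}: {}'.format(line_no, line.rstrip('\n')))
                words_found += 1
    return words_found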
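Just pass it the file name and the list of words to search for. It prints the line number together with the line's content, and returns the number of lines that contain any of the words. As the file is iterated line by line, enumerate yields a tuple of the line number (counting from 0 by default) and the line itself.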
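To make the enumerate behaviour concrete, here is a small Python 3 sketch that uses an in-memory file (io.StringIO) in place of a real one; the sample text is made up. Counting starts at 0 unless a start value is passed, which is why the function above reports the first line as line 0.

```python
import io

# An in-memory stand-in for an open text file; the contents are made up.
sample = io.StringIO("first line\nsecond line\n")

# Iterating over a file yields its lines; enumerate pairs each with a counter.
pairs = list(enumerate(sample))
print(pairs)

# Passing a start value gives 1-based line numbers instead.
sample.seek(0)
numbered = [(n, line.rstrip("\n")) for n, line in enumerate(sample, 1)]
print(numbered)
```

The second form is usually closer to what a text editor shows as the line number.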
To add it to your existing code and search both files, define the function first, then call it after fqWords has been assigned, like this:
import random

def search_for_lines(filename, words_list):
    words_found = 0  # number of lines that matched
    with open(filename) as db_file:
        for line_no, line in enumerate(db_file):
            if any(word in line for word in words_list):
                # A single formatted string prints the same under Python 2 and 3
                print('{}: {}'.format(line_no, line.rstrip('\n')))
                words_found += 1
    return words_found

Error = ["Sorry, I don't understand.", "I don't get it"]
exWords = ['Who', 'How', 'What', 'How many', 'How much', 'am', 'is', 'are', '?', '!']
R = "Rel > "
while True:
    userInput = raw_input("> ")
    uqWords = userInput.split()
    fqWords = [word for word in uqWords if not any(bad in word for bad in exWords)]
    search_for_lines("Database.txt", fqWords)
    words_found = search_for_lines("CodeDB.txt", fqWords)
    if words_found > 0:
        break
    else:
        randomError = random.choice(Error)
        print (R + (randomError))
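One thing to watch in the loop above: the count returned for Database.txt is discarded, so only a match in CodeDB.txt ends the loop. If a match in either file should count, the per-file counts can be summed. A minimal Python 3 sketch of that idea follows; count_matching_lines and total_matches are hypothetical helpers (not part of the code above), and the files are written to a temporary directory with made-up contents purely for the demonstration.

```python
import os
import tempfile


def count_matching_lines(filename, words):
    """Count the lines in filename that contain any of the words (substring match)."""
    with open(filename) as handle:
        return sum(1 for line in handle if any(word in line for word in words))


def total_matches(filenames, words):
    """Sum the matching-line counts over several files."""
    return sum(count_matching_lines(name, words) for name in filenames)


# Demonstration with throwaway files; the contents are made up.
demo_dir = tempfile.mkdtemp()
db_path = os.path.join(demo_dir, "Database.txt")
code_path = os.path.join(demo_dir, "CodeDB.txt")
with open(db_path, "w") as handle:
    handle.write("apples are red\nbananas are yellow\n")
with open(code_path, "w") as handle:
    handle.write("print('hello')\n")

found = total_matches([db_path, code_path], ["apples", "hello"])
print(found)
```

In the interactive loop, the break condition would then test the summed value rather than the count for the last file only.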
If you do not need to modify a list, use a tuple instead. For naming identifiers, see PEP 8.
To get the difference between two sequences, use set; for example, {1, 2, 3} - {2, 3} is {1}.
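As a rough illustration of the set approach, the Python 3 sketch below compares it with the substring filter used in the question. The sample sentence is made up. Two differences are worth noticing: set difference removes only exact matches (so 'this' survives even though it contains 'is'), and a set keeps neither word order nor duplicates.

```python
ex_words = ('Who', 'How', 'What', 'How many', 'How much', 'am', 'is', 'are', '?', '!')
words = "What is this text".split()

# Substring filter from the question: drops any word that merely
# contains an excluded entry, so 'this' goes because of 'is'.
by_substring = [w for w in words if not any(bad in w for bad in ex_words)]

# Set difference: drops only words that are equal to an excluded entry.
by_set = frozenset(words) - frozenset(ex_words)

print(by_substring)
print(sorted(by_set))
```

Multi-word entries such as 'How many' can never match a single token produced by .split(), whichever filter is used.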
If you open the same file inside a loop, it is reopened on every iteration, so it is better to move the open calls out of the loop.
import random

def get_line_with_words(lines, words):
    """Return (line_number, stripped_line) pairs for every line
    that contains at least one of the words.
    """
    return [(i, line.strip()) for i, line in enumerate(lines, 1)
            if any(word in line for word in words)]

errors = ("Sorry, I don't understand.", "I don't get it")
ex_words = ('Who', 'How', 'What', 'How many', 'How much', 'am', 'is', 'are', '?', '!')
prefix = "Rel > "

with open("Database.txt") as db, open("CodeDB.txt") as cdb:
    while True:
        user_input = raw_input("> ")
        uq_words = user_input.split()
        # Set difference drops the excluded words (exact matches only)
        fq_words = frozenset(uq_words) - frozenset(ex_words)
        res1 = get_line_with_words(db, fq_words)
        res2 = get_line_with_words(cdb, fq_words)
        # A match in either file is enough to print results and stop
        if res1 or res2:
            for n, line in res1 + res2:
                print('{} {} {}'.format(prefix, n, line))
            break
        print('{} {}'.format(prefix, random.choice(errors)))
        # Rewind both files so the next iteration reads from the start
        db.seek(0)
        cdb.seek(0)
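An alternative to rewinding with seek(0) on every pass is to read each file into a list once, since a list can be iterated any number of times. The Python 3 sketch below works under that assumption; it trades memory for simplicity, so it suits small files. find_lines is a hypothetical name, and io.StringIO objects with made-up contents stand in for the two open files.

```python
import io


def find_lines(lines, words):
    """Return (line_number, stripped_line) pairs for lines containing any word."""
    return [(n, line.strip()) for n, line in enumerate(lines, 1)
            if any(word in line for word in words)]


# Stand-ins for open("Database.txt") and open("CodeDB.txt"); contents are made up.
db = io.StringIO("cats purr\ndogs bark\n")
cdb = io.StringIO("for cat in cats:\n")

# Read everything once, before the input loop would start.
db_lines = db.readlines()
cdb_lines = cdb.readlines()

# The lists can be searched repeatedly without any seek(0).
first = find_lines(db_lines, {"cats"}) + find_lines(cdb_lines, {"cats"})
second = find_lines(db_lines, {"dogs"}) + find_lines(cdb_lines, {"dogs"})
print(first)
print(second)
```

For files too large to hold in memory, rewinding with seek(0) as in the code above remains the simpler choice.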