Search a text file for every word in a list and print the matching lines



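I want to search a .txt file for a list of words and print every line of the text that contains any word from the list.

I first split the raw_input (called userInput) with .split() to get a list of words. I then filter that list against a second list of blacklisted words to get a final, filtered word list. In this example, I want to search the text file for any word in that filtered list.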

exWords = ['Who', 'How', 'What', 'How many', 'How much', 'am', 'is', 'are', '?', '!']
while True:
    userInput = raw_input("> ")
    uqWords = userInput.split()
    fqWords = [word for word in uqWords if not any(bad in word for bad in exWords)]

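After splitting userInput into uqWords, I filter out every word that matches an entry in the exWords list and call the result fqWords. Now I want to search Database.txt for any word in the fqWords list and print the matching lines.

To be specific, my full code is: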

import time
import random
Error = ["Sorry, I don't understand.", "I don't get it"]
exWords = ['Who', 'How', 'What', 'How many', 'How much', 'am', 'is', 'are', '?', '!']
R = "Rel > "
while True:
    userInput = raw_input("> ")
    uqWords = userInput.split()
    fqWords = [word for word in uqWords if not any(bad in word for bad in exWords)]
    DB = open("Database.txt")
    for line in DB:
        if fqWords in line:
            print (R + line[:-1])
    CDB = open("CodeDB.txt")
    for code in CDB:
        if fqWords in code:
            print (R + code[:-1])
            break
        if fqWords not in (code and line):
            randomError = random.choice(Error)
            print (R + (randomError))

Try this function:

def search_for_lines(filename, words_list):
    words_found = 0
    with open(filename) as db_file:
        for line_no, line in enumerate(db_file):
            if any(word in line for word in words_list):
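                # format() prints the same text on Python 2 and 3; rstrip avoids a doubled newline
                print('{} : {}'.format(line_no, line.rstrip('\n')))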
                words_found += 1
    return words_found

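Just pass it the file name and the list of words to search for. It prints the line number and contents of each matching line, and returns the number of lines in which any of the words was found. As the loop walks through the file line by line, enumerate gives you a tuple of the line number (counting from 0 by default) and the line itself.

To add it to your existing code and search both files, define the function first, then call it after fqWords has been assigned, like this: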
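To make the enumerate behaviour concrete, here is a minimal sketch that runs without any files. The sample_lines list, the words list, and the matching_lines helper are made up for illustration and merely stand in for the contents of Database.txt and the filtered word list.

```python
# Hypothetical sample data standing in for the lines of Database.txt.
sample_lines = ["apples are red\n", "the sky is blue\n", "bananas are yellow\n"]
words = ["sky", "bananas"]


def matching_lines(lines, words_list, start=0):
    """Return (line_no, line) pairs for lines containing any of the words."""
    return [(line_no, line.rstrip("\n"))
            for line_no, line in enumerate(lines, start)
            if any(word in line for word in words_list)]


# enumerate counts from 0 unless a different start value is given.
matches = matching_lines(sample_lines, words)
for line_no, line in matches:
    print("{}: {}".format(line_no, line))
```

Passing start=1 would number the lines the way a text editor does.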

import random
def search_for_lines(filename, words_list):
    words_found = 0
    with open(filename) as db_file:
        for line_no, line in enumerate(db_file):
            if any(word in line for word in words_list):
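                # format() prints the same text on Python 2 and 3; rstrip avoids a doubled newline
                print('{} : {}'.format(line_no, line.rstrip('\n')))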
                words_found += 1
    return words_found
Error = ["Sorry, I don't understand.", "I don't get it"]
exWords = ['Who', 'How', 'What', 'How many', 'How much', 'am', 'is', 'are', '?', '!']
R = "Rel > "
while True:
    userInput = raw_input("> ")
    uqWords = userInput.split()
    fqWords = [word for word in uqWords if not any(bad in word for bad in exWords)]
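    # Add up the matches from both files so the error is printed only when neither file matched
    words_found = search_for_lines("Database.txt", fqWords)
    words_found += search_for_lines("CodeDB.txt", fqWords)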
    if words_found > 0:
        break
    else:
        randomError = random.choice(Error)
        print (R + (randomError))
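For context, the check `if fqWords in line` in the question cannot work, because the left operand of `in` must be a string when the right operand is a string. The sketch below, with a made-up line and word list, shows the error and the any() form used in the function above. Keep in mind that `word in line` is a substring test, so "sky" also matches "skyline".

```python
line = "the sky is blue"
fq_words = ["sky", "grass"]

# Testing a whole list against a string raises TypeError.
try:
    fq_words in line
    list_in_str_ok = True
except TypeError:
    list_in_str_ok = False

# any() tests each word on its own and stops at the first hit.
found_any = any(word in line for word in fq_words)
found_none = any(word in "bananas are yellow" for word in fq_words)

print(list_in_str_ok, found_any, found_none)
```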

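If you don't need to modify a list, use a tuple. For naming identifiers, see PEP 8.
To get the difference between two sequences, use a set; for example, {1, 2, 3} - {2, 3} gives {1}.
If you open the same files inside a loop, they are opened again on every iteration, so it is better to move the open calls out of the loop.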
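The set-difference tip can be checked in a few lines. This is only a sketch with a made-up input sentence and a shortened blacklist. It also shows that a set difference is not a drop-in replacement for the filter in the question: the set removes only exact matches, while the list comprehension removes every word that contains a blacklisted string.

```python
ex_words = ('Who', 'How', 'What', 'am', 'is', 'are', '?', '!')
uq_words = "What is this red thing?".split()

# Filter from the question: drops any word containing a blacklisted substring.
substring_filtered = [w for w in uq_words
                      if not any(bad in w for bad in ex_words)]

# Set difference: drops only words that equal a blacklisted entry.
set_filtered = frozenset(uq_words) - frozenset(ex_words)

print(substring_filtered)
print(sorted(set_filtered))
```

If the substring behaviour of the original filter is what you want, keep the list comprehension. Note too that a set does not preserve the order of the words.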

import random
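def get_line_with_words(lines, words):
    """Return a list of (line number, stripped line) pairs for
       every line that contains any of the words.
    """
    # Line numbers start at 1 because of the second argument to enumerate
    return [(i, line.strip()) for i, line in enumerate(lines, 1)
            if any(word in line for word in words)]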
errors = ("Sorry, I don't understand.", "I don't get it")
ex_words = ('Who', 'How', 'What', 'How many', 'How much', 'am', 'is', 'are', '?', '!')
prefix = "Rel > "
with open("Database.txt") as db, open("CodeDB.txt") as cdb:
    while True:
        user_input = raw_input("> ")
        uq_words = user_input.split()
        fq_words = frozenset(uq_words) - frozenset(ex_words)
        res1 = get_line_with_words(db, fq_words)
        res2 = get_line_with_words(cdb, fq_words)
        if res1 or res2:
            for n, line in res1 + res2:
                print('{} {} {}'.format(prefix, n, line))
            break
        print('{} {}'.format(prefix, random.choice(errors)))
        db.seek(0)
        cdb.seek(0)
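The seek(0) calls at the end of the loop matter because a file object is exhausted after one full pass, so a second search over the same open file would find nothing. Here is a minimal sketch of that behaviour, using a throwaway temporary file as a stand-in for Database.txt:

```python
import os
import tempfile

# Hypothetical temporary file standing in for Database.txt.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as tmp:
    tmp.write("first line\nsecond line\n")

with open(path) as db:
    first_pass = [line.strip() for line in db]
    # The file position is now at the end, so this pass reads nothing.
    second_pass = [line.strip() for line in db]
    # Rewind to the start before reading again.
    db.seek(0)
    third_pass = [line.strip() for line in db]

os.remove(path)
print(first_pass, second_pass, third_pass)
```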