Using the count method to count a word in a text file



I'm counting how many times the word "the" appears in two books saved as text files. The code I'm running returns zero for each book.

Here is my code:

def word_count(filename):
    """Count specified words in a text"""
    try:
        with open(filename) as f_obj:
            contents = f_obj.readlines()
            for line in contents:
                word_count = line.lower().count('the')
            print(word_count)
    except FileNotFoundError:
        msg = "Sorry, the file you entered, " + filename + ", could not be found."
        print(msg)
dracula = r'C:\Users\HP\Desktop\Programming\Python\Python Crash Course\TEXT files\dracula.txt'
siddhartha = r'C:\Users\HP\Desktop\Programming\Python\Python Crash Course\TEXT files\siddhartha.txt'
word_count(dracula)
word_count(siddhartha)

What am I doing wrong here?

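You are reassigning word_count on every iteration, so at the end it holds only the number of occurrences of "the" in the last line of the file. You should be summing the counts instead. One more thing: should "there" match? Probably not, so you likely want to use line.split(). Also, you can iterate over the file object directly; there is no need for .readlines(). Finally, a generator expression simplifies the loop. My first example doesn't use a generator expression; the second one does.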

# Without a generator expression
def word_count(filename):
    with open(filename) as f_obj:
        total = 0
        for line in f_obj:
            total += line.lower().split().count('the')
        print(total)

# With a generator expression
def word_count(filename):
    with open(filename) as f_obj:
        total = sum(line.lower().split().count('the') for line in f_obj)
        print(total)
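The difference between counting substrings and counting whole tokens is easy to check on a single string. The sample sentence below is made up for illustration:

```python
# Hypothetical sample text, used only for illustration.
sample = "The other one is there, the end."

lowered = sample.lower()

# str.count matches substrings, so "other" and "there" are counted too.
substring_hits = lowered.count("the")

# split() breaks on whitespace, so only tokens exactly equal to "the" match.
token_hits = lowered.split().count("the")

print(substring_hits)  # 4
print(token_hits)      # 2
```

split() only separates on whitespace, so punctuation stays attached to tokens: a token such as "the," or "the." would not be counted by the split() version either.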

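Because word_count is overwritten on every pass through the loop, you will see zero unless the word "the" happens to appear in the last line of each file.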
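To reproduce the problem without the book files, you can run both loop shapes over a short list of lines. The function names and sample lines below are hypothetical, chosen only to show the difference:

```python
def count_last_line_only(lines):
    """Mimics the question's loop: word_count is overwritten on every line."""
    word_count = 0  # initialized only so that empty input returns 0
    for line in lines:
        word_count = line.lower().count("the")  # replaces the previous value
    return word_count


def count_all_lines(lines):
    """Accumulates across lines with +=, which is what was intended."""
    word_count = 0
    for line in lines:
        word_count += line.lower().count("the")
    return word_count


lines = ["The cat sat.\n", "the dog and the bird\n", "no match here\n"]
print(count_last_line_only(lines))  # 0, only the last line is counted
print(count_all_lines(lines))       # 3
```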

You probably want to initialize the word_count variable to zero and then use augmented assignment (+=):

For example:

def word_count(filename):
    """Count specified words in a text"""
    try:
        word_count = 0                                       # <- change #1 here
        with open(filename) as f_obj:
            contents = f_obj.readlines()
            for line in contents:
                word_count += line.lower().count('the')      # <- change #2 here
            print(word_count)
    except FileNotFoundError:
        msg = "Sorry, the file you entered, " + filename + ", could not be found."
        print(msg)
dracula = r'C:\Users\HP\Desktop\Programming\Python\Python Crash Course\TEXT files\dracula.txt'
siddhartha = r'C:\Users\HP\Desktop\Programming\Python\Python Crash Course\TEXT files\siddhartha.txt'
word_count(dracula)
word_count(siddhartha)

Augmented assignment isn't required, just convenient. This line:

word_count += line.lower().count('the')

can be written as

word_count = word_count + line.lower().count('the')

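But you also don't need to read all the lines into memory at once. You can iterate over the lines directly from the file object. For example: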

def word_count(filename):
    """Count specified words in a text"""
    try:
        word_count = 0
        with open(filename) as f_obj:
            for line in f_obj:                     # <- change here
                word_count += line.lower().count('the')
        print(word_count)
    except FileNotFoundError:
        msg = "Sorry, the file you entered, " + filename + ", could not be found."
        print(msg)
dracula = r'C:\Users\HP\Desktop\Programming\Python\Python Crash Course\TEXT files\dracula.txt'
siddhartha = r'C:\Users\HP\Desktop\Programming\Python\Python Crash Course\TEXT files\siddhartha.txt'
word_count(dracula)
word_count(siddhartha)
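If you also want to skip words like "there" while still counting "The" or "the," next to punctuation, a regular expression with word boundaries is one option. This is a swapped-in technique, not something from the answers above; the function name count_the, the encoding="utf-8" argument (an assumption about how the files were saved) and the demo text are all hypothetical. It still iterates over the file object line by line, but returns the count instead of printing it:

```python
import os
import re
import tempfile

# \b is a regex word boundary, so "the" matches only as a whole word:
# "there" and "other" are skipped, while "The" (via IGNORECASE) and "the,"
# are counted.
THE_PATTERN = re.compile(r"\bthe\b", re.IGNORECASE)


def count_the(filename):
    """Return the number of whole-word matches of "the" in filename,
    or None if the file cannot be found."""
    try:
        # Assumption: the text files are UTF-8 encoded.
        with open(filename, encoding="utf-8") as f_obj:
            # Iterate lazily over lines, summing the matches on each one.
            return sum(len(THE_PATTERN.findall(line)) for line in f_obj)
    except FileNotFoundError:
        print(f"Sorry, the file you entered, {filename}, could not be found.")
        return None


# Demo on a throwaway file with made-up content (not one of the books).
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("The end of the road.\nThere, the other way.\n")
    demo_path = tmp.name

demo_count = count_the(demo_path)
os.remove(demo_path)
print(demo_count)  # 3
```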

Another way:

def word_count(filename):
    with open(filename) as f_obj:
        contents = f_obj.read()
    print("The word 'the' appears " + str(contents.lower().count('the')) + " times")
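If you read the whole file anyway, collections.Counter can tally every word in one pass, so you can look up several words without rescanning the text. The sample string below is a hypothetical stand-in for f_obj.read(), and, like the split() approach, this leaves punctuation attached to tokens:

```python
from collections import Counter

# Hypothetical text standing in for f_obj.read().
contents = "The count of the words. Then the end"

# Tally every lowercased, whitespace-separated token once.
counts = Counter(contents.lower().split())

# A Counter returns 0 for words it has never seen.
for word in ("the", "then", "end"):
    print(word, counts[word])
```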

A variant that checks the path before opening the file:

import os

def word_count(filename):
    """Count specified words in a text"""
    if os.path.exists(filename):
        if not os.path.isdir(filename):
            with open(filename) as f_obj:
                print(f_obj.read().lower().count('the'))
        else:
            print("path is a folder, not a file: '%s'" % filename)
    else:
        print("path not found: '%s'" % filename)
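The same checks can be sketched with pathlib, where Path.exists(), Path.is_dir() and Path.read_text() take the place of the os.path calls. This version returns the count (or None on error) so the result can be reused; the extra word parameter and the UTF-8 encoding are assumptions, not part of the answer above. It keeps the substring semantics of str.count, so "there" still matches. On Windows, writing the path as a raw string (r'C:\...') stops sequences such as \U from being read as escape sequences.

```python
from pathlib import Path


def word_count(filename, word="the"):
    """Print and return how often `word` occurs in the file (substring match)."""
    path = Path(filename)
    if not path.exists():
        print(f"path not found: '{filename}'")
        return None
    if path.is_dir():
        print(f"path is a folder, not a file: '{filename}'")
        return None
    # Assumption: the text files are UTF-8 encoded.
    count = path.read_text(encoding="utf-8").lower().count(word)
    print(count)
    return count
```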
