How do I find all occurrences of a word in a text document?

file = open('C:/Desktop/text.txt', encoding='utf8')
file = file.read()
result = file.findall('name') 
print (file[result+1:result+5])

Whenever I run this code, I get the error AttributeError: 'str' object has no attribute 'findall'

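.findall() is a function of the regular-expression module re, not a string method. Import re and pass the pattern and the string to re.findall():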

import re
file = 'the quick brown fox jumped over the lazy dog. The quick brown fox jumped over the lazy dog. The quick brown fox jumped over the lazy dog.'
dogs = re.findall('dog', file)

Result:

['dog', 'dog', 'dog']
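
Note that a bare pattern such as 'dog' also matches inside longer words (for example "dogma"). If only whole words should count, something like the following may help. This is a minimal sketch: the helper name find_word and the sample text are made up for illustration and are not part of the original answer.

```python
import re

def find_word(word, text):
    """Return every whole-word, case-insensitive match of `word` in `text`."""
    # re.escape guards against regex metacharacters in `word`;
    # \b anchors the match at word boundaries.
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.findall(pattern, text, flags=re.IGNORECASE)

# Hypothetical sample text, chosen to include a longer word ("Dogma").
sample = "The dog chased a hot dog. Dogma is not a Dog."
matches = find_word("dog", sample)
print(matches)
```

Drop flags=re.IGNORECASE if the search should be case-sensitive.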

Edit: Regarding your question about the position of each match in the string, you can use another re tool, .finditer(), in a list comprehension.

Example:

dogs = [i.start() for i in re.finditer('dog',file)]

Result:

[41, 87, 133]
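
The slice in the question, file[result+1:result+5], suggests the goal was to read the few characters that come after each match. A rough sketch of that idea, assuming a made-up sample string and an arbitrary width of four characters:

```python
import re

# Hypothetical sample text; in the question this would be the file contents.
text = "name: Alice, name: Bob"

# m.end() is the index just past each match, so the slice below
# takes the four characters that follow every occurrence of "name".
following = [text[m.end():m.end() + 4] for m in re.finditer("name", text)]
print(following)
```

Each match object also provides m.start() and m.span() if the position of the match itself is needed.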

If you want to count how many times a word occurs in a text file, you can try this:

counter = 0
with open('path_to_file', 'r') as file:
    for line in file:
        # split() with no arguments splits on any whitespace
        for word in line.split():
            if word == "word here":
                counter += 1
print(counter)

Replacing the if statement (and its body) with print(word) prints every word in the file.
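
One caveat with line.split(): it splits only on whitespace, so "dog." and "dog" are treated as different words. A possible alternative, sketched here with collections.Counter and a \w+ pattern, counts every word at once. The function name count_words and the sample text are assumptions made for illustration.

```python
import re
from collections import Counter

def count_words(text):
    """Count case-insensitive occurrences of each word in `text`."""
    # \w+ matches runs of letters, digits, and underscores,
    # so trailing punctuation ("dog.") is not part of the word.
    return Counter(re.findall(r"\w+", text.lower()))

# Hypothetical sample text standing in for the file contents.
sample = "The lazy dog. The quick dog!"
counts = count_words(sample)
print(counts["dog"])
print(counts["the"])
```

For a real file, the text could come from reading the file first, for example with open(path, encoding='utf8') and .read(). A Counter returns 0 for words that never appear, so no existence check is needed.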
