Python 3 - doctest module for long .txt files



I have a quick question about the doctest module in Python 3.

I haven't used it before; I have only read some information about it and seen how it is applied to functions. I have two functions to test, and I think I understand what I have to do, but I don't know how to apply it in my case, because my functions work with .txt files. The first one takes a word and the path of a text file as input and outputs a list of pairs, each pair consisting of a line in which the word occurs and the corresponding line number.

def find_all_instances(word, path):
    l = []
    with open(path, 'r') as file:
        for position, line in enumerate(file.readlines()):
            if word in line:
                tup = (line, position+1)
                l.append(tup)
    return l

print(find_all_instances('word', 'filename.txt'))

The second function takes the path of a text file as input and outputs a list of pairs, each consisting of a word and the number of times that word occurs, sorted in descending order.

from collections import Counter
import re

def task_2(inp):
    with open(inp, encoding="utf-8") as f:
        data = (x.lower() for x in re.split(r'[\n, .?!:;-]', f.read()) if x.isalpha())
        cnt = Counter(data)
    return cnt.most_common()

task_2(r"filepath")

My question now is: how do I apply doctest in these cases? The doctest examples I have seen only use simple functions, for example ones that multiply two inputs. In my case, however, the output would be quite large, since the text file is about 10,000 lines long, so the expected output in the docstring would be just as large. How can I do this for these functions?

I suggest you create functions that actually generate the lists and document those; their doctests can then use small in-memory examples instead of your 10,000-line file:

from collections import Counter
import re

def find_all_instances(word, lines):
    """Return a list of (line, line_number) tuples for the lines in which the word appears.

    >>> find_all_instances('test', ['first line', 'second line for test', 'third test line', 'last line'])
    [('second line for test', 2), ('third test line', 3)]
    """
    l = []
    for position, line in enumerate(lines):
        if word in line:
            tup = (line, position+1)
            l.append(tup)
    return l

def word_counter(text):
    r"""Return a list of (word, count) tuples for each word in a text, sorted from most to least common.

    >>> word_counter('Lorem ipsum dolor sit amet, consectetur adipiscing elit.\nSed non risus.\n Suspendisse lectus tortor, dignissim sit amet, adipiscing nec, ultricies sed, dolor.')
    [('dolor', 2), ('sit', 2), ('amet', 2), ('adipiscing', 2), ('sed', 2), ('lorem', 1), ('ipsum', 1), ('consectetur', 1), ('elit', 1), ('non', 1), ('risus', 1), ('suspendisse', 1), ('lectus', 1), ('tortor', 1), ('dignissim', 1), ('nec', 1), ('ultricies', 1)]
    """
    data = (x.lower() for x in re.split(r'[\n, .?!:;-]', text) if x.isalpha())
    cnt = Counter(data)
    return cnt.most_common()

Then use them from other functions that take care of opening the files:

def find_all_instances_from_path(word, path):
    with open(path, 'r') as file:
        return find_all_instances(word, file.readlines())

def task_2(inp):
    with open(inp, encoding="utf-8") as f:
        return word_counter(f.read())
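
To actually run the doctests, one minimal option is to add a small block at the end of the module (a sketch, assuming the functions above are saved in a file such as mymodule.py, a name used here only for illustration):

if __name__ == "__main__":
    import doctest
    # Run every >>> example found in the docstrings of this module
    # and report any mismatch between expected and actual output.
    doctest.testmod(verbose=True)

The same tests can also be run without touching the file with python -m doctest -v mymodule.py. Either way, only the short in-memory examples from the docstrings are executed, so the 10,000-line text file never has to appear in the expected output.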
