Open a folder and get the top 100 words that occur in the text files in that folder

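I want to write a class named LexicalAnalyzer. Given a folder directory, the class must provide the following functions.

gettop100words: returns a dictionary with the frequencies of the overall top 100 words found in the text files of that folder, ignoring case.

get_letter_frequencies: returns a dictionary of letter frequencies (a-z).

How do I write this LexicalAnalyzer?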

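Just run a for loop over the files (text files, of course), add each word together with the number of times it occurs to a dictionary, and return that dictionary. To split the words, read the whole text of a file into one string, use the split function to break it into a list of words, loop over that list, and then do the dictionary step I described at the beginning.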
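The loop described above could be sketched roughly as follows. The function names `count_words` and `top_words` are made up for illustration, and the sketch assumes the text files end in `.txt` and are UTF-8 encoded; neither is stated in the question.

```python
import os


def count_words(folder):
    """Return a dict mapping each lowercased word to its number of occurrences."""
    counts = {}
    for name in os.listdir(folder):
        # Assumption: only files ending in .txt count as text files
        if not name.endswith(".txt"):
            continue
        path = os.path.join(folder, name)
        # Assumption: the files are UTF-8 encoded
        with open(path, encoding="utf-8") as handle:
            text = handle.read()
        # split() with no arguments splits on any run of whitespace
        for word in text.lower().split():
            counts[word] = counts.get(word, 0) + 1
    return counts


def top_words(counts, n=100):
    """Return a dict holding the n most frequent words, highest count first."""
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:n])
```

Calling `top_words(count_words('/tmp/test'))` would then give the top 100 dictionary. Note that `split()` leaves punctuation attached to words, so "end." and "end" are counted separately unless the text is cleaned first.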

Use fileinput to iterate over the files.
Use collections.Counter to count objects (words, letters).

Example

Environment:

$ tree /tmp/test
/tmp/test
├── file1.txt
├── file2.txt
└── file3.txt
0 directories, 3 files

Data:

$ tail -vn +1 /tmp/test/*.txt
==> /tmp/test/file1.txt <==
hello world
world foo bar egg
spam egg baz
end
==> /tmp/test/file2.txt <==
foo xxx yyy
qqq foo
eee ttt def
cmp
==> /tmp/test/file3.txt <==
Foo BAR
SpAm

Snippet:

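import collections
import fileinput
import os

DIR = '/tmp/test'
# Full paths of every entry in DIR (the example folder holds only .txt files)
files = [os.path.join(DIR, filename) for filename in os.listdir(DIR)]

words = collections.Counter()
letters = collections.Counter()

# fileinput chains all the files into a single stream of lines
with fileinput.input(files=files) as f:
    for line in f:
        # lower() makes the word count case-insensitive
        words.update(line.lower().split())

# Note: this counts each letter once per distinct word; to count every
# occurrence, call letters.update(word) for each word inside the loop above
for word in words:
    letters.update(word)

# top 3 words
print(words.most_common(3))
# top 5 letters
print(letters.most_common(5))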

Output (entries with equal counts may appear in a different order):

[('foo', 4), ('egg', 2), ('spam', 2)]
[('e', 7), ('o', 4), ('y', 3), ('l', 3), ('q', 3)]
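Since the question asks for a class, the Counter approach could be wrapped along these lines. This is only a sketch under a few assumptions that are not in the question: the constructor takes the folder path, only `.txt` files are read, the files are UTF-8, and `_iter_lines` is a made-up helper. Unlike the snippet above, it counts every occurrence of a letter rather than one per distinct word, and it reports all 26 letters, including those with a count of zero.

```python
import collections
import os
import string


class LexicalAnalyzer:
    def __init__(self, folder):
        # Assumption: the class is constructed with the folder path
        self.folder = folder

    def _iter_lines(self):
        """Yield every line of every .txt file in the folder, lowercased."""
        for name in sorted(os.listdir(self.folder)):
            if name.endswith(".txt"):
                path = os.path.join(self.folder, name)
                with open(path, encoding="utf-8") as handle:
                    for line in handle:
                        yield line.lower()

    def gettop100words(self):
        """Return a dict of the 100 most frequent words and their counts."""
        words = collections.Counter()
        for line in self._iter_lines():
            words.update(line.split())
        return dict(words.most_common(100))

    def get_letter_frequencies(self):
        """Return a dict mapping each letter a-z to its total count."""
        letters = collections.Counter()
        for line in self._iter_lines():
            # Keep only a-z; digits, punctuation and whitespace are skipped
            letters.update(ch for ch in line if ch in string.ascii_lowercase)
        return {ch: letters[ch] for ch in string.ascii_lowercase}
```

With the example folder, `LexicalAnalyzer('/tmp/test').gettop100words()` returns all 15 distinct words, since there are fewer than 100 of them.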
