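I have a folder containing 3989 .txt files. I used nltk to read the file IDs of the .txt files and stored them all in files. Now I want to read each .txt file and produce two outputs: a text corpus of all the .txt files, and an array of lines for each .txt file. But I cannot even read the files; I get this error: FileNotFoundError: [Errno 2] No such file or directory:
How can I fix it and get the two outputs? Thanks for your help! Here is my code: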
from nltk.corpus import PlaintextCorpusReader
corpus_root = '/Users/sima/Downloads/Selected 20-newsgroupto work on/kole databaseha'
wordlists = PlaintextCorpusReader(corpus_root, '.*')
files = wordlists.fileids()
for file in files:
    f = open(file, 'r')
    lines = f.read()
    print(lines)
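You need to join each basename to the root directory. fileids() returns names relative to corpus_root, so open(file) only works if your cwd is the directory that holds the files, and judging by your error it is not: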
from os.path import join
from nltk.corpus import PlaintextCorpusReader

corpus_root = '/Users/sima/Downloads/Selected 20-newsgroupto work on/kole databaseha'
wordlists = PlaintextCorpusReader(corpus_root, '.*')
files = wordlists.fileids()  # file names relative to corpus_root
for file in files:
    # Build the full path so the file is found regardless of the cwd
    with open(join(corpus_root, file)) as f:
        lines = f.read()
    print(lines)
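For the two outputs the question asks for, here is a minimal sketch that builds on the same path-joining idea. The function name read_corpus, the UTF-8 default, and errors="replace" are assumptions made for illustration, not something taken from the question; adjust the encoding to whatever your files actually use.

```python
from os.path import join


def read_corpus(corpus_root, fileids, encoding="utf-8"):
    """Return (corpus_text, lines_by_file) for the given file IDs.

    corpus_text   -- the text of all files, joined with newlines
    lines_by_file -- dict mapping each file ID to its list of lines
    """
    texts = []
    lines_by_file = {}
    for fileid in fileids:
        # Join the relative file ID onto the root, as in the fix above.
        # errors="replace" is an assumption: it keeps undecodable bytes
        # from aborting the whole run.
        path = join(corpus_root, fileid)
        with open(path, encoding=encoding, errors="replace") as f:
            text = f.read()
        texts.append(text)
        lines_by_file[fileid] = text.splitlines()
    corpus_text = "\n".join(texts)
    return corpus_text, lines_by_file
```

With the reader from the answer this could be called as corpus_text, lines_by_file = read_corpus(corpus_root, wordlists.fileids()). The reader's own raw() and words() methods may already be enough if you only need the first output.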