How to assign sentences to variables with NLTK



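I'm new to nltk and I'm stuck. I want to split a text file into individual sentences and assign each sentence to a variable for later use. I have the first part working: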

import nltk
from nltk.tokenize import sent_tokenize
text1 = open('/Users/joshuablew/Documents/myCorpus/version1.txt').read()
sent_tokenize(text1)

This prints back each sentence, separated:

['Who was the 44th president of the United States?', 'Where does he live?', 'This is just a plain sentence.', 'As well as this one, just to break up the questions.', 'How many houses make up the United States Congress?', 'What are they called?', 'Again, another question breakpoint here.', 'Who is our current President?', 'Can he run for re-election?', 'Why or why not?']

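From here, I don't know how to automatically save these sentences into variables.

Alternatively, would it be possible to index them, so that text1[0] = 'Who was the 44th president of the United States?', text1[1] = 'Where does he live?', and so on? That is, each index into the text would be one individual sentence.

Thanks for your help.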

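import nltk
from nltk.tokenize import sent_tokenize

# Read the whole file into a single string
with open('1.txt', 'r') as myfile:
    text = myfile.read()

# sent_tokenize returns a list, so tokenize once and reuse the result
textList = sent_tokenize(text)
print(len(textList))  # number of sentences
print(textList[0])    # first sentence; textList[1] is the second, and so on
print(textList)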
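
Since sent_tokenize returns an ordinary Python list, you usually don't need a separate variable per sentence: indexing the list gives you each one. Here is a minimal sketch of that idea. The sample text, the split_sentences helper, and the sentence_N key names are made up for illustration, and the regex fallback is only a rough stand-in in case NLTK or its Punkt data isn't installed.

```python
import re

def split_sentences(text):
    """Return a list of sentences, using NLTK when it is available."""
    try:
        from nltk.tokenize import sent_tokenize
        return sent_tokenize(text)
    except (ImportError, LookupError):
        # Assumption: NLTK or its Punkt data is not installed, so fall back
        # to a rough split on whitespace that follows '.', '?' or '!'.
        return re.split(r'(?<=[.?!])\s+', text.strip())

# Sample text standing in for the contents of the file
text = ("Who was the 44th president of the United States? "
        "Where does he live? This is just a plain sentence.")

sentences = split_sentences(text)

# Each sentence is addressable by its position in the list
first = sentences[0]
second = sentences[1]

# Optional: map hypothetical names to sentences if you want labels
named = {f"sentence_{i}": s for i, s in enumerate(sentences)}

print(first)
print(named["sentence_1"])
```

If you really want named handles, a dict like the one above tends to be easier to work with than creating variables dynamically, because you can loop over it and look sentences up by key.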
