Splitting a list of paragraphs at punkt (".")



I have a list of paragraphs:

paragraphs = ['I do not like green eggs and ham. I am hungry, but I do not find anything to eat', '5.2. I do not like them Sam-I-am. I am Sam.', 'Blah, Blah, Blah']

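I want to split these paragraphs at punkt (".") and get a list of all the sentences, so I wrote the following code:

import nltk

sentences = []
for paragraph in paragraphs:
    # sent_tokenize returns a list of sentence strings for one paragraph
    sentence = nltk.tokenize.sent_tokenize(paragraph)
    sentences.append(sentence)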

I get a list of lists:

sentences = [['I do not like green eggs and ham.', 'I am hungry, but I do not find anything to eat'], ['5.2.', 'I do not like them Sam-I-am.', 'I am Sam.'], ['Blah, Blah, Blah']]

Instead, I would like to get:

sentences = ['I do not like green eggs and ham.', 'I am hungry, but I do not find anything to eat', '5.2.', 'I do not like them Sam-I-am.', 'I am Sam.', 'Blah, Blah, Blah']

How can I get this?


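In your code, the variable sentence is itself a list of strings, because sent_tokenize returns a list. Calling sentences.append(sentence) therefore adds the whole list as a single element. You can fix this by appending each element of sentence to sentences individually.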

import nltk

sentences = []
for paragraph in paragraphs:
    sentence = nltk.tokenize.sent_tokenize(paragraph)
    # Append each sentence string, not the list that contains them
    for i in sentence:
        sentences.append(i)
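
The same flattening can also be written more compactly. The sketch below is only an illustration: it starts from the nested list shown in the question instead of calling sent_tokenize, so it runs without NLTK or its tokenizer data, and the names nested, flat_extend, flat_comprehension and flat_chain are made up for this example.

```python
from itertools import chain

# Nested result copied from the question: one inner list per paragraph.
nested = [
    ['I do not like green eggs and ham.', 'I am hungry, but I do not find anything to eat'],
    ['5.2.', 'I do not like them Sam-I-am.', 'I am Sam.'],
    ['Blah, Blah, Blah'],
]

# Option 1: list.extend adds every element of the inner list, not the list itself.
flat_extend = []
for sentence_list in nested:
    flat_extend.extend(sentence_list)

# Option 2: a list comprehension with two for clauses.
flat_comprehension = [s for sentence_list in nested for s in sentence_list]

# Option 3: itertools.chain.from_iterable walks the inner lists in order.
flat_chain = list(chain.from_iterable(nested))

print(flat_extend)
```

In the original loop, this would probably mean replacing sentences.append(sentence) with sentences.extend(sentence), which avoids the inner for loop.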
