Splitting a list of paragraphs at punkt (".")

I have a list of paragraphs:

paragraphs = ['I do not like green eggs and ham. I am hungry, but I do not find anything to eat', '5.2. I do not like them Sam-I-am. I am Sam.', 'Blah, Blah, Blah']

I want to split these paragraphs at punkt (".") and get a single list containing every sentence, so I wrote the following code:

import nltk  # sent_tokenize requires the punkt tokenizer data to be installed

sentences = []
for paragraph in paragraphs:
    sentence = nltk.tokenize.sent_tokenize(paragraph)
    sentences.append(sentence)

I get a list of lists:

sentences = [['I do not like green eggs and ham.', 'I am hungry, but I do not find anything to eat'], ['5.2.', 'I do not like them Sam-I-am.', 'I am Sam.'], ['Blah, Blah, Blah']]

Instead, I want to get:

sentences = ['I do not like green eggs and ham.', 'I am hungry, but I do not find anything to eat', '5.2.', 'I do not like them Sam-I-am.', 'I am Sam.', 'Blah, Blah, Blah']

How can I get this?

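In your code, the variable sentence is itself a list of strings, because sent_tokenize returns one list per paragraph. Appending that list to sentences is what produces the nested result. You can fix this by appending each element of sentence to sentences instead.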

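sentences = []
for paragraph in paragraphs:
    # sent_tokenize returns a list of sentence strings for this paragraph
    sentence = nltk.tokenize.sent_tokenize(paragraph)
    for i in sentence:
        # append each sentence individually so the result stays flat
        sentences.append(i)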
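
If you prefer something shorter than an inner loop, the same flattening can probably be done with list.extend, a nested list comprehension, or itertools.chain.from_iterable. The sketch below is only an illustration: to keep it runnable without NLTK or the punkt data, it assumes the nested list from the question as its input instead of calling sent_tokenize, and the names nested, flat_extend, flat_comprehension and flat_chain are made up for this example.

```python
from itertools import chain

# Nested result copied from the question. It stands in for the
# per-paragraph output of sent_tokenize (assumption: no NLTK call here).
nested = [
    ['I do not like green eggs and ham.', 'I am hungry, but I do not find anything to eat'],
    ['5.2.', 'I do not like them Sam-I-am.', 'I am Sam.'],
    ['Blah, Blah, Blah'],
]

# Option 1: extend adds every element of a sub-list in one call.
flat_extend = []
for sentence_list in nested:
    flat_extend.extend(sentence_list)

# Option 2: a nested list comprehension that walks the sub-lists in order.
flat_comprehension = [s for sentence_list in nested for s in sentence_list]

# Option 3: chain.from_iterable lazily concatenates the sub-lists.
flat_chain = list(chain.from_iterable(nested))

print(flat_extend)
```

In your original loop this would mean replacing sentences.append(sentence) with sentences.extend(sentence); all three options keep the sentences in their original order.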
