How to trim a piece of text down to complete sentences in Python



I have a piece of text that starts and ends in the middle of a sentence. For example:

reased 11%. Search advertising revenue, excluding traffic acquisition costs, was relatively unchanged. Indus

The full text it was cut from is:

... increased 11%. Search advertising revenue, excluding traffic acquisition costs, was relatively unchanged. Industry ...

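What I want is this: if a sentence is cut off, such as "... increased 11%" or "Industry...", I discard it and return only the complete sentence: Search advertising revenue, excluding traffic acquisition costs, was relatively unchanged.

Can I do this with nltk or spaCy?

Sorry, I didn't make my question clear.

There can be different cases: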

  • Hey! How are you? Good! should return Hey! How are you? Good!

  • ...ey. How are you? I am good. How about.... should return How are you? I am good.

I don't know in advance how many complete sentences the text contains.

You can use spaCy to get all the sentences in a string, for example:

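import spacy

# Load the small English pipeline (install it first with: python -m spacy download en_core_web_sm)
nlp = spacy.load("en_core_web_sm")
doc = nlp("I am good. How are you? Thank you.")
# doc.sents yields one Span per detected sentence
for sent in doc.sents:
    print(sent)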

The sents attribute yields all the sentences in the string.

Output:

I am good.
How are you?
Thank you.

For an input such as

... increased by 11%. Search advertising revenue, excluding traffic acquisition costs, was relatively unchanged. Industry ...

where you only want the complete sentence, you can pass doc.sents to list() and access the sentence by index. For example:

doc = nlp("... increased 11%. Search advertising revenue, excluding traffic acquisition costs, was relatively unchanged. Industry ...")
print(list(doc.sents)[1])

Output:

Search advertising revenue, excluding traffic acquisition costs, was relatively unchanged.
