I have a piece of text that does not start and end on sentence boundaries. For example:
reased 11%. Search advertising revenue, excluding traffic acquisition costs, was relatively unchanged. Indus
The full text is:
... increased 11%. Search advertising revenue, excluding traffic acquisition costs, was relatively unchanged. Industry ...
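What I want is this: if a sentence is cut off, such as "... increased 11%" and "Industry ...", I discard it and return only the complete sentence:
Search advertising revenue, excluding traffic acquisition costs, was relatively unchanged.
Can I do this with nltk or spaCy?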
Sorry, I did not make my question clear.
There can be different cases:
Hey! How are you? Good!
should return Hey! How are you? Good!
...ey. How are you? I am good. How about....
should return How are you? I am good.
I do not know in advance how many complete sentences the text contains.

You can use spaCy to get all the sentences in a string, for example:
import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("I am good. How are you? Thank you.")
for sent in doc.sents:
    print(sent)
The doc.sents attribute yields every sentence in the string.
Output:
I am good.
How are you?
Thank you.
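For your case,
... increased by 11%. Search advertising revenue, excluding traffic acquisition costs, was relatively unchanged. Industry ...
where you want only the complete sentence, you can simply pass doc.sents to list() and access the result by index. Note that index 1 assumes the text is split into exactly three sentences, with a fragment first and last. For example: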
doc = nlp("... increased 11%. Search advertising revenue, excluding traffic acquisition costs, was relatively unchanged. Industry ...")
print(list(doc.sents)[1])
Output:
Search advertising revenue, excluding traffic acquisition costs, was relatively unchanged.
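
Because the number of complete sentences is not known in advance, a fixed index such as [1] does not generalize to the other cases in the question. Below is a minimal sketch of one possible heuristic, not part of spaCy or nltk: it takes sentences that some splitter has already produced and drops the first one if it looks cut off at the start, and the last one if it looks cut off at the end. The function name complete_sentences and the rules it applies (a leading ellipsis or lowercase letter marks a truncated head; a trailing ellipsis or missing ., ! or ? marks a truncated tail) are assumptions made for illustration.

```python
def complete_sentences(sentences):
    """Drop a truncated first and/or last sentence from a list of sentence strings.

    Hypothetical helper: the truncation rules below are heuristics, not a
    feature of spaCy or nltk.
    """
    sents = [s.strip() for s in sentences if s.strip()]

    # Assumed rule: a cut-off head starts with an ellipsis or a lowercase letter.
    if sents and (sents[0].startswith(("...", "…")) or sents[0][0].islower()):
        sents = sents[1:]

    # Assumed rule: a cut-off tail ends with an ellipsis or lacks ., ! or ?
    if sents and (sents[-1].endswith(("...", "…")) or sents[-1][-1] not in ".!?"):
        sents = sents[:-1]

    return sents


# With spaCy, the input list could be built as [s.text for s in doc.sents].
print(complete_sentences(["Hey!", "How are you?", "Good!"]))
print(complete_sentences(["...ey.", "How are you?", "I am good.", "How about...."]))
```

The heuristic cannot detect a head fragment that happens to begin with a capital letter, or a tail that is cut exactly after a period, so it is best treated as a starting point to tune against your own data.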