Find whether a fragment contains the opening words of another sentence or the closing words of a sentence



For example, I have a set of sentences like this:

New York is in New York State
D.C. is the capital of United States
The weather is cool in the south of that country.
Lets take a bus to get to point b from point a.

and a fragment like this:

is cool in the south of that country

The output should be: The weather is cool in the south of that country.

If the input is something like "of United States The weather is cool", the output should be:

D.C. is the capital of United States The weather is cool in the south of that country.

So far I have tried difflib to find the overlap, but that does not fully solve the problem in every case.

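You can build two dictionaries from the sentences: one keyed by the opening word sequences and one keyed by the closing word sequences. Then look up a prefix and a suffix for the fragment in those dictionaries to extend it. In both cases you need to build (or check) a key for every run of words that starts at the first word or ends at the last word: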

sentences = """New York is in New York State
D.C. is the capital of United States
The weather is cool in the south of that country
Lets take a bus to get to point b from point a""".split("\n")

# closing words of each sentence -> the words that precede them
ends = {tuple(sWords[i:]): sWords[:i]
        for s in sentences
        for sWords in [s.split()]
        for i in range(len(sWords))}

# opening words of each sentence -> the words that follow them
starts = {tuple(sWords[:i]): sWords[i:]
          for s in sentences
          for sWords in [s.split()]
          for i in range(1, len(sWords) + 1)}

def extendSentence(sentence):
    sWords = sentence.split(" ")
    # shortest leading run of the input that closes a known sentence
    prefix = next((ends[p] for i in range(1, len(sWords) + 1)
                   for p in [tuple(sWords[:i])] if p in ends),
                  [])
    # longest trailing run of the input that opens a known sentence
    suffix = next((starts[p] for i in range(len(sWords))
                   for p in [tuple(sWords[i:])] if p in starts),
                  [])
    return " ".join(prefix + [sentence] + suffix)

Output:

print(extendSentence("of United States The weather is cool"))
# D.C. is the capital of United States The weather is cool in the south of that country
print(extendSentence("is cool in the south of that country"))
# The weather is cool in the south of that country

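Note that I had to remove the periods at the end of the sentences because they prevented the matches. You will need to clean these up in the dictionary-building step.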
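One possible way to do that cleanup is sketched below. It is not part of the original answer: the names norm, build_tables and extend_sentence are hypothetical, and the cleaning rule (drop trailing ".", "!" or "?" from every word when forming a lookup key) is an assumption. The keys are normalized, while the stored values keep the original words, so punctuation from the source sentences survives in the output.

```python
def norm(words):
    # Hypothetical helper: lookup key with trailing ., ! or ? removed from
    # each word (an assumed cleaning rule, adjust to your data).
    return tuple(w.rstrip(".!?") for w in words)

def build_tables(sentences):
    # Same two dictionaries as above, but keyed on normalized words.
    ends, starts = {}, {}
    for s in sentences:
        words = s.split()
        for i in range(len(words)):
            ends[norm(words[i:])] = words[:i]            # closing words -> words before
            starts[norm(words[:i + 1])] = words[i + 1:]  # opening words -> words after
    return ends, starts

def extend_sentence(sentence, ends, starts):
    words = sentence.split()
    # shortest leading run of the input that closes a known sentence
    prefix = next((ends[k] for i in range(1, len(words) + 1)
                   for k in [norm(words[:i])] if k in ends), [])
    # longest trailing run of the input that opens a known sentence
    suffix = next((starts[k] for i in range(len(words))
                   for k in [norm(words[i:])] if k in starts), [])
    return " ".join(prefix + words + suffix)

sentences = [
    "New York is in New York State",
    "D.C. is the capital of United States",
    "The weather is cool in the south of that country.",
    "Lets take a bus to get to point b from point a.",
]
ends, starts = build_tables(sentences)
print(extend_sentence("of United States The weather is cool", ends, starts))
# D.C. is the capital of United States The weather is cool in the south of that country.
```

With this version the sentences can keep their final periods. Words supplied in the input are returned as typed, so extending "is cool in the south of that country" yields the sentence without a trailing period unless the input already has one.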
