How do I extract verb phrases in spaCy?



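For example:

Ultimate Swirly Ice Cream Scoopers are usually overrated when one considers all of the scoopers one could buy.

Here I want to extract: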

  • Subject: "Ultimate Swirly Ice Cream Scoopers"
  • Adverbial clause: "when one considers all of the scoopers one could buy"
  • Verb phrase: "are usually overrated"

I have the following functions for the subject, object, and adverbial clause:

import spacy


def get_subj(decomp):
    # Return the subtree text of the first subject token (nsubj, nsubjpass, ...).
    for token in decomp:
        if "subj" in token.dep_:
            subtree = list(token.subtree)
            start = subtree[0].i
            end = subtree[-1].i + 1
            return str(decomp[start:end])


def get_obj(decomp):
    # Return the subtree text of the first direct or prepositional object.
    for token in decomp:
        if "dobj" in token.dep_ or "pobj" in token.dep_:
            subtree = list(token.subtree)
            start = subtree[0].i
            end = subtree[-1].i + 1
            return str(decomp[start:end])


def get_advcl(decomp):
    # Return the subtree text of the first adverbial clause.
    for token in decomp:
        # print(f"pos: {token.pos_}; lemma: {token.lemma_}; dep: {token.dep_}")
        if "advcl" in token.dep_:
            subtree = list(token.subtree)
            start = subtree[0].i
            end = subtree[-1].i + 1
            return str(decomp[start:end])


phrase = "Ultimate Swirly Ice Cream Scoopers are usually overrated when one considers all of the scoopers one could buy."
nlp = spacy.load("en_core_web_sm")
decomp = nlp(phrase)

subj = get_subj(decomp)
obj = get_obj(decomp)
advcl = get_advcl(decomp)

print("subj: ", subj)
print("obj: ", obj)
print("advcl: ", advcl)

Output:

subj:  Ultimate Swirly Ice Cream Scoopers
obj:  all of the scoopers
advcl:  when one considers all of the scoopers one could buy

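However, the actual dependency type (.dep_) of the last word of the VP, "are usually overrated", is "ROOT".

So the subtree technique fails, because the subtree of ROOT is the entire sentence.

You want to build something more like a "verb group", where you keep only certain close dependents of the root verb, such as aux, cop, and advmod, but not nsubj, obj, or advcl.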
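The verb-group idea could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the name get_verb_group and the KEEP_DEPS set are hypothetical, and auxpass and neg are added as assumptions on top of the aux, cop, and advmod labels mentioned above, since which labels actually appear depends on the model's parse. The helper only relies on the token attributes dep_, i, text, and children, and the spaCy call sits in a separate demo function.

```python
# Dependency labels treated as part of the verb group.
# aux, cop and advmod come from the text above; auxpass and neg are assumptions.
KEEP_DEPS = {"aux", "auxpass", "cop", "advmod", "neg"}


def get_verb_group(decomp, keep_deps=KEEP_DEPS):
    """Return the ROOT token plus its direct children whose dependency
    label is in keep_deps, joined in sentence order (hypothetical helper)."""
    for token in decomp:
        if token.dep_ == "ROOT":
            # Only direct children are inspected, so the subject, objects and
            # the adverbial clause (nsubj, dobj, advcl, ...) are left out.
            group = [token] + [c for c in token.children if c.dep_ in keep_deps]
            group.sort(key=lambda t: t.i)  # restore surface word order
            return " ".join(t.text for t in group)
    return None  # no ROOT token found


def demo():
    # spaCy is imported here so the helper above has no hard dependency on it.
    import spacy

    nlp = spacy.load("en_core_web_sm")
    decomp = nlp(
        "Ultimate Swirly Ice Cream Scoopers are usually overrated "
        "when one considers all of the scoopers one could buy."
    )
    print("vp: ", get_verb_group(decomp))
```

Calling demo() prints the verb group for the example sentence. Because only direct children of the root are kept, modifiers nested deeper in the tree are dropped, and the kept tokens need not be contiguous in the sentence. If the parser attaches a word with a label outside KEEP_DEPS, extend the set accordingly.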
