For example:
Ultimate Swirly Ice Cream Scoopers are usually overrated when one considers all of the scoopers one could buy.
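Here I would like to extract:
- Subject: "Ultimate Swirly Ice Cream Scoopers";
- Adverbial clause: "when one considers all of the scoopers one could buy";
- Verb phrase: "are usually overrated".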
I have the following functions for the subject, object, and adverbial clause:
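import spacy


def get_subj(decomp):
    # Return the text span of the first subject (nsubj, nsubjpass, csubj, ...).
    for token in decomp:
        if "subj" in token.dep_:
            # token.subtree yields the token and all its descendants in document order.
            subtree = list(token.subtree)
            start = subtree[0].i
            end = subtree[-1].i + 1
            return str(decomp[start:end])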


def get_obj(decomp):
    # Return the text span of the first direct object or object of a preposition.
    for token in decomp:
        if "dobj" in token.dep_ or "pobj" in token.dep_:
            subtree = list(token.subtree)
            start = subtree[0].i
            end = subtree[-1].i + 1
            return str(decomp[start:end])


def get_advcl(decomp):
    # Return the text span of the first adverbial clause.
    for token in decomp:
        # print(f"pos: {token.pos_}; lemma: {token.lemma_}; dep: {token.dep_}")
        if "advcl" in token.dep_:
            subtree = list(token.subtree)
            start = subtree[0].i
            end = subtree[-1].i + 1
            return str(decomp[start:end])


phrase = "Ultimate Swirly Ice Cream Scoopers are usually overrated when one considers all of the scoopers one could buy."
nlp = spacy.load("en_core_web_sm")
decomp = nlp(phrase)
subj = get_subj(decomp)
obj = get_obj(decomp)
advcl = get_advcl(decomp)
print("subj: ", subj)
print("obj: ", obj)
print("advcl: ", advcl)
Output:
subj: Ultimate Swirly Ice Cream Scoopers
obj: all of the scoopers
advcl: when one considers all of the scoopers one could buy
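However, the actual dependency type (.dep_) of the last word of the VP "are usually overrated" is ROOT. So the subtree technique fails, because the subtree of ROOT returns the entire sentence.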
You want to build something more like a "verb group", where you keep only certain close dependents of the root verb, such as aux, cop, and advmod, but not nsubj, obj, or advcl.
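One way this could look is sketched below. It is only a rough sketch under a few assumptions: the function name get_verb_group and the KEEP_DEPS set are hypothetical, and auxpass has been added to the kept labels because spaCy's English models use it for passive auxiliaries. The function relies only on the token attributes dep_, children, i, and text, and looks at direct children of the root rather than its whole subtree.

```python
# Hypothetical set of "close" dependents to keep with the root verb.
# auxpass is an added assumption (label for passive auxiliaries in spaCy's English models).
KEEP_DEPS = frozenset({"aux", "auxpass", "cop", "advmod"})


def get_verb_group(doc, keep=KEEP_DEPS):
    """Return the root token plus its direct children whose dep_ is in keep."""
    for token in doc:
        if token.dep_ == "ROOT":
            # Only direct children are inspected, so subjects, objects and
            # adverbial clauses hanging off the root are left out.
            group = [token] + [child for child in token.children if child.dep_ in keep]
            # Restore sentence order using each token's position in the document.
            group.sort(key=lambda t: t.i)
            return " ".join(t.text for t in group)
    return None  # no ROOT token found
```

With the decomp object from the question, print("vp: ", get_verb_group(decomp)) should give something close to the desired verb phrase, provided the parser attaches "are" and "usually" directly to the root. The exact labels and attachments depend on the model, so the kept set may need adjusting.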