Filter POS-tag results by tag patterns and other tags



Original sentences:

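key_list = ['techniques from nonlinear analysis and partial differential equations form the basis for these studies.', 'differential equations are cool.', 'it is not too great of an equation']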

spaCy tagging:
[[['techniques', 'NNS'], ['from', 'IN'], ['nonlinear', 'JJ'], ['analysis', 'NN'], ['and', 'CC'], ['partial', 'JJ'], ['differential', 'JJ'], ['equations', 'NNS'], ['form', 'VBP'], ['the', 'DT'], ['basis', 'NN'], ['for', 'IN'], ['these', 'DT'], ['studies', 'NNS'], ['.', '.']],
[['differential', 'JJ'], ['equations', 'NNS'], ['are', 'VBP'], ['cool', 'JJ'], ['.', '.']], 
[['it', 'PRP'], ['is', 'VBZ'], ['not', 'RB'], ['too', 'RB'], ['great', 'JJ'], ['of', 'IN'], ['an', 'DT'], ['equation', 'NN']]]

I am using WordNet to make things simpler, but is there a way to get all the nouns in a sentence, as well as tag patterns such as [RB, RB, JJ] and [JJ, NN]?

Required output:
[['techniques' ,'nonlinear analysis', 'differential equations', 'basis','studies'],['differential equations'],['not too great','equation']]

If I understand correctly, you need something like this:

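import spacy
from spacy.matcher import Matcher

nlp = spacy.load('en_core_web_sm')
matcher = Matcher(nlp.vocab)

# Keep the text free of line breaks so the tokenizer does not create
# extra whitespace tokens in the middle of phrases.
text = ("techniques from nonlinear analysis and partial differential equations "
        "form the basis for these studies. "
        "Differential equations are cool. It is not too great of an equation")
doc = nlp(text)

# Pattern 1: any single noun (NN or NNS).
pattern1 = [{"TAG": {"IN": ["NN", "NNS"]}}]
# Pattern 2: adverb, adverb, adjective, e.g. "not too great".
pattern2 = [{"TAG": "RB"}, {"TAG": "RB"}, {"TAG": "JJ"}]
matcher.add("matcher", [pattern1, pattern2])

# Match sentence by sentence; start/end are token offsets within sent.
for sent in doc.sents:
    matches = matcher(sent)
    for match_id, start, end in matches:
        print(sent[start:end])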
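If the tags are already available as the nested [word, tag] lists shown in the question, the same idea can be sketched in plain Python without the Matcher, and the [JJ, NN] pattern from the question can be added. This is a minimal sketch under assumptions: the names `TAGGED`, `PATTERNS` and `extract` are hypothetical, each pattern position is a set of allowed tags, patterns are tried longest first, and the scan jumps past each match so phrases never overlap.

```python
# Tagged sentences copied from the question: [word, Penn Treebank tag].
TAGGED = [
    [['techniques', 'NNS'], ['from', 'IN'], ['nonlinear', 'JJ'], ['analysis', 'NN'],
     ['and', 'CC'], ['partial', 'JJ'], ['differential', 'JJ'], ['equations', 'NNS'],
     ['form', 'VBP'], ['the', 'DT'], ['basis', 'NN'], ['for', 'IN'], ['these', 'DT'],
     ['studies', 'NNS'], ['.', '.']],
    [['differential', 'JJ'], ['equations', 'NNS'], ['are', 'VBP'], ['cool', 'JJ'], ['.', '.']],
    [['it', 'PRP'], ['is', 'VBZ'], ['not', 'RB'], ['too', 'RB'], ['great', 'JJ'],
     ['of', 'IN'], ['an', 'DT'], ['equation', 'NN']],
]

NOUN = {"NN", "NNS"}

# Hypothetical pattern list: each pattern is a sequence of allowed-tag sets.
PATTERNS = [
    [{"RB"}, {"RB"}, {"JJ"}],  # e.g. "not too great"
    [{"JJ"}, NOUN],            # e.g. "nonlinear analysis"
    [NOUN],                    # any single noun
]


def extract(sentence, patterns=PATTERNS):
    """Greedy left-to-right scan: at each position take the longest pattern that fits."""
    ordered = sorted(patterns, key=len, reverse=True)
    results = []
    i = 0
    while i < len(sentence):
        for pat in ordered:
            window = sentence[i:i + len(pat)]
            if len(window) == len(pat) and all(
                tag in allowed for (_, tag), allowed in zip(window, pat)
            ):
                results.append(" ".join(word for word, _ in window))
                i += len(pat)  # skip past the match so phrases do not overlap
                break
        else:
            i += 1  # no pattern starts here; move to the next token
    return results


print([extract(s) for s in TAGGED])
# → [['techniques', 'nonlinear analysis', 'differential equations', 'basis', 'studies'], ['differential equations'], ['not too great', 'equation']]
```

For these three sentences the greedy scan reproduces the required output. Note that "partial differential equations" comes out as "differential equations" because "partial" is followed by another adjective rather than a noun; a pattern such as [JJ, JJ, NN/NNS] would be needed to keep the longer phrase.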
