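I'm using Java to extract triples of the form NP-VP-NP from Spanish text. I'm using the Stanford Parser (CoreNLP v3.7.0) with the Spanish models v3.7.0. My question is: is there a way to extract the NP subtrees and VP subtrees from a sentence parsed with the Spanish model? I've noticed that the Spanish parse trees have a different form from the English ones.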
For example:
(ROOT (sentence (sn (spec (da0000 El)) (grup.nom (nc0s000 reino))) (grup.verb (vmm0000 canta)) (sadv (spec (rg muy)) (grup.adv (rg bien))) (fp .)))
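In this AnCora-style Spanish tree, the node that plays the NP role appears to be `sn` (with its nominal head group `grup.nom`), and the verb group is `grup.verb`, instead of English `NP`/`VP`. The following is a minimal, dependency-free sketch (not CoreNLP code) that parses a bracketed tree string like the one above (with balanced parentheses) and lists the words under every `sn` and `grup.verb` node. It assumes labels and words contain no spaces or parentheses; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the words under nodes with a given label in a bracketed tree.
public class SpanishPhraseLister {

    // Bare-bones tree node: a label plus children; a leaf (word) has no children.
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();
        Node(String label) { this.label = label; }
    }

    static Node parse(String s) {
        return parse(s, new int[] {0});
    }

    // Recursively parses "(LABEL child ...)" or a bare word; pos[0] is the read cursor.
    // Assumes the parentheses are balanced.
    static Node parse(String s, int[] pos) {
        skipSpaces(s, pos);
        if (s.charAt(pos[0]) != '(') {
            return new Node(readToken(s, pos)); // a leaf word
        }
        pos[0]++; // consume '('
        Node node = new Node(readToken(s, pos));
        skipSpaces(s, pos);
        while (s.charAt(pos[0]) != ')') {
            node.children.add(parse(s, pos));
            skipSpaces(s, pos);
        }
        pos[0]++; // consume ')'
        return node;
    }

    static void skipSpaces(String s, int[] pos) {
        while (pos[0] < s.length() && Character.isWhitespace(s.charAt(pos[0]))) pos[0]++;
    }

    // Reads a label or word: everything up to whitespace or a parenthesis.
    static String readToken(String s, int[] pos) {
        int start = pos[0];
        while (pos[0] < s.length() && !Character.isWhitespace(s.charAt(pos[0]))
                && s.charAt(pos[0]) != '(' && s.charAt(pos[0]) != ')') {
            pos[0]++;
        }
        return s.substring(start, pos[0]);
    }

    // Appends the leaf words under a node, left to right.
    static void collectWords(Node n, List<String> out) {
        if (n.children.isEmpty()) { out.add(n.label); return; }
        for (Node c : n.children) collectWords(c, out);
    }

    // Returns the space-joined words of every node labelled `label`, in preorder.
    static List<String> phrases(Node n, String label, List<String> out) {
        if (n.label.equals(label)) {
            List<String> words = new ArrayList<>();
            collectWords(n, words);
            out.add(String.join(" ", words));
        }
        for (Node c : n.children) phrases(c, label, out);
        return out;
    }

    public static void main(String[] args) {
        Node root = parse("(ROOT (sentence (sn (spec (da0000 El)) (grup.nom (nc0s000 reino))) "
                + "(grup.verb (vmm0000 canta)) (sadv (spec (rg muy)) (grup.adv (rg bien))) (fp .)))");
        System.out.println("sn -> " + phrases(root, "sn", new ArrayList<>()));               // sn -> [El reino]
        System.out.println("grup.verb -> " + phrases(root, "grup.verb", new ArrayList<>())); // grup.verb -> [canta]
    }
}
```

With CoreNLP available, the parser's own `Tree` objects and Tregex do this work directly; the sketch only makes the label mapping visible.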
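You should use the main distribution to make sure you have everything, and download the Spanish models (available here: http://stanfordnlp.github.io/CoreNLP/download.html). The following code parses Spanish text and uses Tregex to find the nominal groups (`grup.nom`) in the first sentence's parse tree: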
```java
package edu.stanford.nlp.examples;

import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.trees.*;
import edu.stanford.nlp.trees.tregex.*;
import edu.stanford.nlp.util.*;

import java.util.*;

public class TregexExample {

  public static void main(String[] args) {
    // set up the pipeline with the Spanish properties file from the Spanish models jar
    Properties props = StringUtils.argsToProperties("-props", "StanfordCoreNLP-spanish.properties");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    // Spanish example
    Annotation spanishDoc = new Annotation("...insert Spanish text...");
    pipeline.annotate(spanishDoc);

    // get the first sentence and its constituency parse tree
    CoreMap firstSentence = spanishDoc.get(CoreAnnotations.SentencesAnnotation.class).get(0);
    Tree firstSentenceTree = firstSentence.get(TreeCoreAnnotations.TreeAnnotation.class);

    // use Tregex to match every nominal group; the "." is escaped for the regex,
    // and the backslash itself must be doubled inside a Java string literal
    // (the same approach works for verb groups with "/grup\\.verb/")
    String nounPhrasePattern = "/grup\\.nom/";
    TregexPattern nounPhraseTregexPattern = TregexPattern.compile(nounPhrasePattern);
    TregexMatcher nounPhraseTregexMatcher = nounPhraseTregexPattern.matcher(firstSentenceTree);
    while (nounPhraseTregexMatcher.find()) {
      // print each matching subtree in bracketed form
      nounPhraseTregexMatcher.getMatch().pennPrint();
    }
  }
}
```
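The same Tregex approach can be pushed toward NP-VP-NP triples with named nodes. This is a sketch under assumptions, not a tested recipe for real parser output: it assumes the subject `sn`, the `grup.verb`, and the object `sn` appear as sisters, in that order, under the same clause node (as `sn` and `grup.verb` do in the example tree). `$++` means "is a left sister of", and `TregexMatcher.getNode` returns the node bound to a name. The verb group is written as a regex, as in the answer's code, since `.` is also a Tregex relation symbol. The class name, method names, and the hand-written example tree (tags included) are hypothetical.

```java
import edu.stanford.nlp.ling.Label;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.tregex.TregexMatcher;
import edu.stanford.nlp.trees.tregex.TregexPattern;

import java.util.ArrayList;
import java.util.List;

// Hypothetical example: NP-VP-NP triples from an AnCora-style Spanish tree via Tregex.
public class SpanishTripleExample {

    // Assumption: subject sn, verb group, and object sn are sisters in this order.
    private static final TregexPattern TRIPLE =
            TregexPattern.compile("sn=subj $++ (/^grup\\.verb$/=verb $++ sn=obj)");

    // Joins the leaf words of a subtree with spaces.
    static String words(Tree t) {
        List<String> out = new ArrayList<>();
        for (Label l : t.yield()) out.add(l.value());
        return String.join(" ", out);
    }

    // Returns one {subject, verb, object} array per Tregex match.
    static List<String[]> triples(Tree tree) {
        List<String[]> result = new ArrayList<>();
        TregexMatcher m = TRIPLE.matcher(tree);
        while (m.find()) {
            result.add(new String[] {
                words(m.getNode("subj")), words(m.getNode("verb")), words(m.getNode("obj"))});
        }
        return result;
    }

    public static void main(String[] args) {
        // Hand-written tree (not parser output) with a direct-object sn.
        Tree tree = Tree.valueOf("(ROOT (sentence (sn (spec (da0000 El)) (grup.nom (nc0s000 perro))) "
                + "(grup.verb (vmip000 come)) (sn (spec (di0000 una)) (grup.nom (nc0s000 manzana))) (fp .)))");
        for (String[] t : triples(tree)) {
            System.out.println(String.join(" | ", t));
        }
    }
}
```

In practice, pass the `Tree` taken from `TreeCoreAnnotations.TreeAnnotation` instead of `Tree.valueOf`. Real Spanish parses may attach objects differently (for example as `sp` prepositional phrases or inside other constituents), so it is worth inspecting a few trees with `pennPrint()` and adjusting the pattern.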