CoreNLP Stanford Dependency Format

Bills on ports and immigration were submitted by Senator Brownback, Republican of Kansas

From the sentence above, I would like to get the following typed dependencies:

nsubjpass(submitted, Bills)
auxpass(submitted, were)
agent(submitted, Brownback)
nn(Brownback, Senator)
appos(Brownback, Republican)
prep_of(Republican, Kansas)
prep_on(Bills, ports)
conj_and(ports, immigration)
prep_on(Bills, immigration)

This should be possible according to Table 1, Figure 1 of the Stanford Dependencies documentation.

Using the code below, I have only been able to get the following dependencies (code output):

root(ROOT-0, submitted-7)
nmod:on(Bills-1, ports-3)
nmod:on(Bills-1, immigration-5)
case(ports-3, on-2)
cc(ports-3, and-4)
conj:and(ports-3, immigration-5)
nsubjpass(submitted-7, Bills-1)
auxpass(submitted-7, were-6)
nmod:agent(submitted-7, Brownback-10)
case(Brownback-10, by-8)
compound(Brownback-10, Senator-9)
punct(Brownback-10, ,-11)
appos(Brownback-10, Republican-12)
nmod:of(Republican-12, Kansas-14)
case(Kansas-14, of-13)

Question: how can I get the desired output shown above?

Code

public void processTestCoreNLP() {
    String text = "Bills on ports and immigration were submitted " +
            "by Senator Brownback, Republican of Kansas";
    Annotation annotation = new Annotation(text);
    Properties properties = PropertiesUtils.asProperties(
            "annotators", "tokenize,ssplit,pos,lemma,depparse"
    );
    AnnotationPipeline pipeline = new StanfordCoreNLP(properties);
    pipeline.annotate(annotation);
    for (CoreMap sentence : annotation.get(SentencesAnnotation.class)) {
        SemanticGraph sg = sentence.get(EnhancedPlusPlusDependenciesAnnotation.class);
        Collection<TypedDependency> dependencies = sg.typedDependencies();
        for (TypedDependency td : dependencies) {
            System.out.println(td);
        }
    }
}

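CoreNLP recently switched from the old Stanford Dependencies format (the format in the example at the top) to Universal Dependencies. My first suggestion is to use the new format if at all possible. Ongoing development on the parser will use Universal Dependencies, and the format is in many ways similar to the old one, modulo cosmetic changes (e.g., prep -> nmod).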

However, if you want to get the old dependency format, you can do so with the CollapsedCCProcessedDependenciesAnnotation annotation.
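As a rough illustration of how close the two label sets are for this particular sentence, the sketch below rewrites the printed UD lines from the question into the SD-style notation the question asks for. This is not CoreNLP functionality and not a general converter: the class and method names (UdToSdLabelSketch, toSdStyle, convertAll) are made up for this example, and the rules are an assumption that only covers the relations appearing in the question's output. For real SD output, use the annotation mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of CoreNLP: rewrites printed UD dependencies
// into the SD-style notation used in the question. The rules below are an
// assumption covering only the relations seen in the question's output.
final class UdToSdLabelSketch {

    // Matches a printed dependency such as "nmod:on(Bills-1, ports-3)".
    private static final Pattern DEP =
            Pattern.compile("^([^(]+)\\((.+)-(\\d+), (.+)-(\\d+)\\)$");

    // Returns the SD-style string, or null when the relation has no
    // counterpart in the question's target list (root, case, cc, punct).
    static String toSdStyle(String udLine) {
        Matcher m = DEP.matcher(udLine.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized dependency: " + udLine);
        }
        String reln = m.group(1);
        String gov = m.group(2);   // governor word, token index dropped
        String dep = m.group(4);   // dependent word, token index dropped

        String sd;
        if (reln.equals("root") || reln.equals("case")
                || reln.equals("cc") || reln.equals("punct")) {
            return null;
        } else if (reln.equals("nmod:agent")) {
            sd = "agent";
        } else if (reln.startsWith("nmod:")) {
            sd = "prep_" + reln.substring("nmod:".length());
        } else if (reln.startsWith("conj:")) {
            sd = "conj_" + reln.substring("conj:".length());
        } else if (reln.equals("compound")) {
            sd = "nn";
        } else {
            sd = reln;             // e.g. nsubjpass, auxpass, appos are unchanged
        }
        return sd + "(" + gov + ", " + dep + ")";
    }

    // Converts a list of printed UD lines, skipping the dropped relations.
    static List<String> convertAll(List<String> udLines) {
        List<String> result = new ArrayList<>();
        for (String line : udLines) {
            String converted = toSdStyle(line);
            if (converted != null) {
                result.add(converted);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> ud = Arrays.asList(
                "nmod:on(Bills-1, ports-3)",
                "case(ports-3, on-2)",
                "nmod:agent(submitted-7, Brownback-10)",
                "compound(Brownback-10, Senator-9)");
        for (String line : convertAll(ud)) {
            System.out.println(line);
        }
    }
}
```

Run on the four sample lines in main, this prints prep_on(Bills, ports), agent(submitted, Brownback) and nn(Brownback, Senator), and skips the case line. String rewriting like this is only meant to show the correspondence between the labels; it says nothing about sentences where the two representations differ structurally.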

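If you want to get the CC-processed and collapsed Stanford Dependencies (SD) for a sentence via the NN dependency parser, you have to set a property to work around a small bug in CoreNLP.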

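Note, however, that we are no longer maintaining the Stanford Dependencies code, and unless you have a good reason to use SD, we recommend Universal Dependencies for any new project. See the Universal Dependencies (UD) documentation and Schuster and Manning (2016) for more information on the UD representation.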

To get the CC-processed and collapsed SD representation, set the depparse.language property as follows:

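import java.util.Collection;
import java.util.Properties;
import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.AnnotationPipeline;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation;
import edu.stanford.nlp.trees.TypedDependency;
import edu.stanford.nlp.util.CoreMap;
import edu.stanford.nlp.util.PropertiesUtils;

public void processTestCoreNLP() {
    String text = "Bills on ports and immigration were submitted " +
            "by Senator Brownback, Republican of Kansas";
    Annotation annotation = new Annotation(text);
    Properties properties = PropertiesUtils.asProperties(
            "annotators", "tokenize,ssplit,pos,lemma,depparse");
    // Work around the bug mentioned above so that depparse yields SD output.
    properties.setProperty("depparse.language", "English");
    AnnotationPipeline pipeline = new StanfordCoreNLP(properties);
    pipeline.annotate(annotation);
    for (CoreMap sentence : annotation.get(SentencesAnnotation.class)) {
        // Read the collapsed, CC-processed graph instead of the enhanced++ UD graph.
        SemanticGraph sg = sentence.get(CollapsedCCProcessedDependenciesAnnotation.class);
        Collection<TypedDependency> dependencies = sg.typedDependencies();
        for (TypedDependency td : dependencies) {
            System.out.println(td);
        }
    }
}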
