When I use StanfordCoreNLP on Spark to generate parses over big data, one of the tasks stalls for a very long time. I looked into the error and it shows the following:
at edu.stanford.nlp.ling.CoreLabel.<init>(CoreLabel.java:68)
at edu.stanford.nlp.ling.CoreLabel$CoreLabelFactory.newLabel(CoreLabel.java:248)
at edu.stanford.nlp.trees.LabeledScoredTreeFactory.newLeaf(LabeledScoredTreeFactory.java:51)
at edu.stanford.nlp.parser.lexparser.Debinarizer.transformTreeHelper(Debinarizer.java:27)
at edu.stanford.nlp.parser.lexparser.Debinarizer.transformTreeHelper(Debinarizer.java:34)
at edu.stanford.nlp.parser.lexparser.Debinarizer.transformTreeHelper(Debinarizer.java:34)
at edu.stanford.nlp.parser.lexparser.Debinarizer.transformTreeHelper(Debinarizer.java:34)
at edu.stanford.nlp.parser.lexparser.Debinarizer.transformTreeHelper(Debinarizer.java:34)
I think the relevant code is as follows:
import edu.stanford.nlp.pipeline.Annotation
import edu.stanford.nlp.pipeline.StanfordCoreNLP
import java.util.Properties
import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation
import edu.stanford.nlp.trees.TreeCoreAnnotations.TreeAnnotation
import edu.stanford.nlp.util.CoreMap
import scala.collection.JavaConversions._
object CoreNLP {
  def transform(content: String): String = {
    val v = new CoreNLP
    // Note: the English result is discarded; only the Chinese parse is returned.
    v.runEnglishAnnotators(content)
    v.runChineseAnnotators(content)
  }
}
class CoreNLP {
  def runEnglishAnnotators(inputContent: String): String = {
    val document = new Annotation(inputContent)
    val props = new Properties
    props.setProperty("annotators", "tokenize, ssplit, parse")
    val coreNLP = new StanfordCoreNLP(props)
    coreNLP.annotate(document)
    parserOutput(document)
  }
  def runChineseAnnotators(inputContent: String): String = {
    val document = new Annotation(inputContent)
    // The pipeline is configured from the bundled Chinese properties file.
    val corenlp = new StanfordCoreNLP("StanfordCoreNLP-chinese.properties")
    corenlp.annotate(document)
    parserOutput(document)
  }
  def parserOutput(document: Annotation): String = {
    val sentences = document.get(classOf[SentencesAnnotation])
    var result = ""
    for (sentence: CoreMap <- sentences) {
      val tree = sentence.get(classOf[TreeAnnotation])
      // Append each sentence's parse tree, one tree per line.
      result = result + "\n" + tree.toString
    }
    result
  }
}
My classmate says the data used for testing is recursive, so the NLP pipeline runs endlessly. I don't know whether that is true.
If you add props.setProperty("parse.maxlen", "100"); to the code, the parser is configured not to parse sentences longer than 100 tokens. This helps guard against the crash/stall problem. You should experiment to find the best maximum sentence length for your application.
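For example, applied to the runEnglishAnnotators method above, it is a one-line addition (a minimal sketch; 100 is just a starting value you should tune for your data):

def runEnglishAnnotators(inputContent: String): String = {
  val document = new Annotation(inputContent)
  val props = new Properties
  props.setProperty("annotators", "tokenize, ssplit, parse")
  // Skip parsing sentences longer than 100 tokens, so one pathological
  // "sentence" cannot stall the whole Spark task.
  props.setProperty("parse.maxlen", "100")
  val coreNLP = new StanfordCoreNLP(props)
  coreNLP.annotate(document)
  parserOutput(document)
}

For the Chinese pipeline, which is built from StanfordCoreNLP-chinese.properties, one way to apply the same limit is to load that file into a Properties object and override parse.maxlen before constructing the pipeline (a sketch, assuming the properties file is on your classpath):

val props = new Properties
props.load(getClass.getResourceAsStream("/StanfordCoreNLP-chinese.properties"))
props.setProperty("parse.maxlen", "100")
val corenlp = new StanfordCoreNLP(props)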