I am trying to annotate multiple sentences with a CoreNLP server. However, if I try this with too many sentences, I get:
Exception in thread "Thread-48" edu.stanford.nlp.io.RuntimeIOException: Could not connect to server: 192.168.108.60:9000
at edu.stanford.nlp.pipeline.StanfordCoreNLPClient$2.run(StanfordCoreNLPClient.java:393)
Caused by: java.io.IOException: Server returned HTTP response code: 500 for URL: http://192.168.108.60:9000?properties=%7B+%22inputFormat%22%3A+%22serialized%22%2C+%22outputSerializer%22%3A+%22edu.stanford.nlp.pipeline.ProtobufAnnotationSerializer%22%2C+%22inputSerializer%22%3A+%22edu.stanford.nlp.pipeline.ProtobufAnnotationSerializer%22%2C+%22annotators%22%3A+%22tokenize%2C+ssplit%2C+pos%2C+lemma%2C+ner%2C+parse%2C+dcoref%22%2C+%22outputFormat%22%3A+%22serialized%22+%7D
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1840)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1441)
at edu.stanford.nlp.pipeline.StanfordCoreNLPClient$2.run(StanfordCoreNLPClient.java:381)
If I run only 10 or 20 sentences, everything works fine, but as their number grows the server seems to break down and I hit a timeout limit or something similar - at least that is my only explanation for it.
StanfordCoreNLPClient coreNlp = new StanfordCoreNLPClient(props, "192.168.108.60", 9000);
// ...
for (int windowSize : windowSizeList) {
    Map<String, List<TaggedSentence>> aspectMap = new HashMap<>();
    for (int i = 0; i < sentenceList.size(); i++) {
        Annotation document = sentenceList.get(i);
        try {
            coreNlp.annotate(document);
        } catch (Exception e) {
            LOGGER.error("Error", e);
        }
        // ...
    }
}
How can I solve this problem?
EDIT: OK, I found that there is a timeout option:
props.setProperty("timeout", "50000");
but that does not help. It fails anyway - it just takes longer.
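One workaround worth trying (my own assumption, not something from the CoreNLP docs) is to send the sentences to the server in smaller batches, so that each batch can be annotated and, if necessary, retried independently instead of one huge request failing as a whole. A minimal, library-free sketch of the batching helper (`Batcher` and `partition` are hypothetical names):

```java
import java.util.ArrayList;
import java.util.List;

public class Batcher {
    // Split a list into consecutive chunks of at most batchSize elements,
    // so each chunk can be sent to the server (and retried) on its own.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> sentences = new ArrayList<>();
        for (int i = 0; i < 25; i++) sentences.add(i);
        List<List<Integer>> batches = partition(sentences, 10);
        System.out.println(batches.size());        // how many batches
        System.out.println(batches.get(2).size()); // size of the final, partial batch
    }
}
```

In the question's loop you would then call `coreNlp.annotate(...)` per batch, catching the exception at the batch level rather than losing the whole run.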
I ran into a similar problem. In my case, I wanted to use coreference resolution, and I solved it by using the following annotators: tokenize, ssplit, pos, lemma, ner, depparse, mention, coref
- or a command line like this:
java -Xmx5g -cp stanford-corenlp-models-3.6.0.jar:* edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner,depparse,mention,coref -file example_file.txt
The reason is that this pipeline is more efficient (with respect to speed), according to this page: http://stanfordnlp.github.io/CoreNLP/coref.html#overview
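The same annotator list can be set programmatically. A minimal sketch of the `Properties` object to pass to the client constructor from the question (the annotator list comes from the answer above; the timeout value is just the one the questioner already tried):

```java
import java.util.Properties;

public class CorefProps {
    // Build the annotator configuration recommended for the faster,
    // dependency-based coreference pipeline.
    static Properties build() {
        Properties props = new Properties();
        props.setProperty("annotators",
                "tokenize,ssplit,pos,lemma,ner,depparse,mention,coref");
        props.setProperty("timeout", "50000"); // generous client-side timeout
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build().getProperty("annotators"));
    }
}
```

Note that this replaces `parse` and `dcoref` from the failing request in the stack trace with `depparse` and `coref`, which is where the speedup comes from.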