Error When Loading NER Classifier - Unexpected end of ZLIB input stream

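>I trained a custom NER model with Stanford-NER. I created a properties file and passed the -serverProperties argument to the java command to start my server (following the directions from another question of mine) and load my custom NER model. When the server tries to load the custom model, it fails with this error: java.io.EOFException: Unexpected end of ZLIB input stream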

The stderr.log output with the error is as follows:

[main] INFO CoreNLP - --- StanfordCoreNLPServer#main() called --- 
[main] INFO CoreNLP - setting default constituency parser 
[main] INFO CoreNLP - warning: cannot find edu/stanford/nlp/models/srparser/englishSR.ser.gz 
[main] INFO CoreNLP - using: edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz instead 
[main] INFO CoreNLP - to use shift reduce parser download English models jar from: 
[main] INFO CoreNLP - http://stanfordnlp.github.io/CoreNLP/download.html 
[main] INFO CoreNLP -     Threads: 4 
[main] INFO CoreNLP - Liveness server started at /0.0.0.0:9000 
[main] INFO CoreNLP - Starting server... 
[main] INFO CoreNLP - StanfordCoreNLPServer listening at /0.0.0.0:80 
[pool-1-thread-3] INFO CoreNLP - [/127.0.0.1:35546] API call w/annotators tokenize,ssplit,pos,lemma,depparse,natlog,ner,openie 
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize 
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.TokenizerAnnotator - No tokenizer type provided. Defaulting to PTBTokenizer. 
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit 
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos 
[pool-1-thread-3] INFO edu.stanford.nlp.tagger.maxent.MaxentTagger - Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [0.7 sec]. 
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma 
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator depparse 
[pool-1-thread-3] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Loading depparse model file: edu/stanford/nlp/models/parser/nndep/english_UD.gz ...  [pool-1-thread-3] INFO edu.stanford.nlp.parser.nndep.Classifier - PreComputed 99996, Elapsed Time: 12.297 (s) 
[pool-1-thread-3] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Initializing dependency parser ... done [13.6 sec]. 
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator natlog 
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ner 
java.io.EOFException: Unexpected end of ZLIB input stream
at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:240)
at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)     
at java.util.zip.GZIPInputStream.read(GZIPInputStream.java:117)     
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)   
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)  
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at java.io.ObjectInputStream$PeekInputStream.read(ObjectInputStream.java:2620)
at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2636)     
at java.io.ObjectInputStream$BlockDataInputStream.readDoubles(ObjectInputStream.java:3333)  
at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1920) 
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1529)
at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1933) 
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1529) 
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422) 
at edu.stanford.nlp.ie.crf.CRFClassifier.loadClassifier(CRFClassifier.java:2650) 
at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1462) 
at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1494)
at edu.stanford.nlp.ie.crf.CRFClassifier.getClassifier(CRFClassifier.java:2963)     
at edu.stanford.nlp.ie.ClassifierCombiner.loadClassifierFromPath(ClassifierCombiner.java:282)   
at edu.stanford.nlp.ie.ClassifierCombiner.loadClassifiers(ClassifierCombiner.java:266)  
at edu.stanford.nlp.ie.ClassifierCombiner.<init>(ClassifierCombiner.java:141)   
at edu.stanford.nlp.ie.NERClassifierCombiner.<init>(NERClassifierCombiner.java:128)     
at edu.stanford.nlp.pipeline.AnnotatorImplementations.ner(AnnotatorImplementations.java:121)    
at edu.stanford.nlp.pipeline.AnnotatorFactories$6.create(AnnotatorFactories.java:273)   
at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:152)  
at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:451)    
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:154)   
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:145)   
at edu.stanford.nlp.pipeline.StanfordCoreNLPServer.mkStanfordCoreNLP(StanfordCoreNLPServer.java:273)    
at edu.stanford.nlp.pipeline.StanfordCoreNLPServer.access$500(StanfordCoreNLPServer.java:50)    
at edu.stanford.nlp.pipeline.StanfordCoreNLPServer$CoreNLPHandler.handle(StanfordCoreNLPServer.java:583)    
at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:79)     
at sun.net.httpserver.AuthFilter.doFilter(AuthFilter.java:83)   
at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:82)     
at sun.net.httpserver.ServerImpl$Exchange$LinkHandler.handle(ServerImpl.java:675)   
at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:79)     
at sun.net.httpserver.ServerImpl$Exchange.run(ServerImpl.java:647)  
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)  
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)  
at java.lang.Thread.run(Thread.java:748)

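I have Googled this error, and most of what I found concerns a Java issue from 2007-2010 in which an EOFException is thrown "arbitrarily". The description of that issue reads:

"When inflating a file compressed using gzip (via new Deflater(Deflater.BEST_COMPRESSION, true)), for some files an EOFException is thrown at the end of inflating. Although the file is correct, the bug is that the EOFException is thrown inconsistently. For some files it is thrown, for others it is not."

Answers to other people's questions about this error say that you have to close the gzip output stream...? I am not entirely sure what that means, and I don't know how I would act on that advice, since Stanford-NER is the software that creates the gzip file for me.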
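
To illustrate what "closing the gzip output stream" refers to, here is a minimal, self-contained Java sketch. It is not Stanford-NER's serialization code; the class name GzipCloseDemo and its two helper methods are made up for this example. It serializes an array through a GZIPOutputStream with and without calling close(), then tries to read each result back. The unclosed version never gets its final compressed block and gzip trailer, so reading it should fail with an EOFException much like the one in the stack trace above. A file that was cut short during a copy or download would be expected to fail in the same way.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipCloseDemo {

    /** Serializes the array through gzip. If closeStream is false, the gzip stream is never finished. */
    static byte[] serialize(double[] weights, boolean closeStream) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(new GZIPOutputStream(sink));
        out.writeObject(weights);
        out.flush(); // flush alone does not write the final deflate block or the trailer
        if (closeStream) {
            out.close(); // close() finishes the gzip stream
        }
        return sink.toByteArray();
    }

    /** Returns "ok" if the object can be read back, otherwise a description of the EOFException. */
    static String tryLoad(byte[] gzBytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                 new ObjectInputStream(new GZIPInputStream(new ByteArrayInputStream(gzBytes)))) {
            in.readObject();
            return "ok";
        } catch (EOFException e) {
            return "EOFException: " + e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        double[] weights = new double[2000]; // stand-in for model weights
        System.out.println("closed:   " + tryLoad(serialize(weights, true)));
        System.out.println("unclosed: " + tryLoad(serialize(weights, false)));
    }
}
```

If the trainer ran to completion, it presumably closed its own stream, so in practice an incomplete file more often points to an interrupted training run, a full disk, or a partial transfer than to anything you need to change in the code.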

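Question: What can I do to get rid of this error? I am hoping this has happened to others in the past. I am also looking for feedback from @StanfordNLPHelp on whether similar issues have come up before and whether anything is being done, or has been done, in the CoreNLP software to eliminate the problem. If there is a fix on the CoreNLP side, which files do I need to change, where are they located within the CoreNLP framework, and what changes do I need to make?

Added information (per @StanfordNLPHelp's comments):

My model was trained by following the Stanford-NER training instructions. To train the model I used a TSV, as outlined in the instructions, containing text from about 90 documents. I know that is not a large amount of training data, but we are only in the testing phase and will improve the model as we acquire more data.

With this TSV file and the Stanford-NER software, I ran the following command.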

java -cp stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier -prop austen.prop

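That built my model, and I was even able to load it in the NER GUI that ships with the Stanford-NER software and successfully tag a larger text corpus.

While troubleshooting why I could not get the model to work, I also tried updating my server.properties file with the file path to the standard "3 class model" that ships with CoreNLP. It failed again with the same error.

The fact that both my custom model and the 3-class model work in the Stanford-NER software but fail to load in the server makes me believe that my custom model is not the problem, and that there is some issue with how the CoreNLP software loads these models via the -serverProperties argument. Or it could be something I am completely unaware of.

The properties file I used to train my NER model is similar to the one in the directions, except that the training file and the output file name are changed. It looks like this: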

# location of the training file
trainFile = custom-model-trainingfile.tsv
# location where you would like to save (serialize) your
# classifier; adding .gz at the end automatically gzips the file,
# making it smaller, and faster to load
serializeTo = custome-ner-model.ser.gz
# structure of your training file; this tells the classifier that
# the word is in column 0 and the correct answer is in column 1
map = word=0,answer=1
# This specifies the order of the CRF: order 1 means that features
# apply at most to a class pair of previous class and current class
# or current class and next class.
maxLeft=1
# these are the features we'd like to train with
# some are discussed below, the rest can be
# understood by looking at NERFeatureFactory
useClassFeature=true
useWord=true
# word character ngrams will be included up to length 6 as prefixes
# and suffixes only 
useNGrams=true
noMidNGrams=true
maxNGramLeng=6
usePrev=true
useNext=true
useDisjunctive=true
useSequences=true
usePrevSequences=true
# the last 4 properties deal with word shape features
useTypeSeqs=true
useTypeSeqs2=true
useTypeySequences=true
wordShape=chris2useLC

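My server properties file contains only one line: ner.model = /path/to/custom_model.ser.gz

I also added /path/to/custom_model to the $CLASSPATH variable in the startup script, changing the line CLASSPATH="$CLASSPATH:$JAR" to CLASSPATH="$CLASSPATH:$JAR:/path/to/custom_model.ser.gz". I am not sure whether this step is necessary, because the ZLIB error is the first thing I hit. I am including it for completeness.

I tried to gunzip my custom model with the command gunzip custom_model.ser.gz and got an error similar to the one I get when loading the model: gzip: custom_model.ser.gz: unexpected end of file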
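
Since gunzip itself reports an unexpected end of file, the file on disk looks incomplete regardless of CoreNLP. The same check can be scripted in Java; the sketch below is only a rough stand-in for what gzip -t does. The class name GzipCheck, the method isIntact, and the default file name are illustrative assumptions. It reads the whole file through a GZIPInputStream and reports whether decompression reached the end without an error.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.GZIPInputStream;

public class GzipCheck {

    /** Returns true if the whole file decompresses without an I/O error. */
    static boolean isIntact(Path file) {
        byte[] buf = new byte[8192];
        try (InputStream in = new GZIPInputStream(Files.newInputStream(file))) {
            while (in.read(buf) != -1) {
                // discard the data; we only care whether the stream ends cleanly
            }
            return true;
        } catch (IOException e) {
            // EOFException (truncated) and ZipException (corrupt) are both IOExceptions
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical default path; pass the real model path as the first argument.
        Path model = Paths.get(args.length > 0 ? args[0] : "custom_model.ser.gz");
        System.out.println(model + (isIntact(model) ? " decompresses cleanly" : " is truncated or corrupt"));
    }
}
```

Running this against both the copy that works in the Stanford-NER GUI and the copy the server reads would show whether the two are really the same file. Comparing their sizes is an even quicker first check.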

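I assume you downloaded Stanford CoreNLP 3.7.0 and have a folder somewhere called stanford-corenlp-full-2016-10-31. For the sake of this example, let's assume it is at /Users/stanfordnlphelp/stanford-corenlp-full-2016-10-31 (change this to match your situation).

Also, to clarify: when you run a Java program, it looks in the CLASSPATH for compiled code and resources. A common way to set the CLASSPATH is to set the CLASSPATH environment variable with the export command.

Typically, Java compiled code and resources are stored in jar files.

If you look in stanford-corenlp-full-2016-10-31 you will see a bunch of .jar files. One of them is called stanford-corenlp-3.7.0-models.jar. You can list what is inside a jar file with this command: jar tf stanford-corenlp-3.7.0-models.jar

When you look inside that file, you will notice that it contains (among other things) various NER models. For example, you should see this file: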

edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz

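in the models jar.

So a reasonable way to get things working is to run the server and tell it to load only 1 model (by default it loads 3).

Run these commands in one window (from the same directory as the file ner-server.properties):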

export CLASSPATH=/Users/stanfordnlphelp/stanford-corenlp-full-2016-10-31/*:
java -Xmx12g edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000 -serverProperties ner-server.properties

ner-server.properties is a 2-line file containing these 2 lines:

annotators = tokenize,ssplit,pos,lemma,ner
ner.model = edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz

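The export command above puts every jar in that directory on the CLASSPATH; that is what the * means. So stanford-corenlp-3.7.0-models.jar should be on the CLASSPATH, and when the Java code runs it will be able to find edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz.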
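
To make the lookup concrete: the ner.model value here names a resource inside a jar, while a plain file path, as in the question, is the other common form. The sketch below is only a rough approximation of that kind of lookup and is not CoreNLP's actual loading code; ModelLocator and locate are hypothetical names. It reports whether a given path resolves as a classpath resource, as a regular file on disk, or as neither.

```java
import java.nio.file.Files;
import java.nio.file.Paths;

public class ModelLocator {

    /** Returns "classpath", "filesystem", or "missing" for the given model path. */
    static String locate(String path) {
        // Resources inside jars on the CLASSPATH are found by their jar-internal name.
        if (ClassLoader.getSystemResource(path) != null) {
            return "classpath";
        }
        // Otherwise, treat the value as an ordinary path on disk.
        if (Files.isRegularFile(Paths.get(path))) {
            return "filesystem";
        }
        return "missing";
    }

    public static void main(String[] args) {
        // Run with the CoreNLP jars on the CLASSPATH to check the bundled model name.
        String path = args.length > 0
                ? args[0]
                : "edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz";
        System.out.println(path + " -> " + locate(path));
    }
}
```

Classpath entries are normally directories or jar files, so appending the .ser.gz file itself to CLASSPATH is unlikely to help; a model outside a jar is more naturally referenced by its file path.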

In a different terminal window, issue this command:

wget --post-data 'Joe Smith lives in Hawaii.' 'localhost:9000/?properties={"outputFormat":"json"}' -O -

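When it runs, you should see in the first window (where the server is running) that only this model is loaded: edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz.

Note that if you remove ner.model from the file and repeat all of this, 3 models will be loaded instead of 1.

Please let me know whether all of that works.

Now suppose I made an NER model called custom_model.ser.gz, the file that Stanford CoreNLP outputs after the training process, and put it in the folder /Users/stanfordnlphelp/.

If the previous steps work, you should be able to change ner-server.properties to the following: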

annotators = tokenize,ssplit,pos,lemma,ner
ner.model = /Users/stanfordnlphelp/custom_model.ser.gz

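When you repeat the same steps, the log will show your custom model loading. There should not be any kind of gzip problem. If you still get a gzip problem, please let me know what kind of system you are running this on: Mac OS X, Unix, Windows, etc.?

To confirm: you said you have already run the custom NER model with the standalone Stanford NER software, correct? If so, it sounds like the model file is fine.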
