A smaller Stanford NLP models JAR file

I am currently using this JAR file for the Stanford NLP models: stanford-corenlp-3.5.2-models.jar

This file is quite large: about 340 MB.

I only use 4 annotators: tokenize, ssplit, parse and lemma. Is there a way to use a smaller models JAR file (or one JAR file per model)? I absolutely need this file to be as small as possible.

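If you include only the parser's model file and the POS tagger's model file on the classpath, you should be fine. "lemma" requires "pos", so you need to include "pos" in the annotator list.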

For example, "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz" and "edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger" should be all you need.
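
As a rough illustration of the annotator list and the two model paths, here is a minimal sketch that only builds the configuration. It assumes the `pos.model` and `parse.model` property names apply to your CoreNLP version; the class name `SlimPipelineConfig` is made up for this example, and the pipeline construction is left as a comment so the snippet runs without CoreNLP on the classpath.

```java
import java.util.Properties;

/** Builds pipeline properties that reference only the two model files named above. */
final class SlimPipelineConfig {

    // Resource paths quoted in the answer.
    static final String PARSER_MODEL =
            "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz";
    static final String POS_MODEL =
            "edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger";

    static Properties buildProperties() {
        Properties props = new Properties();
        // "pos" is listed because "lemma" depends on it.
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,parse");
        // Assumed property names; pointing at the models explicitly makes the dependency visible.
        props.setProperty("pos.model", POS_MODEL);
        props.setProperty("parse.model", PARSER_MODEL);
        return props;
    }

    public static void main(String[] args) {
        Properties props = buildProperties();
        props.forEach((key, value) -> System.out.println(key + " = " + value));
        // With CoreNLP on the classpath, the next step would be:
        //   StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    }
}
```

If the pipeline later asks for a resource that is not listed here, that resource is one more file to add to the small jar.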

You can recreate this directory structure and put those files on the classpath, or just build a jar that contains them.
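
One way to build such a jar without unpacking anything by hand is to copy the wanted entries from the full models jar into a new one. The sketch below uses only the JDK (Java 11 or later); `ModelJarSlimmer` and `copySelected` are names invented for this example, and the entry list is just the two paths mentioned above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Enumeration;
import java.util.Set;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

/** Copies a chosen set of entries from one jar into a new, smaller jar. */
final class ModelJarSlimmer {

    /** Copies only the entries named in keep; returns how many were copied. */
    static int copySelected(Path source, Path target, Set<String> keep) throws IOException {
        int copied = 0;
        try (JarFile in = new JarFile(source.toFile());
             OutputStream raw = Files.newOutputStream(target);
             JarOutputStream out = new JarOutputStream(raw)) {
            Enumeration<JarEntry> entries = in.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                if (!keep.contains(entry.getName())) {
                    continue; // not one of the requested model files
                }
                // A fresh JarEntry keeps the path, so the resource name is unchanged.
                out.putNextEntry(new JarEntry(entry.getName()));
                try (InputStream data = in.getInputStream(entry)) {
                    data.transferTo(out);
                }
                out.closeEntry();
                copied++;
            }
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: ModelJarSlimmer <full-models.jar> <slim-models.jar>");
            return;
        }
        Set<String> keep = Set.of(
                "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz",
                "edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger");
        int copied = copySelected(Path.of(args[0]), Path.of(args[1]), keep);
        System.out.println("Copied " + copied + " of " + keep.size() + " requested entries");
    }
}
```

It could be run as `java ModelJarSlimmer.java stanford-corenlp-3.5.2-models.jar slim-models.jar`. If fewer entries are copied than requested, the paths probably differ in your version of the models jar.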

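The bottom line is that if something is missing, your code will crash with a missing-resource error. So you just need to keep adding files until the code stops crashing. You definitely don't need many of the files in that jar.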
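
The add-until-it-stops-crashing loop can be shortened a little by checking up front whether the resources you expect are visible on the classpath. This is only a sketch: `ModelResourceCheck` is a hypothetical helper, it knows nothing about CoreNLP itself, and it only looks for the resource names you hand it.

```java
import java.util.ArrayList;
import java.util.List;

/** Reports which of the given resource paths cannot be found on the classpath. */
final class ModelResourceCheck {

    static List<String> findMissing(List<String> resourcePaths) {
        ClassLoader loader = ModelResourceCheck.class.getClassLoader();
        List<String> missing = new ArrayList<>();
        for (String path : resourcePaths) {
            // getResource returns null when no classpath entry provides the path.
            if (loader.getResource(path) == null) {
                missing.add(path);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> expected = List.of(
                "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz",
                "edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger");
        List<String> missing = findMissing(expected);
        if (missing.isEmpty()) {
            System.out.println("All expected model files are on the classpath.");
        } else {
            missing.forEach(path -> System.out.println("Missing: " + path));
        }
    }
}
```

Running it with the trimmed jar on the classpath, for example `java -cp slim-models.jar ModelResourceCheck.java`, shows which of the listed files still need to be added.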

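Following an approach similar to the one @StanfordNLPHelp described, I used the Maven Shade plugin and reduced the size of the final compiled jar. You need to change "Package.MainClass" and adjust the includes tags, or add excludes tags. As written, the second filter keeps only the POS tagger models, so add an include for any other model you use (for example edu/stanford/nlp/models/lexparser/** for the parser).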

    <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>3.1.0</version>
        <executions>
            <execution>
                <phase>package</phase>
                <goals>
                    <goal>shade</goal>
                </goals>
                <configuration>
                    <transformers>
                        <!-- adding Main-Class to manifest file -->
                        <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                            <mainClass>Package.MainClass</mainClass>
                        </transformer>
                    </transformers>
                    <minimizeJar>true</minimizeJar>
                    <filters>
                        <filter>
                            <artifact>edu.stanford.nlp:stanford-corenlp</artifact>
                            <includes>
                                <include>**</include>
                            </includes>
                        </filter>
                        <filter>
                            <artifact>edu.stanford.nlp:stanford-corenlp:models</artifact>
                            <includes>
                                <include>edu/stanford/nlp/models/pos-tagger/**</include>
                            </includes>
                        </filter>
                    </filters>
                </configuration>
            </execution>
        </executions>
    </plugin>

Based on StanfordNLPHelp's suggestion, I did the following (I use Gradle):

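Download CoreNLP from the Stanford CoreNLP download page
Open stanford-corenlp-X-models.jar
Go to /edu/stanford/nlp/models
Delete the folders you don't need. Unfortunately, this is a bit of guess and check
Re-zip the folder and turn it into a jar (I simply changed the extension, which is probably frowned upon)
Add a libs folder to the Gradle project: /app/libs
Move the downloaded stanford-corenlp-X.jar there, along with the newly made models jar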

In build.gradle, add:

  implementation files('libs/stanford-corenlp-4.4.0.jar')
  implementation files('libs/stanford-corenlp-4.4.0-models.jar')

Run a Gradle build. If you get an error, it means you deleted an important file. Restore it, re-zip, and try again.
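
To take some of the guesswork out of deciding which folders to delete, it can help to see how much space each folder under the models directory takes inside the jar. The following is a small sketch using only the JDK; `ModelFolderSizes` is a name made up for this example, and it reports compressed sizes as stored in the jar.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.Enumeration;
import java.util.Map;
import java.util.TreeMap;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Sums the compressed size of jar entries per first-level folder under a root path. */
final class ModelFolderSizes {

    static final String MODELS_ROOT = "edu/stanford/nlp/models/";

    static Map<String, Long> compressedSizeByFolder(Path jar, String root) throws IOException {
        Map<String, Long> totals = new TreeMap<>();
        try (JarFile in = new JarFile(jar.toFile())) {
            Enumeration<JarEntry> entries = in.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                String name = entry.getName();
                if (entry.isDirectory() || !name.startsWith(root)) {
                    continue; // skip directory entries and anything outside the root
                }
                String rest = name.substring(root.length());
                int slash = rest.indexOf('/');
                String folder = slash < 0 ? "(files directly under root)" : rest.substring(0, slash);
                // getCompressedSize() is -1 when unknown; count that as 0.
                long size = Math.max(entry.getCompressedSize(), 0L);
                totals.merge(folder, size, Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: ModelFolderSizes <models.jar>");
            return;
        }
        compressedSizeByFolder(Path.of(args[0]), MODELS_ROOT)
                .forEach((folder, bytes) ->
                        System.out.printf("%-30s %8.1f MB%n", folder, bytes / (1024.0 * 1024.0)));
    }
}
```

Folders that are large and belong to annotators you never list are the first candidates to remove; the build-and-restore loop above still decides whether a deletion was safe.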
