Optimizing OpenGrok for a large code base



I have a server instance with 4 cores and 32 GB of RAM running Ubuntu 20.04.3 LTS. On this machine an OpenGrok instance runs as a Docker container.

Inside the Docker container it uses AdoptOpenJDK:

OpenJDK Runtime Environment AdoptOpenJDK-11.0.11+9 (build 11.0.11+9)
Eclipse OpenJ9 VM AdoptOpenJDK-11.0.11+9 (build openj9-0.26.0, JRE 11 Linux amd64-64-Bit Compressed References 20210421_975 (JIT enabled, AOT enabled)
OpenJ9   - b4cc246d9
OMR      - 162e6f729
JCL      - 7796c80419 based on jdk-11.0.11+9)

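The code base scanned by opengrok-indexer is 320 GB in size, and indexing takes 21 hours.

What I have found so far is that indexing takes less time when the history option is disabled. Is there any way to reduce the indexing time while the history flag is set?

Here is my indexing command: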

opengrok-indexer -J=-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -J=-Djava.util.logging.config.file=/usr/share/tomcat10/conf/logging.properties -J=-XX:-UseGCOverheadLimit -J=-Xmx30G -J=-Xms30G -J=-server -a /var/opengrok/dist/lib/opengrok.jar -- -R /var/opengrok/etc/read-only.xml -m 256 -c /usr/bin/ctags -s /var/opengrok/src/ -d /var/opengrok/data --remote on -H -P -S -G -W /var/opengrok/etc/configuration.xml --progress -v -O on -T 3 --assignTags --search --remote on -i *.so -i *.o -i *.a -i *.class -i *.jar -i *.apk -i *.tar -i *.bz2 -i *.gz -i *.obj -i *.zip

Thanks in advance for your help.

Kind regards

Siegfried

You should try increasing the number of threads with the following options:

--historyThreads number
The number of threads to use for history cache generation on repository level. By default the number of threads will be set to the number of available CPUs.
Assumes -H/--history.

--historyFileThreads number
The number of threads to use for history cache generation when dealing with individual files.
By default the number of threads will be set to the number of available CPUs.
Assumes -H/--history.

-T, --threads number
The number of threads to use for index generation, repository scan
and repository invalidation.
By default the number of threads will be set to the number of available
CPUs. This influences the number of spawned ctags processes as well.
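
As a rough sketch of how these three options could be combined, the snippet below builds the option string from a single thread count. The THREADS variable and the fallback value of 4 (the core count mentioned in the question) are assumptions for illustration only; suitable values need to be measured on the actual machine.

```shell
# Thread count to try. THREADS is a hypothetical override variable;
# without it, use the CPU count, falling back to 4 (the question's core count).
threads="${THREADS:-$(nproc 2>/dev/null || echo 4)}"

# Apply the same count to repository-level history, per-file history,
# and index generation. All three flags come from the help text above.
thread_opts="--historyThreads $threads --historyFileThreads $threads -T $threads"

# Print the options instead of running the indexer, so the result can be reviewed first.
echo "$thread_opts"
```

The printed options would go after the "--" in the existing opengrok-indexer call, replacing the current "-T 3".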

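Also take a look at the "renamed history" option. In theory "off" is the default, but it has a large impact on indexing time, so it is worth checking: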

--renamedHistory on|off
Enable or disable generating history for renamed files.
If set to on, makes history indexing slower for repositories
with lots of renamed files. Default is off.
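
One way to check this might be to search the configuration files passed via -R and -W in the question for anything mentioning "renamed", and independently to pass the flag explicitly. This is only a sketch: the helper name check_renamed is made up here, and the search term is a guess at how the setting might show up in the XML.

```shell
# Hypothetical helper: report lines mentioning "renamed" in each given file.
check_renamed() {
  for f in "$@"; do
    if [ -r "$f" ]; then
      if matches="$(grep -i -n 'renamed' "$f")"; then
        printf '%s:\n%s\n' "$f" "$matches"
      else
        echo "$f: no match for 'renamed'"
      fi
    else
      echo "$f: not readable, skipped"
    fi
  done
}

# Paths taken from the -R and -W arguments of the command in the question.
check_renamed /var/opengrok/etc/read-only.xml /var/opengrok/etc/configuration.xml

# To rule the setting out regardless of the files, state it explicitly on the command line.
renamed_opt="--renamedHistory off"
echo "$renamed_opt"
```

If a match turns up in read-only.xml, that file is worth reviewing, since it is the one the indexer reads its settings from via -R.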
