在 Cassandra 2.1.7 中检测到错误泄漏



我一直在使用 4 个节点 Cassandra 集群,复制因子为 2,每个节点的 Cassandra 数据大小约为 2.7TB。

3天前,Cassandra

节点之一崩溃了,我正在尝试启动Cassandra服务并查看系统.log,我在多个CF

-
ERROR [Reference-Reaper:1] 2017-05-10 13:03:00,779 Ref.java:179 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@565b5b35) to class org.apache.cassandra.io.sstable.SSTableReader$InstanceTidier@408212172:/raid0/cassandra/data/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/system-compaction_history-ka-15171 was not released before the reference was garbage collected
LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@3e2430d) to class org.apache.cassandra.utils.concurrent.WrappedSharedCloseable$1@554817289:[Memory@[0..4), Memory@[0..18)] was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2017-05-10 13:03:00,787 Ref.java:179 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@2ff9f824) to class org.apache.cassandra.io.util.MmappedSegmentedFile$Cleanup@1142037527:/raid0/cassandra/data/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/system-compaction_history-ka-15172-Index.db was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2017-05-10 13:03:00,788 Ref.java:179 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@35c52c94) to class org.apache.cassandra.io.sstable.SSTableReader$InstanceTidier@603046944:/raid0/cassandra/data/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/system-compaction_history-ka-15172 was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2017-05-10 13:03:00,788 Ref.java:179 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@14834365) to class org.apache.cassandra.io.util.MmappedSegmentedFile$Cleanup@901621352:/raid0/cassandra/data/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/system-compaction_history-ka-15171-Index.db was not released before the reference was garbage collected

阅读了多个关于"检测到泄漏"的链接或博客,有些人说这是一个很长的 GC 问题,然后我把它放在 cassandra-env.sh 文件中的行下面

JVM_OPTS="$JVM_OPTS -XX:+PrintSafepointStatistics"
JVM_OPTS="$JVM_OPTS -XX:+PrintClassHistogramBeforeFullGC"
JVM_OPTS="$JVM_OPTS -XX:+PrintClassHistogramAfterFullGC"

在那次检查系统之后.log我在日志中的行下方找到了我 -

INFO  [CompactionExecutor:4] 2017-05-12 19:16:16,892 AutoSavingCache.java:302 - Saved KeyCache (915816 items) in 29601 ms
INFO  [CompactionExecutor:7] 2017-05-12 23:16:16,563 AutoSavingCache.java:302 - Saved KeyCache (915816 items) in 29604 ms
INFO  [CompactionExecutor:10] 2017-05-13 03:16:16,838 AutoSavingCache.java:302 - Saved KeyCache (915816 items) in 29875 ms
INFO  [CompactionExecutor:13] 2017-05-13 07:16:16,849 AutoSavingCache.java:302 - Saved KeyCache (915816 items) in 29891 ms
INFO  [CompactionExecutor:16] 2017-05-13 11:16:16,737 AutoSavingCache.java:302 - Saved KeyCache (915816 items) in 29779 ms
INFO  [CompactionExecutor:19] 2017-05-13 15:16:16,848 AutoSavingCache.java:302 - Saved KeyCache (915816 items) in 29889 ms
INFO  [CompactionExecutor:22] 2017-05-13 19:16:17,009 AutoSavingCache.java:302 - Saved KeyCache (915816 items) in 29729 ms
INFO  [CompactionExecutor:25] 2017-05-13 23:16:16,476 AutoSavingCache.java:302 - Saved KeyCache (915816 items) in 29514 ms
INFO  [CompactionExecutor:28] 2017-05-14 03:16:16,648 AutoSavingCache.java:302 - Saved KeyCache (915816 items) in 29685 ms
INFO  [CompactionExecutor:31] 2017-05-14 07:16:16,724 AutoSavingCache.java:302 - Saved KeyCache (915816 items) in 29760 ms
INFO  [CompactionExecutor:34] 2017-05-14 11:16:16,709 AutoSavingCache.java:302 - Saved KeyCache (915816 items) in 29715 ms
INFO  [CompactionExecutor:37] 2017-05-14 15:16:16,515 AutoSavingCache.java:302 - Saved KeyCache (915816 items) in 29545 ms
INFO  [CompactionExecutor:40] 2017-05-14 19:16:16,745 AutoSavingCache.java:302 - Saved KeyCache (915816 items) in 29776 ms
INFO  [CompactionExecutor:43] 2017-05-14 23:16:16,504 AutoSavingCache.java:302 - Saved KeyCache (915816 items) in 29532 ms
INFO  [CompactionExecutor:46] 2017-05-15 03:16:16,470 AutoSavingCache.java:302 - Saved KeyCache (915816 items) in 29496 ms
INFO  [CompactionExecutor:49] 2017-05-15 07:16:16,519 AutoSavingCache.java:302 - Saved KeyCache (915816 items) in 29545 ms
INFO  [CompactionExecutor:52] 2017-05-15 11:16:16,385 AutoSavingCache.java:302 - Saved KeyCache (915816 items) in 29411 ms

3天后,Cassandra服务没有启动。请帮助我解决此问题。

系统信息 -

Cassandra Version = 2.1.7
OS  =  Ubuntu 12.04
CPU Core = 4
RAM = 28GB

> 2.1.7 已经很老了,这可能是 2.1.9 ( https://issues.apache.org/jira/browse/CASSANDRA-9998 中修复的已知问题。虽然它本身是无害的,但 2.1.7 中有许多您可能不想遇到的错误 - 您应该考虑升级到最新的 2.1 版本 (2.1.17(。

关于CompactionExecutor/AutoSavingCache的消息并不表示存在问题 - 表示您的缓存(其中包含915k项(正在定期保存到磁盘,这通常表明您的服务器运行正常。

简而言之 - 这些都不表示存在会导致您的 Cassandra 服务器停止为请求提供服务的问题。如果您的服务器没有行为,则可能还有其他事情发生。

最新更新