Ignite node stopped [ttl-cleanup-worker]



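I have Ignite 2.7 and a 5-node cluster. More than 40 million records are being generated and stored in an Ignite cache, and I set a 3-day expiry on it. Today one of the Ignite nodes stopped with the following error. Please help me identify and resolve the issue.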

[2019-09-11 07:45:59,570][ERROR][ttl-cleanup-worker-#170][root] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED]]], failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=java.lang.IllegalStateException: Unknown page type: 1 pageId: 000102210006d4ac]]
java.lang.IllegalStateException: Unknown page type: 1 pageId: 000102210006d4ac
    at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.io(BPlusTree.java:5058)
    at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.access$200(BPlusTree.java:90)
    at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$AbstractForwardCursor.nextPage(BPlusTree.java:5330)
    at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$ForwardCursor.next(BPlusTree.java:5566)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.purgeExpiredInternal(GridCacheOffheapManager.java:2232)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.purgeExpired(GridCacheOffheapManager.java:2157)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.expire(GridCacheOffheapManager.java:845)
    at org.apache.ignite.internal.processors.cache.GridCacheTtlManager.expire(GridCacheTtlManager.java:207)
    at org.apache.ignite.internal.processors.cache.GridCacheSharedTtlCleanupManager$CleanupWorker.body(GridCacheSharedTtlCleanupManager.java:139)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
    at java.lang.Thread.run(Thread.java:748)
[2019-09-11 07:45:59,575][WARN][ttl-cleanup-worker-#170][FailureProcessor] No deadlocked threads detected.
[2019-09-11 07:46:40,831][WARN][jvm-pause-detector-worker][IgniteKernal] Possible too long JVM pause: 41233 milliseconds.
[2019-09-11 07:46:40,831][ERROR][sys-stripe-0-#1][G] Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [threadName=grid-nio-worker-tcp-comm-23, blockedFor=41s]
[2019-09-11 07:46:40,832][WARN][sys-stripe-0-#1][G] Thread [name="grid-nio-worker-tcp-comm-23-#143", id=173, state=RUNNABLE, blockCnt=0, waitCnt=0]

The Ignite configuration is:

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.springframework.org/schema/beans
http://www.springframework.org/schema/beans/spring-beans.xsd">
<bean id="ignite.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
<!-- Enabling native persistence -->
<property name="dataStorageConfiguration">
<bean class="org.apache.ignite.configuration.DataStorageConfiguration">
<property name="metricsEnabled" value="true"/>
<property name="defaultDataRegionConfiguration">
<bean class="org.apache.ignite.configuration.DataRegionConfiguration">
<property name="persistenceEnabled" value="true"/>
</bean>
</property>
<property name="storagePath" value="/ignite_data/ignite/persistance"/>
<property name="walPath" value="/ignite_data/ignite/wal"/>
<property name="walArchivePath" value="/data/disk01/ignite/archive"/>
</bean>
</property>
<!-- Enable authentication for Ignite -->
<property name="authenticationEnabled" value="true"/>
<!-- Enabling expiry policy -->
<property name="cacheConfiguration">
<list>
<bean class="org.apache.ignite.configuration.CacheConfiguration">
<property name="name" value="CACHE_L4_TRIGGER_NOTIFICATION"/>
<property name="expiryPolicyFactory">
<bean class="javax.cache.expiry.CreatedExpiryPolicy" factory-method="factoryOf">
<constructor-arg>
<bean class="javax.cache.expiry.Duration">
<constructor-arg value="DAYS"/>
<constructor-arg value="3"/>
</bean>
</constructor-arg>
</bean>
</property>
</bean>
</list>
</property>
<!-- Log Ignite metrics every 10 minutes -->
<property name="gridLogger">
<bean class="org.apache.ignite.logger.log4j.Log4JLogger">
<constructor-arg type="java.lang.String" value="/home/trigger_be/apache-ignite-2.7.0/config/log4j.xml"/>
</bean>
</property>
<property name="metricsLogFrequency" value="#{60 * 10 * 1000}"/>
<!-- Define the cluster by listing node IPs -->
<property name="discoverySpi">
<bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
<property name="ipFinder">
<bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.multicast.TcpDiscoveryMulticastIpFinder">
<property name="addresses">
<list>
<value>172.16.5.36:49500..49509</value>
<value>172.16.5.37:49500..49509</value>
<value>172.16.5.38:49500..49509</value>
<value>172.16.5.39:49500..49509</value>
<value>172.16.5.40:49500..49509</value>
</list>
</property>
</bean>
</property>
</bean>
</property>
</bean>
</beans>

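This looks like a data corruption issue. I recommend completely removing the persistence data from this node and then re-adding the node to the cluster's baseline topology. The data will then be rebalanced onto it, provided you have enough backups.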
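
On the "enough backups" condition: the cache bean in the question does not set `backups`, and Ignite's default for that property is 0, so a partitioned cache configured this way would have no second copy to rebalance from. A minimal sketch of the same cache bean with one backup follows. The value 1 is an illustrative assumption, not something taken from the question, and each backup increases memory and disk usage.

```xml
<bean class="org.apache.ignite.configuration.CacheConfiguration">
    <property name="name" value="CACHE_L4_TRIGGER_NOTIFICATION"/>
    <!-- Assumption: keep one backup copy of every partition (default is 0). -->
    <property name="backups" value="1"/>
    <!-- The expiryPolicyFactory property stays as in the question. -->
</bean>
```

As far as I know, a cache that already exists in persistent storage keeps the configuration it was created with, so a change like this would likely only take effect for a newly created cache.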

This looks a bit like issue IGNITE-10767. Do you have MVCC enabled (transactional SQL, TRANSACTIONAL_SNAPSHOT caches)?
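
For context, a cache uses MVCC when its atomicity mode is `TRANSACTIONAL_SNAPSHOT`. In a Spring XML configuration that would appear roughly as in the fragment below. The cache name here is hypothetical, and the configuration in the question does not set `atomicityMode` at all, so this only illustrates what to look for. Tables created through SQL DDL can also request this mode, so the XML file alone may not tell the whole story.

```xml
<bean class="org.apache.ignite.configuration.CacheConfiguration">
    <!-- Hypothetical cache name, for illustration only. -->
    <property name="name" value="exampleMvccCache"/>
    <!-- This property is what marks a cache as MVCC-enabled. -->
    <property name="atomicityMode" value="TRANSACTIONAL_SNAPSHOT"/>
</bean>
```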
