Apache Ignite continuous query cache transactions



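We use continuous queries to transfer data between all client nodes. However, our grid scales up and down, so we often run into a problem where data nodes keep trying to connect to clients that have already been scaled down, in order to send them continuous query data. This brings the system to a halt: the PME operation cannot acquire a lock, so the topology cannot be updated.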

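To work around this, I would like to use the TxTimeoutOnPartitionMapExchange parameter, which would allow PME to proceed. However, to make use of this parameter, do I need to change the atomicityMode of my caches to TRANSACTIONAL? If so, does the process of data nodes trying to send data from a continuous query count as a transaction?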

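In short, I am trying to work out whether the TxTimeoutOnPartitionMapExchange parameter would help in my continuous query situation, and what steps are needed to enable it.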

Edit: stack traces for the problem I am trying to solve:

  • A thread keeps trying to reserve the client; I believe it holds a global lock that blocks cache updates and checkpoints:

Deadlock: false
Completed: 1999706
Thread [name="sys-stripe-6-#7%pv-ib-valuation%", id=42, state=WAITING, blockCnt=52537, waitCnt=734400]
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
at o.a.i.i.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:178)
at o.a.i.i.util.future.GridFutureAdapter.get(GridFutureAdapter.java:141)
at o.a.i.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:3229)
at o.a.i.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:3013)
at o.a.i.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2960)
at o.a.i.i.managers.communication.GridIoManager.send(GridIoManager.java:2100)
at o.a.i.i.managers.communication.GridIoManager.sendOrderedMessage(GridIoManager.java:2365)
at o.a.i.i.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1964)
at o.a.i.i.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1935)
at o.a.i.i.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1917)
at o.a.i.i.processors.continuous.GridContinuousProcessor.sendNotification(GridContinuousProcessor.java:1324)
at o.a.i.i.processors.continuous.GridContinuousProcessor.addNotification(GridContinuousProcessor.java:1261)
at o.a.i.i.processors.cache.query.continuous.CacheContinuousQueryHandler.onEntryUpdate(CacheContinuousQueryHandler.java:1059)
at o.a.i.i.processors.cache.query.continuous.CacheContinuousQueryHandler.access$600(CacheContinuousQueryHandler.java:90)
at o.a.i.i.processors.cache.query.continuous.CacheContinuousQueryHandler$2.onEntryUpdated(CacheContinuousQueryHandler.java:459)
at o.a.i.i.processors.cache.query.continuous.CacheContinuousQueryManager.onEntryUpdated(CacheContinuousQueryManager.java:447)
at o.a.i.i.processors.cache.GridCacheMapEntry.innerUpdate(GridCacheMapEntry.java:2495)
at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateSingle(GridDhtAtomicCache.java:2657)
at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.update(GridDhtAtomicCache.java:2118)

  • Once the reserveClient call is hit, other threads start failing to acquire the lock:

>>> Possible starvation in striped pool.
Thread name: sys-stripe-4-#5%pv-ib-valuation%
Queue: []
Deadlock: false
Completed: 6328076
Thread [name="sys-stripe-4-#5%pv-ib-valuation%", id=40, state=WAITING, blockCnt=111790, waitCnt=2018248]
Lock [object=java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@66d8e343, ownerName=null, ownerId=-1]
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
at o.a.i.i.processors.cache.persistence.GridCacheDatabaseSharedManager.checkpointReadLock(GridCacheDatabaseSharedManager.java:1663)
at o.a.i.i.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.purgeExpiredInternal(GridCacheOffheapManager.java:2715)
at o.a.i.i.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.purgeExpired(GridCacheOffheapManager.java:2679)
at o.a.i.i.processors.cache.persistence.GridCacheOffheapManager.expire(GridCacheOffheapManager.java:1051)
at o.a.i.i.processors.cache.GridCacheTtlManager.expire(GridCacheTtlManager.java:243)
at o.a.i.i.processors.cache.GridCacheUtils.unwindEvicts(GridCacheUtils.java:873)
at o.a.i.i.processors.cache.GridCacheIoManager.onMessageProcessed(GridCacheIoManager.java:1189)

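So overall, my analysis so far is that when a client goes away, the continuous query keeps trying to connect to it while holding a lock, which blocks everything.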

  • Sample page locks dump. The dump looks similar every time, and all threads appear to be waiting rather than holding locks:

Page locks dump:
Thread=[name=checkpoint-runner-#94%pv-ib-valuation%, id=162], state=WAITING
Locked pages = []
Locked pages log: name=checkpoint-runner-#94%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)

Thread=[name=checkpoint-runner-#95%pv-ib-valuation%, id=163], state=WAITING
Locked pages = []
Locked pages log: name=checkpoint-runner-#95%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)

Thread=[name=checkpoint-runner-#96%pv-ib-valuation%, id=164], state=WAITING
Locked pages = []
Locked pages log: name=checkpoint-runner-#96%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)

Thread=[name=checkpoint-runner-#97%pv-ib-valuation%, id=165], state=WAITING
Locked pages = []
Locked pages log: name=checkpoint-runner-#97%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)

Thread=[name=data-streamer-stripe-0-#15%pv-ib-valuation%, id=50], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-0-#15%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)

Thread=[name=data-streamer-stripe-1-#16%pv-ib-valuation%, id=51], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-1-#16%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)

Thread=[name=data-streamer-stripe-10-#25%pv-ib-valuation%, id=60], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-10-#25%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)

Thread=[name=data-streamer-stripe-11-#26%pv-ib-valuation%, id=61], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-11-#26%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)

Thread=[name=data-streamer-stripe-12-#27%pv-ib-valuation%, id=62], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-12-#27%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)

Thread=[name=data-streamer-stripe-13-#28%pv-ib-valuation%, id=63], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-13-#28%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)

Thread=[name=data-streamer-stripe-14-#29%pv-ib-valuation%, id=64], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-14-#29%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)

Thread=[name=data-streamer-stripe-15-#30%pv-ib-valuation%, id=65], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-15-#30%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)

Thread=[name=data-streamer-stripe-2-#17%pv-ib-valuation%, id=52], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-2-#17%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)

Thread=[name=data-streamer-stripe-3-#18%pv-ib-valuation%, id=53], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-3-#18%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)

Thread=[name=data-streamer-stripe-4-#19%pv-ib-valuation%, id=54], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-4-#19%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)

Thread=[name=data-streamer-stripe-5-#20%pv-ib-valuation%, id=55], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-5-#20%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)

Thread=[name=data-streamer-stripe-6-#21%pv-ib-valuation%, id=56], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-6-#21%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)

Thread=[name=data-streamer-stripe-7-#22%pv-ib-valuation%, id=57], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-7-#22%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)

Thread=[name=data-streamer-stripe-8-#23%pv-ib-valuation%, id=58], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-8-#23%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)

Thread=[name=data-streamer-stripe-9-#24%pv-ib-valuation%, id=59], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-9-#24%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)

Thread=[name=db-checkpoint-thread-#93%pv-ib-valuation%, id=161], state=TIMED_WAITING
Locked pages = []
Locked pages log: name=db-checkpoint-thread-#93%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)

Thread=[name=dms-writer-thread-#77%pv-ib-valuation%, id=145], state=WAITING
Locked pages = []
Locked pages log: name=dms-writer-thread-#77%pv-ib-valuation% time=(1674196038673, 2023-01-20 06:27:18.673)

Thread=[name=exchange-worker-#71%pv-ib-valuation%, id=139], state=TIMED_WAITING
Locked pages = []
Locked pages log: name=exchange-worker-#71%pv-ib-valuation% time=(1674196038673, 2023-01-20 06:27:18.673)

Thread=[name=lock-cleanup-0, id=278], state=WAITING
Locked pages = []
Locked pages log: name=lock-cleanup-0 time=(1674196038673, 2023-01-20 06:27:18.673)

Thread=[name=lock-cleanup-scheduled-0, id=171], state=WAITING
Locked pages = []
Locked pages log: name=lock-cleanup-scheduled-0 time=(1674196038674, 2023-01-20 06:27:18.674)

Thread=[name=main, id=1], state=WAITING
Locked pages = []
Locked pages log: name=main time=(1674196038673, 2023-01-20 06:27:18.673)

Thread=[name=query-#5729%pv-ib-valuation%, id=6455], state=WAITING
Locked pages = []
Locked pages log: name=query-#5729%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)

Thread=[name=query-#5730%pv-ib-valuation%, id=6456], state=WAITING
Locked pages = []
Locked pages log: name=query-#5730%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)

Thread=[name=query-#5735%pv-ib-valuation%, id=6461], state=WAITING
Locked pages = []
Locked pages log: name=query-#5735%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)

Thread=[name=query-#5736%pv-ib-valuation%, id=6462], state=WAITING
Locked pages = []
Locked pages log: name=query-#5736%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)

Thread=[name=sys-stripe-0-#1%pv-ib-valuation%, id=36], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-0-#1%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)

Thread=[name=sys-stripe-1-#2%pv-ib-valuation%, id=37], state=RUNNABLE
Locked pages = []
Locked pages log: name=sys-stripe-1-#2%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)

Thread=[name=sys-stripe-10-#11%pv-ib-valuation%, id=46], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-10-#11%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)

Thread=[name=sys-stripe-11-#12%pv-ib-valuation%, id=47], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-11-#12%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)

Thread=[name=sys-stripe-12-#13%pv-ib-valuation%, id=48], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-12-#13%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)

Thread=[name=sys-stripe-13-#14%pv-ib-valuation%, id=49], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-13-#14%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)

Thread=[name=sys-stripe-2-#3%pv-ib-valuation%, id=38], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-2-#3%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)

Thread=[name=sys-stripe-3-#4%pv-ib-valuation%, id=39], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-3-#4%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)

Thread=[name=sys-stripe-4-#5%pv-ib-valuation%, id=40], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-4-#5%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)

Thread=[name=sys-stripe-5-#6%pv-ib-valuation%, id=41], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-5-#6%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)

Thread=[name=sys-stripe-6-#7%pv-ib-valuation%, id=42], state=RUNNABLE
Locked pages = []
Locked pages log: name=sys-stripe-6-#7%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)

Thread=[name=sys-stripe-7-#8%pv-ib-valuation%, id=43], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-7-#8%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)

Thread=[name=sys-stripe-8-#9%pv-ib-valuation%, id=44], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-8-#9%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)

Thread=[name=sys-stripe-9-#10%pv-ib-valuation%, id=45], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-9-#10%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)

Thread=[name=ttl-cleanup-worker-#62%pv-ib-valuation%, id=127], state=TIMED_WAITING
Locked pages = []
Locked pages log: name=ttl-cleanup-worker-#62%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)

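TxTimeoutOnPartitionMapExchange is about rolling back active transactions in order to unblock the PME process. It will not magically unblock every PME that may be stuck for a different reason.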

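That said, this setting is worth configuring in any case. To enable it, adjust the configuration of the server nodes and set the property to some value, say 30 seconds. In XML, that means setting the txTimeoutOnPartitionMapExchange property on the TransactionConfiguration bean.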
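
A minimal sketch of what such a server-side XML change might look like, assuming a standard Spring-based Ignite configuration file. The bean id and the surrounding layout are illustrative assumptions, not the asker's actual configuration; the 30000 ms value simply mirrors the 30-second figure mentioned above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
                           http://www.springframework.org/schema/beans/spring-beans.xsd">

    <!-- Hypothetical bean id; merge the property below into your existing server configuration. -->
    <bean id="ignite.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
        <property name="transactionConfiguration">
            <bean class="org.apache.ignite.configuration.TransactionConfiguration">
                <!-- Timeout in milliseconds: transactions still active after this long
                     once a partition map exchange has started are rolled back. -->
                <property name="txTimeoutOnPartitionMapExchange" value="30000"/>
            </bean>
        </property>
    </bean>
</beans>
```

Note that this only affects transactions on TRANSACTIONAL caches. The update in the first stack trace goes through GridDhtAtomicCache, so it would presumably not be rolled back by this timeout.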

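As for the original CQ issue with disconnected clients, I would expect Ignite to handle that automatically. In other words, I don't think the PME hang is caused by the continuous queries themselves; it is more likely caused by something else, such as active transactions that have no timeout.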

You don't need to change the atomicityMode of your caches. Transactions cannot be applied to non-transactional (atomic) caches.
