100% CPU with Tomcat Pool



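The thread below randomly drives CPU usage to 100%, and the problem only clears when the node is restarted. I need help identifying the root cause; my guess is that some loop is causing the CPU spike.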

stackTrace:
java.lang.Thread.State: RUNNABLE
at javax.net.ssl.SSLEngineResult.<init>(java.base@11.0.18/SSLEngineResult.java:196)
at sun.security.ssl.SSLEngineImpl.writeRecord(java.base@11.0.18/SSLEngineImpl.java:164)
at sun.security.ssl.SSLEngineImpl.wrap(java.base@11.0.18/SSLEngineImpl.java:136)
- eliminated <0x00000000c5c5ebb8> (a sun.security.ssl.SSLEngineImpl)
at sun.security.ssl.SSLEngineImpl.wrap(java.base@11.0.18/SSLEngineImpl.java:116)
- locked <0x00000000c5c5ebb8> (a sun.security.ssl.SSLEngineImpl)
at javax.net.ssl.SSLEngine.wrap(java.base@11.0.18/SSLEngine.java:482)
at oracle.net.nt.SSLSocketChannel.shutdown(SSLSocketChannel.java:381)
- locked <0x00000000c5c5e868> (a oracle.net.nt.SSLSocketChannel)
at oracle.net.nt.SSLSocketChannel.write(SSLSocketChannel.java:242)
at oracle.net.ns.NIOPacket.writeToSocketChannel(NIOPacket.java:302)
at oracle.net.ns.NIONSDataChannel.writeDataToSocketChannel(NIONSDataChannel.java:173)
at oracle.net.ns.NIONSDataChannel.writeDataToSocketChannel(NIONSDataChannel.java:124)
at oracle.jdbc.driver.T4CMAREngineNIO.flush(T4CMAREngineNIO.java:727)
at oracle.jdbc.driver.T4CMAREngineNIO.prepareForUnmarshall(T4CMAREngineNIO.java:733)
at oracle.jdbc.driver.T4CMAREngineNIO.unmarshalUB1(T4CMAREngineNIO.java:413)
at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:485)
at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:252)
at oracle.jdbc.driver.T4C7Ocommoncall.doOCOMMIT(T4C7Ocommoncall.java:72)
at oracle.jdbc.driver.T4CConnection.doCommit(T4CConnection.java:961)
- eliminated <0x00000000c5c59900> (a oracle.jdbc.driver.T4CConnection)
at oracle.jdbc.driver.PhysicalConnection.commit(PhysicalConnection.java:1937)
- locked <0x00000000c5c59900> (a oracle.jdbc.driver.T4CConnection)
at oracle.jdbc.driver.PhysicalConnection.commit(PhysicalConnection.java:1942)
at jdk.internal.reflect.GeneratedMethodAccessor55.invoke(Unknown Source)
at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@11.0.18/DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(java.base@11.0.18/Method.java:566)
at org.apache.tomcat.jdbc.pool.ProxyConnection.invoke(ProxyConnection.java:131)
at org.apache.tomcat.jdbc.pool.JdbcInterceptor.invoke(JdbcInterceptor.java:109)
at org.apache.tomcat.jdbc.pool.interceptor.AbstractCreateStatementInterceptor.invoke(AbstractCreateStatementInterceptor.java:79)
at org.apache.tomcat.jdbc.pool.JdbcInterceptor.invoke(JdbcInterceptor.java:109)
at org.apache.tomcat.jdbc.pool.DisposableConnectionFacade.invoke(DisposableConnectionFacade.java:81)
at com.sun.proxy.$Proxy81.commit(Unknown Source)
at org.hibernate.resource.jdbc.internal.AbstractLogicalConnectionImplementor.commit(AbstractLogicalConnectionImplementor.java:86)
at org.hibernate.resource.transaction.backend.jdbc.internal.JdbcResourceLocalTransactionCoordinatorImpl$TransactionDriverControlImpl.commit(JdbcResourceLocalTransactionCoordinatorImpl.java:282)
at org.hibernate.engine.transaction.internal.TransactionImpl.commit(TransactionImpl.java:101)
at io.dropwizard.hibernate.SessionFactoryHealthCheck.lambda$check$0(SessionFactoryHealthCheck.java:56)
at io.dropwizard.hibernate.SessionFactoryHealthCheck$$Lambda$797/0x0000000840b16440.call(Unknown Source)
at java.util.concurrent.FutureTask.run(java.base@11.0.18/FutureTask.java:264)
at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.18/ThreadPoolExecutor.java:1128)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.18/ThreadPoolExecutor.java:628)
at com.codahale.metrics.InstrumentedThreadFactory$InstrumentedRunnable.run(InstrumentedThreadFactory.java:66)
at java.lang.Thread.run(java.base@11.0.18/Thread.java:834)
Locked ownable synchronizers:
- <0x00000000c2c9c570> (a java.util.concurrent.ThreadPoolExecutor$Worker)
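
To confirm from inside the JVM which threads are actually burning CPU, a minimal sketch using the standard `ThreadMXBean` could look like the following. The class name `HotThreads` and the method `topCpuThreads` are made up for illustration. The values are cumulative CPU time per thread, so sampling twice and comparing is more telling than a single reading.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of any library mentioned in this thread.
public class HotThreads {

    /** Returns "<cpu> ms <threadName>" lines for live threads, busiest first. */
    static List<String> topCpuThreads(int limit) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> out = new ArrayList<>();
        if (!mx.isThreadCpuTimeSupported()) {
            return out; // this JVM cannot measure per-thread CPU time
        }
        List<Map.Entry<Long, String>> rows = new ArrayList<>();
        for (long id : mx.getAllThreadIds()) {
            long cpuNanos = mx.getThreadCpuTime(id); // -1 if dead or measurement disabled
            ThreadInfo info = mx.getThreadInfo(id);  // null if the thread has exited
            if (cpuNanos < 0 || info == null) {
                continue;
            }
            rows.add(new AbstractMap.SimpleEntry<>(cpuNanos, info.getThreadName()));
        }
        // Sort by cumulative CPU time, largest first.
        rows.sort(Map.Entry.<Long, String>comparingByKey().reversed());
        for (int i = 0; i < Math.min(limit, rows.size()); i++) {
            Map.Entry<Long, String> row = rows.get(i);
            out.add((row.getKey() / 1_000_000) + " ms " + row.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        topCpuThreads(5).forEach(System.out::println);
    }
}
```

The thread names printed here can then be matched against the names in a thread dump such as the one above.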

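Same problem here, running WildFly AS 24.0.1.Final. It started after updating from OpenJDK 11.0.17 to 11.0.18 (standard OS updates were applied at the same time). The stack trace of the I/O threads causing the CPU load is the same: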

java.lang.Thread.State: RUNNABLE
at sun.security.ssl.SSLEngineImpl.writeRecord(java.base@11.0.18/SSLEngineImpl.java:271)
at sun.security.ssl.SSLEngineImpl.wrap(java.base@11.0.18/SSLEngineImpl.java:136)
- eliminated <0x0000000740f3c8b8> (a sun.security.ssl.SSLEngineImpl)
at sun.security.ssl.SSLEngineImpl.wrap(java.base@11.0.18/SSLEngineImpl.java:116)
- locked <0x0000000740f3c8b8> (a sun.security.ssl.SSLEngineImpl)
at javax.net.ssl.SSLEngine.wrap(java.base@11.0.18/SSLEngine.java:482)
…

The problem occurs with the OpenJDK 11.0.18 packages on RHEL 7.9:

java-11-openjdk-11.0.18.0.10-1.el7_9.x86_64.rpm
java-11-openjdk-headless-11.0.18.0.10-1.el7_9.x86_64.rpm
java-11-openjdk-devel-11.0.18.0.10-1.el7_9.x86_64.rpm

Workaround: downgrade to OpenJDK 11.0.17. On Red Hat Linux, you can do it as follows, which also prevents the Java packages from being updated automatically:

yum downgrade java-11-openjdk-11.0.17.0.8-2.el7_9.x86_64 java-11-openjdk-headless-11.0.17.0.8-2.el7_9.x86_64 java-11-openjdk-devel-11.0.17.0.8-2.el7_9.x86_64
yum install yum-plugin-versionlock
yum versionlock java-11-openjdk-*

The root cause is still unknown. It may be related to https://bugs.openjdk.org/browse/JDK-8273553, but I cannot confirm that.

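Update: in my case, this appears to be a confirmed bug in Undertow: https://issues.redhat.com/browse/UNDERTOW-2239. It is related to JDK 1.8.0_361+, 11.0.18+, or 17.0.6+ (see Oracle Support Document 2934851.1). So sticking with JDK 11.0.17 (or disabling SSL) seems to be the only option for now.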
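
To check quickly whether a given JVM falls into the version ranges listed above, something like this sketch may help. The class `AffectedJdkCheck` and its method are hypothetical names, and the thresholds are taken only from the versions quoted in this thread (1.8.0_361+, 11.0.18+, 17.0.6+); other release lines are not evaluated.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; thresholds come from the versions quoted in this thread.
public class AffectedJdkCheck {

    // Legacy "1.8.0_361" style version strings.
    private static final Pattern LEGACY = Pattern.compile("^1\\.8\\.0_(\\d+).*");
    // Modern "11.0.18" / "17.0.6+10" style; minor and security parts are optional.
    private static final Pattern MODERN =
            Pattern.compile("^(\\d+)(?:\\.(\\d+))?(?:\\.(\\d+))?.*");

    static boolean isAffected(String version) {
        Matcher legacy = LEGACY.matcher(version);
        if (legacy.matches()) {
            return Integer.parseInt(legacy.group(1)) >= 361;
        }
        Matcher modern = MODERN.matcher(version);
        if (!modern.matches()) {
            return false; // unrecognized format
        }
        int feature = Integer.parseInt(modern.group(1));
        int minor = modern.group(2) == null ? 0 : Integer.parseInt(modern.group(2));
        int security = modern.group(3) == null ? 0 : Integer.parseInt(modern.group(3));
        if (feature == 11) {
            return minor > 0 || security >= 18;
        }
        if (feature == 17) {
            return minor > 0 || security >= 6;
        }
        return false; // release lines not mentioned in the thread are not evaluated
    }

    public static void main(String[] args) {
        String v = System.getProperty("java.version");
        System.out.println(v + " in the listed ranges: " + isAffected(v));
    }
}
```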

This week we hit exactly the same problem on our servers (Java 17, Spring Boot, …) with a SQL Server database.

Disabling TLS 1.3 resolved it, but we have not investigated further to find the root cause (it would be nice to be able to re-enable TLS 1.3).
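
The post does not say how TLS 1.3 was disabled. As one possible illustration at the JSSE level, an `SSLEngine` can be restricted to TLS 1.2 through `SSLParameters`; the class name `Tls12Only` is made up, and JDBC drivers or application servers usually expose their own setting for this instead. For client connections that use the JDK defaults, the standard system property `-Djdk.tls.client.protocols=TLSv1.2` is another route.

```java
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.SSLParameters;

// Hypothetical illustration; not necessarily how the poster disabled TLS 1.3.
public class Tls12Only {

    /** Creates a client-mode SSLEngine that only offers TLS 1.2. */
    static SSLEngine newTls12Engine() {
        SSLContext ctx;
        try {
            ctx = SSLContext.getDefault();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("No default SSLContext available", e);
        }
        SSLEngine engine = ctx.createSSLEngine();
        engine.setUseClientMode(true);

        // Replace the enabled protocol list so that TLSv1.3 is never negotiated.
        SSLParameters params = engine.getSSLParameters();
        params.setProtocols(new String[] {"TLSv1.2"});
        engine.setSSLParameters(params);
        return engine;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(newTls12Engine().getEnabledProtocols()));
    }
}
```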

It seems to be related to this: https://jira.atlassian.com/browse/JRASERVER-70169
