如何修复"偏移量提交在偏移量 com.application.iot.measure.stage-0 上失败1053078427:请求超时。



我有一个 kafka 消费者应用程序,它从 kafka 主题读取物联网数据。但我仍然不规则地收到以下错误/警告。

原木

2019-06-04T23:21:03.58+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:51:03.583 ERROR 12 --- [ntainer#0-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator  : [Consumer clientId=consumer-2, groupId=newton] Offset commit failed on partition com.newton.forwarding.application.iot.measure.stage-0 at offset 1053164658: The coordinator is not aware of this member.
2019-06-04T23:21:03.58+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:51:03.583  WARN 12 --- [ntainer#0-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator  : [Consumer clientId=consumer-2, groupId=newton] Asynchronous auto-commit of offsets {com.newton.forwarding.application.iot.measure.stage-0=OffsetAndMetadata{offset=1053164658, leaderEpoch=null, metadata=''}} failed: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.
2019-06-04T23:21:03.58+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:51:03.583  WARN 12 --- [ntainer#0-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator  : [Consumer clientId=consumer-2, groupId=newton] Synchronous auto-commit of offsets {com.newton.forwarding.application.iot.measure.stage-0=OffsetAndMetadata{offset=1053167516, leaderEpoch=null, metadata=''}} failed: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.

我已经尝试了max.poll.records和 max.poll.interval.ms 配置的多种组合。我什至尝试增加 request.timeout.ms,但这些错误和警告仍然没有停止。

注意:我无法控制代理,因此我无法尝试更改 session.timeout.ms 因为它需要在代理的 group.min.session.timeout.ms 和 group.max.session.timeout.ms 配置范围内。

应用程序.yml

spring:
kafka:
consumer:
group-id: iot
auto-offset-reset: earliest
properties:
fetch.max.wait.ms: 10000
fetch.min.bytes: 30000000
retry.backoff.ms: 1000
max.poll.records: 4000000
max.poll.interval.ms: 720000
request.timeout.ms: 900000

目前的行为不稳定如下。

原木

2019-06-04T23:08:03.82+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:38:03.827 ERROR 12 --- [ntainer#0-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator  : [Consumer clientId=consumer-2, groupId=newton] Offset commit failed on partition com.newton.forwarding.application.iot.measure.stage-0 at offset 1053064069: The coordinator is not aware of this member.
2019-06-04T23:08:03.82+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:38:03.827  WARN 12 --- [ntainer#0-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator  : [Consumer clientId=consumer-2, groupId=newton] Asynchronous auto-commit of offsets {com.newton.forwarding.application.iot.measure.stage-0=OffsetAndMetadata{offset=1053064069, leaderEpoch=null, metadata=''}} failed: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.
2019-06-04T23:08:03.82+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:38:03.827  WARN 12 --- [ntainer#0-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator  : [Consumer clientId=consumer-2, groupId=newton] Synchronous auto-commit of offsets {com.newton.forwarding.application.iot.measure.stage-0=OffsetAndMetadata{offset=1053066926, leaderEpoch=null, metadata=''}} failed: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.
2019-06-04T23:09:43.04+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:39:43.044  WARN 12 --- [ntainer#0-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator  : [Consumer clientId=consumer-2, groupId=newton] Synchronous auto-commit of offsets {com.newton.forwarding.application.iot.measure.stage-0=OffsetAndMetadata{offset=1053064069, leaderEpoch=null, metadata=''}} failed: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.
2019-06-04T23:10:00.13+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:40:00.130  INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl     : Time taken(ms) 119. No Of measures: 2857
2019-06-04T23:10:12.90+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:40:12.909  INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl     : Time taken(ms) 125. No Of measures: 2893
2019-06-04T23:10:22.94+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:40:22.948  INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl     : Time taken(ms) 74. No Of measures: 2880
2019-06-04T23:10:34.44+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:40:34.445  INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl     : Time taken(ms) 73. No Of measures: 2862
2019-06-04T23:10:50.50+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:40:50.501  INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl     : Time taken(ms) 73. No Of measures: 2866
2019-06-04T23:10:56.08+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:40:56.086 ERROR 12 --- [ntainer#0-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator  : [Consumer clientId=consumer-2, groupId=newton] Offset commit failed on partition com.newton.forwarding.application.iot.measure.stage-0 at offset 1053075561: The coordinator is not aware of this member.
2019-06-04T23:10:56.08+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:40:56.086  WARN 12 --- [ntainer#0-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator  : [Consumer clientId=consumer-2, groupId=newton] Asynchronous auto-commit of offsets {com.newton.forwarding.application.iot.measure.stage-0=OffsetAndMetadata{offset=1053075561, leaderEpoch=null, metadata=''}} failed: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.
2019-06-04T23:10:56.08+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:40:56.086  WARN 12 --- [ntainer#0-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator  : [Consumer clientId=consumer-2, groupId=newton] Synchronous auto-commit of offsets {com.newton.forwarding.application.iot.measure.stage-0=OffsetAndMetadata{offset=1053078427, leaderEpoch=null, metadata=''}} failed: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.
2019-06-04T23:11:03.86+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:41:03.867 ERROR 12 --- [ntainer#0-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator  : [Consumer clientId=consumer-2, groupId=newton] Offset commit failed on partition com.newton.forwarding.application.iot.measure.stage-0 at offset 1053078427: The coordinator is not aware of this member.
2019-06-04T23:11:05.50+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:41:05.506  WARN 12 --- [ntainer#0-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator  : [Consumer clientId=consumer-2, groupId=newton] Asynchronous auto-commit of offsets {com.newton.forwarding.application.iot.measure.stage-0=OffsetAndMetadata{offset=1053078427, leaderEpoch=null, metadata=''}} failed: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.
2019-06-04T23:11:33.74+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:41:33.743  INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl     : Time taken(ms) 79. No Of measures: 2862
2019-06-04T23:11:45.66+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:41:45.664  INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl     : Time taken(ms) 109. No Of measures: 2866
2019-06-04T23:11:56.49+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:41:56.492  INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl     : Time taken(ms) 75. No Of measures: 2880
2019-06-04T23:12:08.39+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:42:08.390  INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl     : Time taken(ms) 90. No Of measures: 2889
2019-06-04T23:12:15.71+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:42:15.716 ERROR 12 --- [ntainer#0-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator  : [Consumer clientId=consumer-2, groupId=newton] Offset commit failed on partition com.newton.forwarding.application.iot.measure.stage-0 at offset 1053078427: The request timed out.
2019-06-04T23:12:25.00+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:42:25.001  INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl     : Time taken(ms) 80. No Of measures: 2880
2019-06-04T23:12:43.71+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:42:43.714  INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl     : Time taken(ms) 97. No Of measures: 2870
2019-06-04T23:13:02.37+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:43:02.374  INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl     : Time taken(ms) 121. No Of measures: 2868
2019-06-04T23:13:21.72+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:43:21.724  INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl     : Time taken(ms) 99. No Of measures: 2867
2019-06-04T23:13:42.36+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:43:42.368  INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl     : Time taken(ms) 101. No Of measures: 2860
2019-06-04T23:14:01.73+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:44:01.737  INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl     : Time taken(ms) 145. No Of measures: 2862
2019-06-04T23:14:19.28+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:44:19.287  INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl     : Time taken(ms) 118. No Of measures: 2873
2019-06-04T23:14:37.63+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:44:37.630  INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl     : Time taken(ms) 104. No Of measures: 2866
2019-06-04T23:14:55.88+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:44:55.889  INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl     : Time taken(ms) 117. No Of measures: 2880
2019-06-04T23:15:12.29+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:45:12.298  INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl     : Time taken(ms) 203. No Of measures: 2880
2019-06-04T23:15:31.48+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:45:31.480  INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl     : Time taken(ms) 105. No Of measures: 2880
2019-06-04T23:15:51.25+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:45:51.251  INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl     : Time taken(ms) 176. No Of measures: 2880
2019-06-04T23:16:06.69+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:46:06.692  INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl     : Time taken(ms) 157. No Of measures: 2880
2019-06-04T23:16:23.27+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:46:23.271  INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl     : Time taken(ms) 110. No Of measures: 2880
2019-06-04T23:16:39.18+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:46:39.184  INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl     : Time taken(ms) 88. No Of measures: 2880
2019-06-04T23:16:58.28+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:46:58.285  INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl     : Time taken(ms) 108. No Of measures: 2880
2019-06-04T23:17:17.67+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:47:17.676  INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl     : Time taken(ms) 141. No Of measures: 2885
2019-06-04T23:17:36.67+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:47:36.669  INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl     : Time taken(ms) 107. No Of measures: 2880
2019-06-04T23:17:53.78+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:47:53.783  INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl     : Time taken(ms) 344. No Of measures: 2855
2019-06-04T23:18:12.35+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:48:12.351  INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl     : Time taken(ms) 67. No Of measures: 2880
2019-06-04T23:18:29.12+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:48:29.129  INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl     : Time taken(ms) 109. No Of measures: 2895
2019-06-04T23:18:46.31+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:48:46.313  INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl     : Time taken(ms) 131. No Of measures: 2861
2019-06-04T23:19:03.72+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:49:03.729  INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl     : Time taken(ms) 116. No Of measures: 2880
2019-06-04T23:19:22.91+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:49:22.913  INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl     : Time taken(ms) 113. No Of measures: 2867
2019-06-04T23:19:40.83+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:49:40.832  INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl     : Time taken(ms) 118. No Of measures: 2859
2019-06-04T23:19:58.58+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:49:58.587  INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl     : Time taken(ms) 106. No Of measures: 2880
2019-06-04T23:20:16.08+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:50:16.086  INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl     : Time taken(ms) 89. No Of measures: 2880
2019-06-04T23:20:35.23+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:50:35.239  INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl     : Time taken(ms) 163. No Of measures: 2854
2019-06-04T23:20:55.44+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:50:55.446  INFO 12 --- [ntainer#0-0-C-1] c.s.n.f.a.s.impl.ConsumerServiceImpl     : Time taken(ms) 214. No Of measures: 2858
2019-06-04T23:21:03.58+0530 [APP/PROC/WEB/0] OUT 2019-06-04 17:51:03.583 ERROR 12 --- [ntainer#0-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator  : [Consumer clientId=consumer-2, groupId=newton] Offset commit failed on partition com.newton.forwarding.application.iot.measure.stage-0 at offset 1053164658: The coordinator is not aware of this member.

非常感谢解决此问题的任何建议。

PS:我是否应该考虑将消费者侦听器与数据处理与工作线程的处理隔离开来,如本问题中所述?

您需要确定处理 4000000 条记录所需的时间,并将max.poll.interval.ms设置为适当的值。

对于如此大量的记录,只有一个分区是一个糟糕的设计。

最新更新