Flink KafkaSink takes a long time to start



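My Flink application shows an annoying behavior when it uses a KafkaSink. If the application contains a sink to Kafka (EXACTLY_ONCE delivery), it takes a very long time to start. If I remove the Kafka sink (leaving the others) or replace it with print, the application starts in a few seconds. In the TaskManager log I see thousands of repeated lines like these: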

2023-04-05 14:01:25,828 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] ProducerId set to 14788 with epoch 9
2023-04-05 14:01:25,828 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Invoking InitProducerId for the first time in order to acquire a producer ID
2023-04-05 14:01:25,829 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Discovered transaction coordinator b-2.mykafkacluster.amazonaws.com:9096 (id: 2 rack: null)
2023-04-05 14:01:25,932 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] ProducerId set to 15843 with epoch 8
2023-04-05 14:01:25,932 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Invoking InitProducerId for the first time in order to acquire a producer ID
2023-04-05 14:01:25,933 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Discovered transaction coordinator b-3.mykafkacluster.amazonaws.com:9096 (id: 3 rack: null)
2023-04-05 14:01:26,035 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] ProducerId set to 16840 with epoch 7
2023-04-05 14:01:26,035 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Invoking InitProducerId for the first time in order to acquire a producer ID
2023-04-05 14:01:26,036 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Discovered transaction coordinator b-1.mykafkacluster.amazonaws.com:9096 (id: 1 rack: null)
2023-04-05 14:01:26,139 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] ProducerId set to 14813 with epoch 6
2023-04-05 14:01:26,139 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Invoking InitProducerId for the first time in order to acquire a producer ID
2023-04-05 14:01:26,140 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Discovered transaction coordinator b-3.mykafkacluster.amazonaws.com:9096 (id: 3 rack: null)
2023-04-05 14:01:26,244 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] ProducerId set to 16858 with epoch 5
2023-04-05 14:01:26,244 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Invoking InitProducerId for the first time in order to acquire a producer ID
2023-04-05 14:01:26,245 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Discovered transaction coordinator b-1.mykafkacluster.amazonaws.com:9096 (id: 1 rack: null)
2023-04-05 14:01:26,348 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] ProducerId set to 14832 with epoch 4
2023-04-05 14:01:26,348 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Invoking InitProducerId for the first time in order to acquire a producer ID
2023-04-05 14:01:26,349 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Discovered transaction coordinator b-2.mykafkacluster.amazonaws.com:9096 (id: 2 rack: null)
2023-04-05 14:01:26,451 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] ProducerId set to 15886 with epoch 3
2023-04-05 14:01:26,452 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Invoking InitProducerId for the first time in order to acquire a producer ID
2023-04-05 14:01:26,453 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Discovered transaction coordinator b-3.mykafkacluster.amazonaws.com:9096 (id: 3 rack: null)
2023-04-05 14:01:26,555 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] ProducerId set to 16891 with epoch 2
2023-04-05 14:01:26,555 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Invoking InitProducerId for the first time in order to acquire a producer ID
2023-04-05 14:01:26,556 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Discovered transaction coordinator b-1.mykafkacluster.amazonaws.com:9096 (id: 1 rack: null)
2023-04-05 14:01:26,659 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] ProducerId set to 14860 with epoch 1
2023-04-05 14:01:26,660 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Invoking InitProducerId for the first time in order to acquire a producer ID
2023-04-05 14:01:26,660 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Discovered transaction coordinator b-2.mykafkacluster.amazonaws.com:9096 (id: 2 rack: null)
2023-04-05 14:01:26,766 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] ProducerId set to 15917 with epoch 0
2023-04-05 14:01:26,767 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Invoking InitProducerId for the first time in order to acquire a producer ID
2023-04-05 14:01:26,767 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Discovered transaction coordinator b-1.mykafkacluster.amazonaws.com:9096 (id: 1 rack: null)
2023-04-05 14:01:26,870 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] ProducerId set to 14713 with epoch 23
2023-04-05 14:01:26,870 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Invoking InitProducerId for the first time in order to acquire a producer ID
2023-04-05 14:01:26,871 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Discovered transaction coordinator b-2.mykafkacluster.amazonaws.com:9096 (id: 2 rack: null)
2023-04-05 14:01:26,974 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] ProducerId set to 15752 with epoch 22
2023-04-05 14:01:26,974 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Invoking InitProducerId for the first time in order to acquire a producer ID
2023-04-05 14:01:26,975 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Discovered transaction coordinator b-3.mykafkacluster.amazonaws.com:9096 (id: 3 rack: null)
2023-04-05 14:01:27,077 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] ProducerId set to 16751 with epoch 21
2023-04-05 14:01:27,077 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Invoking InitProducerId for the first time in order to acquire a producer ID
2023-04-05 14:01:27,077 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Discovered transaction coordinator b-1.mykafkacluster.amazonaws.com:9096 (id: 1 rack: null)
2023-04-05 14:01:27,180 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] ProducerId set to 14724 with epoch 20
2023-04-05 14:01:27,180 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Invoking InitProducerId for the first time in order to acquire a producer ID
2023-04-05 14:01:27,181 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Discovered transaction coordinator b-2.mykafkacluster.amazonaws.com:9096 (id: 2 rack: null)
2023-04-05 14:01:27,284 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] ProducerId set to 15768 with epoch 19
2023-04-05 14:01:27,284 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Invoking InitProducerId for the first time in order to acquire a producer ID
2023-04-05 14:01:27,285 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Discovered transaction coordinator b-3.mykafkacluster.amazonaws.com:9096 (id: 3 rack: null)
2023-04-05 14:01:27,387 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] ProducerId set to 16765 with epoch 18
2023-04-05 14:01:27,387 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Invoking InitProducerId for the first time in order to acquire a producer ID
2023-04-05 14:01:27,388 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Discovered transaction coordinator b-1.mykafkacluster.amazonaws.com:9096 (id: 1 rack: null)
2023-04-05 14:01:27,492 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] ProducerId set to 14737 with epoch 17
2023-04-05 14:01:27,492 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Invoking InitProducerId for the first time in order to acquire a producer ID
2023-04-05 14:01:27,493 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Discovered transaction coordinator b-2.mykafkacluster.amazonaws.com:9096 (id: 2 rack: null)
2023-04-05 14:01:27,596 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] ProducerId set to 15785 with epoch 16
2023-04-05 14:01:27,596 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Invoking InitProducerId for the first time in order to acquire a producer ID
2023-04-05 14:01:27,599 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Discovered transaction coordinator b-3.mykafkacluster.amazonaws.com:9096 (id: 3 rack: null)
2023-04-05 14:01:27,702 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] ProducerId set to 16782 with epoch 15
2023-04-05 14:01:27,702 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Invoking InitProducerId for the first time in order to acquire a producer ID
2023-04-05 14:01:27,703 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Discovered transaction coordinator b-2.mykafkacluster.amazonaws.com:9096 (id: 2 rack: null)
2023-04-05 14:01:27,815 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] ProducerId set to 15798 with epoch 14
2023-04-05 14:01:27,815 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Invoking InitProducerId for the first time in order to acquire a producer ID
2023-04-05 14:01:27,816 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Discovered transaction coordinator b-1.mykafkacluster.amazonaws.com:9096 (id: 1 rack: null)
2023-04-05 14:01:27,919 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] ProducerId set to 14759 with epoch 13
2023-04-05 14:01:27,919 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Invoking InitProducerId for the first time in order to acquire a producer ID
2023-04-05 14:01:27,920 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Discovered transaction coordinator b-3.mykafkacluster.amazonaws.com:9096 (id: 3 rack: null)
2023-04-05 14:01:28,022 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] ProducerId set to 16803 with epoch 12
2023-04-05 14:01:28,023 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Invoking InitProducerId for the first time in order to acquire a producer ID
2023-04-05 14:01:28,024 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Discovered transaction coordinator b-2.mykafkacluster.amazonaws.com:9096 (id: 2 rack: null)
2023-04-05 14:01:28,126 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] ProducerId set to 15820 with epoch 11
2023-04-05 14:01:28,126 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Invoking InitProducerId for the first time in order to acquire a producer ID
2023-04-05 14:01:28,127 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Discovered transaction coordinator b-1.mykafkacluster.amazonaws.com:9096 (id: 1 rack: null)
2023-04-05 14:01:28,230 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] ProducerId set to 14782 with epoch 10
2023-04-05 14:01:28,230 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Invoking InitProducerId for the first time in order to acquire a producer ID
2023-04-05 14:01:28,231 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Discovered transaction coordinator b-3.mykafkacluster.amazonaws.com:9096 (id: 3 rack: null)
2023-04-05 14:01:28,333 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] ProducerId set to 16825 with epoch 9
2023-04-05 14:01:28,333 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Invoking InitProducerId for the first time in order to acquire a producer ID
2023-04-05 14:01:28,334 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Discovered transaction coordinator b-2.mykafkacluster.amazonaws.com:9096 (id: 2 rack: null)
2023-04-05 14:01:28,436 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] ProducerId set to 15844 with epoch 8
2023-04-05 14:01:28,436 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Invoking InitProducerId for the first time in order to acquire a producer ID
2023-04-05 14:01:28,437 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Discovered transaction coordinator b-1.mykafkacluster.amazonaws.com:9096 (id: 1 rack: null)
2023-04-05 14:01:28,542 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] ProducerId set to 14806 with epoch 7
2023-04-05 14:01:28,543 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Invoking InitProducerId for the first time in order to acquire a producer ID
2023-04-05 14:01:28,544 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Discovered transaction coordinator b-3.mykafkacluster.amazonaws.com:9096 (id: 3 rack: null)
2023-04-05 14:01:28,646 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] ProducerId set to 16851 with epoch 6
2023-04-05 14:01:28,646 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Invoking InitProducerId for the first time in order to acquire a producer ID
2023-04-05 14:01:28,647 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Discovered transaction coordinator b-2.mykafkacluster.amazonaws.com:9096 (id: 2 rack: null)
2023-04-05 14:01:28,749 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] ProducerId set to 15869 with epoch 5
2023-04-05 14:01:28,749 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Invoking InitProducerId for the first time in order to acquire a producer ID
2023-04-05 14:01:28,751 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Discovered transaction coordinator b-3.mykafkacluster.amazonaws.com:9096 (id: 3 rack: null)
2023-04-05 14:01:28,853 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] ProducerId set to 16868 with epoch 4
2023-04-05 14:01:28,853 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Invoking InitProducerId for the first time in order to acquire a producer ID
2023-04-05 14:01:28,854 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Discovered transaction coordinator b-2.mykafkacluster.amazonaws.com:9096 (id: 2 rack: null)
2023-04-05 14:01:28,956 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] ProducerId set to 15887 with epoch 3
2023-04-05 14:01:28,956 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Invoking InitProducerId for the first time in order to acquire a producer ID
2023-04-05 14:01:28,957 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Discovered transaction coordinator b-1.mykafkacluster.amazonaws.com:9096 (id: 1 rack: null)
2023-04-05 14:01:29,060 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] ProducerId set to 14846 with epoch 2
2023-04-05 14:01:29,060 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Invoking InitProducerId for the first time in order to acquire a producer ID
2023-04-05 14:01:29,061 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Discovered transaction coordinator b-3.mykafkacluster.amazonaws.com:9096 (id: 3 rack: null)
2023-04-05 14:01:29,163 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] ProducerId set to 16900 with epoch 1
2023-04-05 14:01:29,163 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Invoking InitProducerId for the first time in order to acquire a producer ID
2023-04-05 14:01:29,163 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Discovered transaction coordinator b-2.mykafkacluster.amazonaws.com:9096 (id: 2 rack: null)
2023-04-05 14:01:29,268 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] ProducerId set to 15918 with epoch 0
2023-04-05 14:01:29,268 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Invoking InitProducerId for the first time in order to acquire a producer ID
2023-04-05 14:01:29,269 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Discovered transaction coordinator b-3.mykafkacluster.amazonaws.com:9096 (id: 3 rack: null)
2023-04-05 14:01:29,374 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] ProducerId set to 16748 with epoch 22
2023-04-05 14:01:29,374 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Invoking InitProducerId for the first time in order to acquire a producer ID
2023-04-05 14:01:29,375 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Discovered transaction coordinator b-1.mykafkacluster.amazonaws.com:9096 (id: 1 rack: null)
2023-04-05 14:01:29,478 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] ProducerId set to 14721 with epoch 21
2023-04-05 14:01:29,478 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Invoking InitProducerId for the first time in order to acquire a producer ID
2023-04-05 14:01:29,479 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Discovered transaction coordinator b-2.mykafkacluster.amazonaws.com:9096 (id: 2 rack: null)
2023-04-05 14:01:29,582 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] ProducerId set to 15763 with epoch 20
2023-04-05 14:01:29,582 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Invoking InitProducerId for the first time in order to acquire a producer ID
2023-04-05 14:01:29,582 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Discovered transaction coordinator b-3.mykafkacluster.amazonaws.com:9096 (id: 3 rack: null)
2023-04-05 14:01:29,684 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] ProducerId set to 16761 with epoch 19
2023-04-05 14:01:29,685 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Invoking InitProducerId for the first time in order to acquire a producer ID
2023-04-05 14:01:29,685 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Discovered transaction coordinator b-1.mykafkacluster.amazonaws.com:9096 (id: 1 rack: null)
2023-04-05 14:01:29,788 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] ProducerId set to 14734 with epoch 18
2023-04-05 14:01:29,788 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Invoking InitProducerId for the first time in order to acquire a producer ID
2023-04-05 14:01:29,790 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Discovered transaction coordinator b-2.mykafkacluster.amazonaws.com:9096 (id: 2 rack: null)
2023-04-05 14:01:29,895 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] ProducerId set to 15779 with epoch 17
2023-04-05 14:01:29,895 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Invoking InitProducerId for the first time in order to acquire a producer ID
2023-04-05 14:01:29,896 INFO  org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=cid, transactionalId=kafka-sink-0-1] Discovered transaction coordinator b-3.mykafkacluster.amazonaws.com:9096 (id: 3 rack: null)

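This repeats for 20 minutes, sometimes even longer…

I tried removing the sink, and the application starts within seconds, but then I lose the output. I also tried changing the sink's delivery guarantee to AT_LEAST_ONCE. Startup seems faster (~1 minute), but there are still a lot of these log lines…

This is the sink configuration (some authentication-related properties are hidden and loaded into the appConfiguration object):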

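import org.apache.flink.connector.base.DeliveryGuarantee;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;

KafkaSink<DeviceDayTimeTuple> deviceDaySink = KafkaSink.<DeviceDayTimeTuple>builder()
    // Serialize each record as JSON into the topic named in the app configuration.
    .setRecordSerializer(KafkaRecordSerializationSchema.builder()
        .setTopic(appConfiguration.getString("device-day"))
        .setValueSerializationSchema(new SerializeJsonDeviceDayTime())
        .build()
    )
    .setDeliveryGuarantee(DeliveryGuarantee.EXACTLY_ONCE)
    // Base producer properties (including the hidden authentication settings).
    .setKafkaProducerConfig(appConfiguration.getKafkaProps())
    .setProperty("transactional.id", "tid")
    .setProperty("client.id", "cid")
    .setProperty("transaction.timeout.ms", "30000")
    .build();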

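Edit with additional information (Apr 12): We now checkpoint every 30 seconds. The topic had been written to by the same application before, but the problem stays the same if we recreate the topic. Using Kafka's CLI script kafka-transactions.sh, we noticed that the Flink application opens a lot of empty transactions at startup: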

TransactionalId     Coordinator ProducerId  TransactionState
kafka-sink-0-3394   3           18862       Empty
kafka-sink-0-4122   3           21106       Empty
kafka-sink-0-6148   3           21793       Empty
kafka-sink-0-4366   3           21189       Empty
kafka-sink-0-2584   3           18585       Empty
kafka-sink-0-2340   3           18501       Empty
kafka-sink-0-30     3           16458       Empty
kafka-sink-0-4816   3           21341       Empty
kafka-sink-0-308    3           16553       Empty
kafka-sink-0-2872   3           18682       Empty
kafka-sink-0-4654   3           21288       Empty
kafka-sink-0-3600   3           18931       Empty
kafka-sink-0-4898   3           21369       Empty
kafka-sink-0-146    3           16498       Empty
kafka-sink-0-5626   3           21616       Empty
kafka-sink-0-3844   3           21014       Empty
kafka-sink-0-4492   3           21231       Empty
kafka-sink-0-272    3           16541       Empty
kafka-sink-0-5220   3           21478       Empty
kafka-sink-0-5464   3           21561       Empty
kafka-sink-0-3682   3           18958       Empty

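These are only a fraction of the total, and both the transactionalId and the producerId match those in the log file.

The problem lies in how Flink manages Kafka transactions. A regular producer that wants to use transactions initializes them with a fixed transactionalId, and Kafka responds with a producerId and an epoch (a counter that starts at 0).

This ensures that only the latest producer can actually write data to Kafka. Any producer with the same transactionalId and an epoch lower than the most recently assigned one is fenced and not allowed to produce transactionally. It also means that a given producer (identified by its transactionalId) has only one transaction.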
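The fencing rule described above can be pictured with a small in-memory model. This is only a sketch under stated assumptions: the class EpochFencingSketch and its methods register and mayWrite are hypothetical names, not Kafka APIs, and the map stands in for the broker's transaction state.

```java
import java.util.HashMap;
import java.util.Map;

public class EpochFencingSketch {
    // Hypothetical stand-in for broker state: transactionalId -> latest epoch handed out.
    private final Map<String, Integer> latestEpoch = new HashMap<>();

    // Models producer initialization: a new ID gets epoch 0, a known ID gets epoch + 1.
    int register(String transactionalId) {
        return latestEpoch.merge(transactionalId, 0, (old, unused) -> old + 1);
    }

    // A producer may write only if it holds the latest epoch for its ID.
    boolean mayWrite(String transactionalId, int epoch) {
        Integer latest = latestEpoch.get(transactionalId);
        return latest != null && latest == epoch;
    }

    public static void main(String[] args) {
        EpochFencingSketch broker = new EpochFencingSketch();
        int oldProducer = broker.register("tid"); // epoch 0
        int newProducer = broker.register("tid"); // epoch 1, fences the old producer
        System.out.println(broker.mayWrite("tid", oldProducer)); // false
        System.out.println(broker.mayWrite("tid", newProducer)); // true
    }
}
```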

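Flink, however, does things a little differently. For every checkpoint it creates a new transactionalId and a new transaction. The transactionalId has the form <transactionalIdPrefix>-<subtask id>-<checkpoint id>, so it generates kafka-sink-0-1, kafka-sink-0-2, and so on.

Information about transactions is persisted in a Kafka topic and cached by the current leader for Kafka transactions. The retention on the transaction topic is typically 1 week. So with a single producer, a single instance, and a 1-minute checkpoint interval, there will be more than 10,000 transactions (7 × 24 × 60 = 10,080). We would of course expect all of them to be in the Empty or CompleteCommit state.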
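As a rough check of that figure, the number of retained transactional IDs can be estimated from the retention period, the checkpoint interval, and the number of sink subtasks. The sketch below assumes exactly one ID per subtask per checkpoint, as described above. The class and method names are made up for illustration.

```java
import java.time.Duration;

public class TransactionCountEstimate {
    // One transactionalId per subtask per checkpoint, kept for the retention period.
    static long estimate(Duration retention, Duration checkpointInterval, int subtasks) {
        return retention.toMillis() / checkpointInterval.toMillis() * subtasks;
    }

    public static void main(String[] args) {
        // The 1-minute interval used in the answer.
        System.out.println(estimate(Duration.ofDays(7), Duration.ofMinutes(1), 1));  // 10080
        // The 30-second interval mentioned in the question's edit.
        System.out.println(estimate(Duration.ofDays(7), Duration.ofSeconds(30), 1)); // 20160
    }
}
```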

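What Flink does when the Kafka producer starts is first make sure there are no in-flight transactions. This happens in org.apache.flink.connector.kafka.sink.TransactionAborter#abortTransactionOfSubtask. The code naively builds transactionalIds starting at <transactionalIdPrefix>-<subtask id>-<current checkpoint id> and keeps iterating until Kafka returns an epoch equal to 0, because that means Kafka is seeing that transactionalId for the first time. For each ID it calls initTransactions, which makes Kafka abort any previous transaction with the same transactionalId and leaves it in the Empty state.

That is no problem in regular operation, because the current checkpoint ID is the latest checkpoint. It is not the case, however, if you start from a previous checkpoint or from fresh state. In that situation your current checkpoint ID will be 1, so the scan starts at 1 and climbs all the way to the last transaction created in that cluster.

And it does this every time Flink starts, until the checkpoint ID passes the last transaction.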
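To make the cost of that scan concrete, here is a simplified in-memory model of the loop as the answer describes it. It is not Flink's actual code: AbortScanSketch, withHistory, initProducerId, and scan are hypothetical names, and the map stands in for the broker. It only counts how many IDs are probed before an epoch of 0 comes back.

```java
import java.util.HashMap;
import java.util.Map;

public class AbortScanSketch {
    // Hypothetical stand-in for broker state: transactionalId -> last epoch.
    private final Map<String, Integer> epochs = new HashMap<>();

    // Models the init call: a never-seen ID gets epoch 0, a known ID gets epoch + 1.
    int initProducerId(String transactionalId) {
        return epochs.merge(transactionalId, 0, (old, unused) -> old + 1);
    }

    // Creates a broker that already knows the IDs for checkpoints 1..lastCheckpointId.
    static AbortScanSketch withHistory(String prefix, int subtaskId, long lastCheckpointId) {
        AbortScanSketch broker = new AbortScanSketch();
        for (long id = 1; id <= lastCheckpointId; id++) {
            broker.initProducerId(prefix + "-" + subtaskId + "-" + id);
        }
        return broker;
    }

    // Probes IDs upward from startCheckpointId until epoch 0 is returned.
    // Returns the number of IDs probed.
    long scan(String prefix, int subtaskId, long startCheckpointId) {
        long probes = 0;
        for (long checkpointId = startCheckpointId; ; checkpointId++) {
            probes++;
            int epoch = initProducerId(prefix + "-" + subtaskId + "-" + checkpointId);
            if (epoch == 0) {
                return probes;
            }
        }
    }

    public static void main(String[] args) {
        // Fresh state: the scan starts at checkpoint 1 and walks the whole history.
        System.out.println(withHistory("kafka-sink", 0, 5000).scan("kafka-sink", 0, 1));    // 5001
        // Restore from a recent checkpoint: only a handful of probes.
        System.out.println(withHistory("kafka-sink", 0, 5000).scan("kafka-sink", 0, 4999)); // 3
    }
}
```

In the log from the question, successive probes are roughly 100 ms apart, so a scan over about 5,000 leftover IDs would already take between eight and nine minutes per subtask.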

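As far as I know, Kafka has no API that lets you delete a specific transactionalId. Aborting it only leaves it in CompleteAbort. Your options are:

  1. Change the transactionalIdPrefix whenever you want to start with fresh Flink state. This only works when you wipe the state, because if you change the transactionalIdPrefix while keeping the state, Flink knows what the previous prefix was and scans those transactions anyway.
  2. Purge the Kafka transaction state topic and force a leader change so that the cache is refreshed from the topic.
  3. Delete the specific transactions from the state topic and force a leader change.

P.S. Also, if you run several local Flink instances against a shared Kafka cluster, make sure each one has a unique transactionalIdPrefix.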
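For option 1 and the P.S., one possible approach is to derive the prefix from the job name plus a tag that you bump only when you deliberately start from clean state. The sketch below is an assumption-laden illustration: the helper prefixFor and the "generation" tag are invented here, and only setTransactionalIdPrefix is an actual method of the KafkaSink builder. The IDs in the question's log start with kafka-sink, which suggests no explicit prefix was set there.

```java
public class TransactionalIdPrefixSketch {
    // Hypothetical helper: combines a job name with a tag that changes per clean start.
    static String prefixFor(String jobName, String stateGeneration) {
        if (jobName.isEmpty() || stateGeneration.isEmpty()) {
            throw new IllegalArgumentException("job name and state generation must be non-empty");
        }
        return jobName + "-" + stateGeneration;
    }

    public static void main(String[] args) {
        // The tag comes from outside (here: the first program argument) so that it stays
        // stable across restarts from state and changes only on a clean start.
        String generation = args.length > 0 ? args[0] : "gen-1";
        String prefix = prefixFor("device-day-job", generation);
        System.out.println(prefix);
        // With the Flink Kafka connector on the classpath, the value would be passed as
        // KafkaSink.<T>builder() ... .setTransactionalIdPrefix(prefix) ... .build();
    }
}
```

A randomly generated tag on every startup would not fit here, because the prefix should stay the same whenever the job restores from existing state.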

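The scan can also happen when a checkpoint fails, because it starts scanning from the last successful checkpoint.

We ran into the same problem and noticed that it comes from ungraceful shutdowns. For example, when you run the application locally from IntelliJ and press the stop button, the close method is not called. Unfortunately, a graceful shutdown only happens in run mode, not in debug mode.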
