SIGHUP signal triggered an automatic shutdown of a Kafka broker



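In our 3-broker Kafka cluster, one of the brokers shut down on its own. This is unusual, because the same cluster has otherwise been running without any problems.

According to the Kafka server.log, this particular broker received a SIGHUP signal, and that signal triggered the shutdown.

After the broker was restarted, it ran without problems. What could have caused the SIGHUP signal?

Here is a snippet from the Kafka server.log: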

[2020-03-18 14:01:07,570] INFO [ProducerStateManager partition=RC-60-16] Writing producer snapshot at offset 9176994 (kafka.log.ProducerStateManager)
[2020-03-18 14:01:07,572] INFO [Log partition=RC-60-16, dir=/datafs3/kafka-logs] Rolled new log segment at offset 9176994 in 3 ms. (kafka.log.Log)
[2020-03-18 14:03:21,723] INFO Terminating process due to signal SIGHUP (org.apache.kafka.common.utils.LoggingSignalHandler)
[2020-03-18 14:03:21,729] INFO [KafkaServer id=1] shutting down (kafka.server.KafkaServer)
[2020-03-18 14:03:21,731] INFO [KafkaServer id=1] Starting controlled shutdown (kafka.server.KafkaServer)
[2020-03-18 14:03:21,796] INFO [ReplicaFetcherManager on broker 1] Removed fetcher for partitions Set(AL-60-0, FL-60-2, WL-60-55, WL-60-37, FL-60-47, RC-60-52, WL-60-19, FL-60-29, RC-60-34, FL-60-11, RC-60-16, FL-60-56, AL-60-42, CL-60-47, FL-60-38, AL-60-24, RC-60-43, CL-60-29, AL-60-6, RC-60-25, CL-60-11, RC-60-7, AL-60-51, WL-60-43, CL-60-56, AL-60-33, WL-60-25, CL-60-38, AL-60-15, WL-60-7, CL-60-20, WL-60-52, CL-60-2, WL-60-34, FL-60-44, RC-60-49, WL-60-16, FL-60-26, RC-60-31, FL-60-8, RC-60-13, FL-60-53, RC-60-58, FL-60-35, AL-60-21, RC-60-40, CL-60-26, FL-60-17, AL-60-3, RC-60-22, CL-60-8, RC-60-4, AL-60-48, CL-60-53, AL-60-30, WL-60-22, CL-60-35, AL-60-12, WL-60-4, CL-60-17, AL-60-57, WL-60-49, AL-60-39, WL-60-31, FL-60-41, CL-60-44, WL-60-13, FL-60-23, RC-60-28, FL-60-5, RC-60-10, WL-60-58, WL-60-40, FL-60-50, RC-60-55, FL-60-32, RC-60-37, FL-60-14, RC-60-19, CL-60-5, RC-60-1, FL-60-59, AL-60-45, CL-60-50, AL-60-27, RC-60-46, CL-60-32, AL-60-9, WL-60-1, CL-60-14, AL-60-54, WL-60-46, CL-60-59, AL-60-36, WL-60-28, CL-60-41, AL-60-18, WL-60-10, FL-60-20, CL-60-23) (kafka.server.ReplicaFetcherManager)
[2020-03-18 14:03:21,839] INFO [ReplicaFetcherManager on broker 1] Removed fetcher for partitions Set(RC-60-20, WL-60-11, FL-60-15, WL-60-17, CL-60-45, CL-60-42, FL-60-6, FL-60-18, AL-60-46, AL-60-13, AL-60-20, AL-60-11, AL-60-16, WL-60-47, CL-60-36, RC-60-3, FL-60-36, AL-60-40, FL-60-45, FL-60-52, CL-60-21, WL-60-41, RC-60-53, FL-60-22, CL-60-9, RC-60-15, RC-60-59, WL-60-2, AL-60-17, CL-60-51, CL-60-1, AL-60-7, CL-60-55, FL-60-1, WL-60-18, CL-60-39, AL-60-58, FL-60-40, FL-60-58, WL-60-21, WL-60-29, RC-60-33, FL-60-49, FL-60-3, AL-60-19, FL-60-7, RC-60-21, WL-60-45, CL-60-4, CL-60-28, CL-60-25, AL-60-22, CL-60-12, RC-60-0, WL-60-6, FL-60-19, RC-60-11, RC-60-44, CL-60-57, CL-60-7, RC-60-5, WL-60-59, RC-60-14, CL-60-22, AL-60-8, RC-60-50, FL-60-46, RC-60-56, FL-60-12, FL-60-16, RC-60-47, AL-60-49, RC-60-26, AL-60-44, AL-60-41, CL-60-40, RC-60-39, WL-60-24, AL-60-31, CL-60-52, CL-60-0, WL-60-54, AL-60-55, AL-60-59, WL-60-0, CL-60-31, AL-60-34, FL-60-37, AL-60-25, AL-60-37, CL-60-18, RC-60-38, WL-60-30, AL-60-1, CL-60-43, FL-60-34, WL-60-50, WL-60-20, FL-60-30, RC-60-54, CL-60-54, WL-60-3, AL-60-50, CL-60-58, WL-60-44, WL-60-9, WL-60-36, AL-60-14, AL-60-29, AL-60-4, AL-60-38, WL-60-27, FL-60-25, RC-60-41, CL-60-49, RC-60-6, FL-60-4, FL-60-0, FL-60-13, FL-60-27, RC-60-8, WL-60-39, RC-60-27, CL-60-13, FL-60-9, AL-60-2, FL-60-43, FL-60-54, RC-60-17, RC-60-45, FL-60-24, AL-60-56, CL-60-48, AL-60-28, WL-60-33, WL-60-15, AL-60-26, WL-60-5, AL-60-23, RC-60-35, ML-1-0, WL-60-23, AL-60-35, FL-60-31, RC-60-23, WL-60-35, WL-60-57, FL-60-42, RC-60-57, RC-60-51, CL-60-19, CL-60-46, WL-60-14, CL-60-27, CL-60-6, RC-60-29, WL-60-51, WL-60-26, RC-60-36, RC-60-12, RC-60-2, CL-60-37, AL-60-32, CL-60-33, RC-60-9, WL-60-12, AL-60-5, WL-60-42, WL-60-56, AL-60-10, FL-60-33, CL-60-34, CL-60-30, RC-60-18, FL-60-10, FL-60-51, AL-60-53, WL-60-53, FL-60-28, AL-60-52, RC-60-42, AL-60-47, CL-60-15, WL-60-48, RC-60-24, WL-60-32, RC-60-48, CL-60-10, RC-60-32, CL-60-24, CL-60-16, AL-60-43, FL-60-39, 
CL-60-3, FL-60-21, WL-60-8, WL-60-38, FL-60-48, FL-60-55, RC-60-30, FL-60-57) (kafka.server.ReplicaFetcherManager)
[2020-03-18 14:03:21,839] INFO [ReplicaAlterLogDirsManager on broker 1] Removed fetcher for partitions Set(RC-60-20, WL-60-11, FL-60-15, WL-60-17, CL-60-45, CL-60-42, FL-60-6, FL-60-18, AL-60-46, AL-60-13, AL-60-20, AL-60-11, AL-60-16, WL-60-47, CL-60-36, RC-60-3, FL-60-36, AL-60-40, FL-60-45, FL-60-52, CL-60-21, WL-60-41, RC-60-53, FL-60-22, CL-60-9, RC-60-15, RC-60-59, WL-60-2, AL-60-17, CL-60-51, CL-60-1, AL-60-7, CL-60-55, FL-60-1, WL-60-18, CL-60-39, AL-60-58, FL-60-40, FL-60-58, WL-60-21, WL-60-29, RC-60-33, FL-60-49, FL-60-3, AL-60-19, FL-60-7, RC-60-21, WL-60-45, CL-60-4, CL-60-28, CL-60-25, AL-60-22, CL-60-12, RC-60-0, WL-60-6, FL-60-19, RC-60-11, RC-60-44, CL-60-57, CL-60-7, RC-60-5, WL-60-59, RC-60-14, CL-60-22, AL-60-8, RC-60-50, FL-60-46, RC-60-56, FL-60-12, FL-60-16, RC-60-47, AL-60-49, RC-60-26, AL-60-44, AL-60-41, CL-60-40, RC-60-39, WL-60-24, AL-60-31, CL-60-52, CL-60-0, WL-60-54, AL-60-55, AL-60-59, WL-60-0, CL-60-31, AL-60-34, FL-60-37, AL-60-25, AL-60-37, CL-60-18, RC-60-38, WL-60-30, AL-60-1, CL-60-43, FL-60-34, WL-60-50, WL-60-20, FL-60-30, RC-60-54, CL-60-54, WL-60-3, AL-60-50, CL-60-58, WL-60-44, WL-60-9, WL-60-36, AL-60-14, AL-60-29, AL-60-4, AL-60-38, WL-60-27, FL-60-25, RC-60-41, CL-60-49, RC-60-6, FL-60-4, FL-60-0, FL-60-13, FL-60-27, RC-60-8, WL-60-39, RC-60-27, CL-60-13, FL-60-9, AL-60-2, FL-60-43, FL-60-54, RC-60-17, RC-60-45, FL-60-24, AL-60-56, CL-60-48, AL-60-28, WL-60-33, WL-60-15, AL-60-26, WL-60-5, AL-60-23, RC-60-35, ML-1-0, WL-60-23, AL-60-35, FL-60-31, RC-60-23, WL-60-35, WL-60-57, FL-60-42, RC-60-57, RC-60-51, CL-60-19, CL-60-46, WL-60-14, CL-60-27, CL-60-6, RC-60-29, WL-60-51, WL-60-26, RC-60-36, RC-60-12, RC-60-2, CL-60-37, AL-60-32, CL-60-33, RC-60-9, WL-60-12, AL-60-5, WL-60-42, WL-60-56, AL-60-10, FL-60-33, CL-60-34, CL-60-30, RC-60-18, FL-60-10, FL-60-51, AL-60-53, WL-60-53, FL-60-28, AL-60-52, RC-60-42, AL-60-47, CL-60-15, WL-60-48, RC-60-24, WL-60-32, RC-60-48, CL-60-10, RC-60-32, CL-60-24, CL-60-16, AL-60-43, 
FL-60-39, CL-60-3, FL-60-21, WL-60-8, WL-60-38, FL-60-48, FL-60-55, RC-60-30, FL-60-57) (kafka.server.ReplicaAlterLogDirsManager)
[2020-03-18 14:03:21,848] INFO [ReplicaFetcher replicaId=1, leaderId=2, fetcherId=0] Shutting down (kafka.server.ReplicaFetcherThread)
[2020-03-18 14:03:21,850] INFO [ReplicaFetcher replicaId=1, leaderId=2, fetcherId=0] Error sending fetch request (sessionId=844786059, epoch=197425) to node 2: java.io.IOException: Client was shutdown before response was read. (org.apache.kafka.clients.FetchSessionHandler)
[2020-03-18 14:03:21,851] INFO [ReplicaFetcher replicaId=1, leaderId=2, fetcherId=0] Shutdown completed (kafka.server.ReplicaFetcherThread)
[2020-03-18 14:03:21,851] INFO [ReplicaFetcher replicaId=1, leaderId=2, fetcherId=0] Stopped (kafka.server.ReplicaFetcherThread)
[2020-03-18 14:03:21,855] INFO [ReplicaFetcher replicaId=1, leaderId=3, fetcherId=0] Shutting down (kafka.server.ReplicaFetcherThread)
[2020-03-18 14:03:21,855] INFO [ReplicaFetcher replicaId=1, leaderId=3, fetcherId=0] Error sending fetch request (sessionId=157379546, epoch=198831) to node 3: java.io.IOException: Client was shutdown before response was read. (org.apache.kafka.clients.FetchSessionHandler)
[2020-03-18 14:03:21,855] INFO [ReplicaFetcher replicaId=1, leaderId=3, fetcherId=0] Stopped (kafka.server.ReplicaFetcherThread)
[2020-03-18 14:03:21,855] INFO [ReplicaFetcher replicaId=1, leaderId=3, fetcherId=0] Shutdown completed (kafka.server.ReplicaFetcherThread)
[2020-03-18 14:03:21,859] INFO [KafkaServer id=1] Controlled shutdown succeeded (kafka.server.KafkaServer)
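
To find this kind of event in a longer server.log, something like the following sketch may help. It assumes the message format shown in the excerpt above ("Terminating process due to signal ..."), and it writes three of those lines to a temporary file only so that the example is self-contained; on a real broker, `log_file` would point at the actual server.log.

```shell
# Stand-in for the real server.log: three lines copied from the excerpt above.
log_file=$(mktemp)
cat > "$log_file" <<'EOF'
[2020-03-18 14:01:07,572] INFO [Log partition=RC-60-16, dir=/datafs3/kafka-logs] Rolled new log segment at offset 9176994 in 3 ms. (kafka.log.Log)
[2020-03-18 14:03:21,723] INFO Terminating process due to signal SIGHUP (org.apache.kafka.common.utils.LoggingSignalHandler)
[2020-03-18 14:03:21,729] INFO [KafkaServer id=1] shutting down (kafka.server.KafkaServer)
EOF

# Print "<timestamp> <signal>" for every shutdown that was triggered by a signal.
signal_events=$(sed -n 's/^\[\([^]]*\)] INFO Terminating process due to signal \([A-Z0-9]*\).*/\1 \2/p' "$log_file")
echo "$signal_events"

rm -f "$log_file"
```

The timestamp it prints can then be compared against terminal logouts or system events around the same time.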

Under normal circumstances, the Kafka service is started from cron, as shown below:

10 06 * * 1,2,3,4,5 . $HOME/.bashrc; bash $KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/server.properties > /dev/null 2>&1 & 

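In this case, however, the service was started manually with the same command as above, without "nohup". Could closing the controlling terminal have triggered the SIGHUP?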

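From Wikipedia:

On POSIX-compliant platforms, SIGHUP ("signal hang up") is a signal sent to a process when its controlling terminal is closed. (It was originally designed to notify the process of a serial line drop.) SIGHUP is a symbolic constant defined in the header file signal.h.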

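So something is telling the Kafka process to shut down. If it runs as a system service, maybe the whole system was restarting? You may want to check the machine's system logs to see whether a reboot was triggered.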


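Edit: Yes, the scenario you describe (the service was started manually without nohup) is exactly what causes this. nohup means "no hang-up": the command runs with SIGHUP ignored, so it keeps running when the shell that started it ends. Without nohup, the process receives a SIGHUP when the terminal closes, and it shuts down.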
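
The difference can be reproduced without Kafka. The sketch below is only an illustration: `sleep` stands in for the broker process, and `kill -HUP` simulates the terminal being closed. It assumes a shell such as bash or dash, where a process killed by a signal reports an exit status of 128 plus the signal number, and it assumes the script itself was not started with SIGHUP already ignored.

```shell
# Stand-in for the broker: a long-running process started in the background.
sleep 30 &
plain_pid=$!

# The same command under nohup, which starts it with SIGHUP ignored.
nohup sleep 30 > /dev/null 2>&1 &
nohup_pid=$!

sleep 1                               # give nohup time to exec the command
kill -HUP "$plain_pid" "$nohup_pid"   # simulate the terminal hanging up

# Collect the exit status of the process started without nohup.
plain_status=0
wait "$plain_pid" || plain_status=$?

# Check whether the nohup-started process is still running.
sleep 1
if kill -0 "$nohup_pid" 2>/dev/null; then nohup_alive=yes; else nohup_alive=no; fi

echo "plain exit status: $plain_status"
echo "nohup process alive: $nohup_alive"

# Clean up the surviving process.
kill "$nohup_pid" 2>/dev/null || true
wait "$nohup_pid" 2>/dev/null || true
```

Applied to the command from the question, this would mean prefixing the manual start with `nohup`, for example `nohup bash $KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/server.properties > /dev/null 2>&1 &`. The start script also accepts a `-daemon` option as its first argument, which should have a similar effect.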
