Temporary failure in name resolution from pods running on a kubeadm worker node



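I'm running Kafka on a Kubernetes cluster on VMware with one control-plane node and one worker node. From the control-plane node my clients can communicate with Kafka, but from the worker node they eventually fail with this error: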

%3|1638529687.405|FAIL|apollo-prototype-765f4d8bcf-bjpf4#producer-2| [thrd:sasl_plaintext://my-cluster-kafka-bootstrap:9092/bootstrap]: sasl_plaintext://my-cluster-kafka-bootstrap:9092/bootstrap: Failed to resolve 'my-cluster-kafka-bootstrap:9092': Temporary failure in name resolution (after 20016ms in state CONNECT, 2 identical error(s) suppressed)
%3|1638529687.406|ERROR|apollo-prototype-765f4d8bcf-bjpf4#producer-2| [thrd:app]: apollo-prototype-765f4d8bcf-bjpf4#producer-2: sasl_plaintext://my-cluster-kafka-bootstrap:9092/bootstrap: Failed to resolve 'my-cluster-kafka-bootstrap:9092': Temporary failure in name resolution (after 20016ms in state CONNECT, 2 identical error(s) suppressed)

Here is my Kafka cluster manifest (using Strimzi):

listeners:
  - name: plain
    port: 9092
    type: internal
    tls: false
    authentication:
      type: scram-sha-512
  - name: external
    port: 9094
    type: ingress
    tls: true
    authentication:
      type: scram-sha-512
    configuration:
      class: nginx
      bootstrap:
        host: localb.kafka.xxx.com
      brokers:
        - broker: 0
          host: local.kafka.xxx.com

It's worth pointing out that exactly the same configuration works perfectly when I run it in the cloud.

Both telnet and nslookup (run from both nodes) fail with errors. The CoreDNS logs don't even mention this error. Also, the firewall is disabled on both nodes.
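Since pods resolve service names through the cluster DNS rather than the node's own resolver, it can help to repeat the lookup from inside a pod scheduled on each node. The following is only a minimal sketch: resolve_check is a hypothetical helper, it assumes the pod image ships getent, and the pod name is simply the one that appears in the log above.

```shell
#!/bin/sh
# resolve_check NAME (hypothetical helper):
# prints "ok" if NAME resolves through the system resolver, "fail" otherwise.
resolve_check() {
    if getent hosts "$1" >/dev/null 2>&1; then
        echo "ok"
    else
        echo "fail"
    fi
}

# Example: open a shell in a pod on the worker node, define the function
# there, and test the bootstrap service name from the error message:
#   kubectl exec -it apollo-prototype-765f4d8bcf-bjpf4 -- sh
#   resolve_check my-cluster-kafka-bootstrap
```

If the same name resolves from a pod on the control-plane node but not from one on the worker node, that points at pod networking on the worker rather than at Kafka or CoreDNS themselves.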

Can you help me out? Thanks.


Update: solution

The Calico pod (on the worker node) was complaining "bird: Netlink: Network is down", even though it had not crashed:

2021-12-03 09:39:58.051 [INFO][90] felix/int_dataplane.go 1539: Received interface addresses update msg=&intdataplane.ifaceAddrsUpdate{Name:"tunl0", Addrs:set.mapSet{}}
2021-12-03 09:39:58.051 [INFO][90] felix/hostip_mgr.go 85: Interface addrs changed. update=&intdataplane.ifaceAddrsUpdate{Name:"tunl0", Addrs:set.mapSet{}}
2021-12-03 09:39:58.052 [INFO][90] felix/ipsets.go 130: Queueing IP set for creation family="inet" setID="this-host" setType="hash:ip"
2021-12-03 09:39:58.057 [INFO][90] felix/ipsets.go 785: Doing full IP set rewrite family="inet" numMembersInPendingReplace=3 setID="this-host"
2021-12-03 09:39:58.059 [INFO][90] felix/int_dataplane.go 1036: Linux interface state changed. ifIndex=13 ifaceName="tunl0" state="down"
2021-12-03 09:39:58.082 [INFO][90] felix/int_dataplane.go 1521: Received interface update msg=&intdataplane.ifaceUpdate{Name:"tunl0", State:"down", Index:13}
bird: Netlink: Network is down

Here is what I did, and it worked like a charm!

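The fault was caused by the nodes having different kernel modules loaded. The new node had the ipip module loaded, but the old node did not, and this mismatch caused the Calico errors. Removing the ipip module restored normal operation.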

[root@k8s-node236-232 ~]# lsmod  | grep ipip
ipip                   16384  0 
tunnel4                16384  1 ipip
ip_tunnel              24576  1 ipip
[root@k8s-node236-232 ~]# modprobe -r ipip
[root@k8s-node236-232 ~]# lsmod  | grep ipip
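To compare nodes, the same check can be scripted. This is a minimal sketch, assuming lsmod output in the format shown above is piped in on stdin; module_loaded is a hypothetical helper name.

```shell
#!/bin/sh
# module_loaded NAME (hypothetical helper):
# reads lsmod-style output on stdin and prints "loaded" if the first
# column of any row is exactly NAME, otherwise "not loaded".
module_loaded() {
    if awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'; then
        echo "loaded"
    else
        echo "not loaded"
    fi
}

# Usage on each node:
#   lsmod | module_loaded ipip
```

Matching on the first column only avoids the false positive that a plain grep would give on rows such as tunnel4, which merely list ipip as a dependent module.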
