ingress-nginx pod on Fargate-only EKS cluster failing due to bind() to 0.0.0.0:8443 failed (98: Address in use)



The Helm-installed ingress-nginx pod in my EKS cluster keeps failing, and its log indicates the application cannot bind to 0.0.0.0:8443 (INADDR_ANY:8443). I have confirmed that 0.0.0.0:8443 is indeed already bound in the container, but because I don't yet have root access to the container, I haven't been able to identify the culprit process/user.

I have created this issue on the kubernetes ingress-nginx project I'm using, but I also wanted to reach out to the broader SO community, which may offer insights, solutions, and troubleshooting suggestions for how to overcome this hurdle.

As a newcomer to AWS/EKS and Kubernetes, it is quite possible some environmental misconfiguration is causing this issue. For example, could it be caused by a misconfigured AWS construct such as the VPC (its subnets or security groups)? Thank you in advance for your help!

The linked GitHub issue provides ample details about the Terraform-provisioned EKS environment as well as the Helm-installed ingress-nginx deployment. Here are some key details:

  1. The EKS cluster is configured to use only Fargate workers, and has 3 public and 3 private subnets, all 6 of which are made available to the cluster and each of its Fargate profiles.
  2. It should also be noted that the cluster is new, and the ingress-nginx pod is the first attempt to deploy anything to it, aside from kube-system items like coredns, which has been configured to run in Fargate. (This required manually deleting the default ec2 annotation as described here.)
  3. There are 6 Fargate profiles, but only 2 are currently in use: coredns and ingress. These are dedicated to kube-system/kube-dns and ingress-nginx, respectively. Aside from the selectors' namespaces and labels, there is nothing "custom" about the profile specs. It has been confirmed that the selectors are working, for both coredns and ingress. That is, the ingress pod is scheduled to run, but fails.
  4. The reason ingress-nginx is using port 8443 is that I first ran into a privilege-escalation issue whose workaround requires disabling allowPrivilegeEscalation and changing ports from privileged to unprivileged ones. I invoke helm install with the following values:
```yaml
controller:
  extraArgs:
    http-port: 8080
    https-port: 8443
  containerPort:
    http: 8080
    https: 8443
  image:
    allowPrivilegeEscalation: false
  # https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes
  livenessProbe:
    initialDelaySeconds: 60  # 30
  readinessProbe:
    initialDelaySeconds: 60  # 0
  service:
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
    ports:
      http: 80
      https: 443
    targetPorts:
      http: 8080
      https: 8443
```
  5. Since my initial observation (before looking at the logs) was that the K8s liveness/readiness probes were failing/timing out, I first tried extending their initialDelaySeconds in the values passed to helm install. But eventually I looked at the pod/container logs and found that, regardless of the *ness probe settings, every time I reinstalled ingress-nginx and waited a while, the logs would indicate the same bind error reported here:
```
2021/11/12 17:15:02 [emerg] 27#27: bind() to [::]:8443 failed (98: Address in use)
.
.
```
6. Aside from what I've noted above, I haven't intentionally configured anything "non-stock". I'm a bit lost in AWS/K8s's sea of configuration looking for what piece I need to adapt/correct.
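For completeness, values like those in point 4 are typically passed to helm via a values file. A hypothetical invocation, where the release name (`nginx-ingress`) and namespace (`ingress`) are inferred from the pod names elsewhere in this question, and `values.yaml` is an assumed filename:

```shell
# Hypothetical invocation; release name and namespace are inferred from
# the kubectl output in this question, values.yaml is an assumed filename.
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm install nginx-ingress ingress-nginx/ingress-nginx \
  --namespace ingress --create-namespace \
  --values values.yaml
```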
Do you have clues or guesses why INADDR_ANY, port 8443 would already be bound in my (fairly-standard) `nginx-ingress-ingress-nginx-controller` pod/container?
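As a sanity check of the error itself: the nginx log line corresponds to `bind()` returning errno 98 (EADDRINUSE). A minimal sketch reproducing it with two sockets on a wildcard (INADDR_ANY) address, using an OS-assigned ephemeral port in place of 8443:

```python
import errno
import socket

# Reproduce nginx's "(98: Address in use)": bind a wildcard (INADDR_ANY)
# listener, then try to bind the same port again with a second socket.
first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
first.bind(("0.0.0.0", 0))      # port 0: let the OS pick a free port
first.listen()
port = first.getsockname()[1]

second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    second.bind(("0.0.0.0", port))
    print("bound twice (unexpected)")
except OSError as exc:
    # errno.EADDRINUSE is 98 on Linux, matching the nginx log line
    print("EADDRINUSE:", exc.errno == errno.EADDRINUSE)
finally:
    second.close()
    first.close()
```

On Linux this prints `EADDRINUSE: True`, i.e. exactly the condition nginx is hitting on [::]:8443: some other socket in the same network namespace already holds the wildcard binding for that port.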
As I alluded to earlier, I am able to execute the `netstat` command inside the running container as the default user `www-data` to confirm that 0:8443 is indeed already bound, but because I haven't yet figured out how to get root access, the PID/name of the process is not available to me:
```
> kubectl exec -n ingress --stdin --tty nginx-ingress-ingress-nginx-controller-74d46b8fd8-85tkh -- netstat -tulpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.1:10245         0.0.0.0:*               LISTEN      -
tcp        3      0 127.0.0.1:10246         0.0.0.0:*               LISTEN      -
tcp        0      0 127.0.0.1:10247         0.0.0.0:*               LISTEN      -
tcp        0      0 0.0.0.0:8080            0.0.0.0:*               LISTEN      -
tcp        0      0 0.0.0.0:8080            0.0.0.0:*               LISTEN      -
tcp        0      0 0.0.0.0:8181            0.0.0.0:*               LISTEN      -
tcp        0      0 0.0.0.0:8181            0.0.0.0:*               LISTEN      -
tcp        0      0 :::8443                 :::*                    LISTEN      -
tcp        0      0 :::10254                :::*                    LISTEN      -
tcp        0      0 :::8080                 :::*                    LISTEN      -
tcp        0      0 :::8080                 :::*                    LISTEN      -
tcp        0      0 :::8181                 :::*                    LISTEN      -
tcp        0      0 :::8181                 :::*                    LISTEN      -
```
```
> kubectl exec -n ingress --stdin --tty nginx-ingress-ingress-nginx-controller-74d46b8fd8-85tkh -- /bin/bash
bash-5.1$ whoami
www-data
bash-5.1$ ps aux
PID   USER     TIME  COMMAND
1 www-data  0:00 /usr/bin/dumb-init -- /nginx-ingress-controller --publish-service=ingress/nginx-ingress-ingress-nginx-controller --election-id=ingress-controller-leader --controller-class=k8s.io/ingress-nginx
8 www-data  0:00 /nginx-ingress-controller --publish-service=ingress/nginx-ingress-ingress-nginx-controller --election-id=ingress-controller-leader --controller-class=k8s.io/ingress-nginx --configmap=ingress/n
28 www-data  0:00 nginx: master process /usr/local/nginx/sbin/nginx -c /etc/nginx/nginx.conf
30 www-data  0:00 nginx: worker process
45 www-data  0:00 /bin/bash
56 www-data  0:00 ps aux
```
I'm currently looking into how to get root access to my Fargate containers (without mucking about with their Dockerfiles to install ssh..) so I can get more insight into what process/user is binding INADDR_ANY:8443 in this pod/container.
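One option that avoids modifying the image is kubectl's ephemeral debug containers, which can share the target container's process namespace so `ps`/`netstat` run as root can see its PIDs. A sketch, assuming a recent Kubernetes/kubectl (the pod name is taken from this question, `controller` is the chart's container name, the busybox image is an assumption, and Fargate support for ephemeral containers may vary):

```shell
# Sketch: attach an ephemeral debug container sharing the target
# container's process namespace. Image choice is an assumption, and
# Fargate support for ephemeral containers may vary by platform version.
kubectl debug -n ingress -it \
  nginx-ingress-ingress-nginx-controller-74d46b8fd8-85tkh \
  --image=busybox \
  --target=controller \
  -- sh -c 'netstat -tulpn'
```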

Community wiki answer posted based on the same topic and similar issues (both on the GitHub page). Feel free to expand it.


The answer from GitHub:

The problem is that 8443 is already bound by the webhook. That's why I used 8081 in my suggestion, not 8443. The examples here that use 8443 have to also move the webhook, which adds a lot more complexity to the change and can lead to weird issues if you get it wrong.
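For context: in the ingress-nginx Helm chart, the validating admission webhook listens on its own port inside the controller pod, which (at the time of this question) defaults to 8443 via `controller.admissionWebhooks.port`, so setting `https-port: 8443` collides with it unless the webhook is moved too. A fragment of the relevant default, for illustration:

```yaml
# Chart default (illustrative); this is why https-port: 8443 collides
controller:
  admissionWebhooks:
    enabled: true
    port: 8443
```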

Example that uses port 8081:

In addition to those settings, you also need the appropriate annotation to run with an NLB rather than an ELB, so in the end it looks like:

```yaml
controller:
  extraArgs:
    http-port: 8080
    https-port: 8081
  containerPort:
    http: 8080
    https: 8081
  image:
    allowPrivilegeEscalation: false
  service:
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-type: "nlb-ip"
```

Edit: changed the aws-load-balancer-type to nlb-ip, since that is required for Fargate. It probably should be

```yaml
service.beta.kubernetes.io/aws-load-balancer-type: "external"
service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "ip"
```

for current versions of the AWS Load Balancer Controller (version 2.2 onward), but newer versions will still recognize the nlb-ip annotation.
