Nginx ingress controller fails to start after AKS upgrade to v1.22



We performed an upgrade of our Kubernetes cluster from v1.21 to v1.22. After this operation we discovered that the pods of our nginx ingress controller deployment fail to start with the following error message: pkg/mod/k8s.io/client-go@v0.18.5/tools/cache/reflector.go:125: Failed to list *v1beta1.Ingress: the server could not find the requested resource

We have found out that this issue is tracked here: https://github.com/bitnami/charts/issues/7264

Since Azure does not allow downgrading the cluster back to 1.21, could you help us fix the nginx ingress controller deployment? As we are not very familiar with Helm, please be specific about what should be done and where to start (local machine, Azure CLI, etc.).
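For reference, the missing API can presumably be confirmed by listing which API versions the upgraded cluster still serves:

kubectl api-versions | grep networking.k8s.io
# on v1.22 this should only print networking.k8s.io/v1 (the v1beta1 Ingress API is gone)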

This is our current deployment YAML:

kind: Deployment
apiVersion: apps/v1
metadata:
  name: nginx-ingress-controller
  namespace: ingress
  uid: 575c7699-1fd5-413e-a81d-b183f8822324
  resourceVersion: '166482672'
  generation: 16
  creationTimestamp: '2020-10-10T10:20:07Z'
  labels:
    app: nginx-ingress
    app.kubernetes.io/component: controller
    app.kubernetes.io/managed-by: Helm
    chart: nginx-ingress-1.41.1
    heritage: Helm
    release: nginx-ingress
  annotations:
    deployment.kubernetes.io/revision: '2'
    meta.helm.sh/release-name: nginx-ingress
    meta.helm.sh/release-namespace: ingress
  managedFields:
    - manager: kube-controller-manager
      operation: Update
      apiVersion: apps/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:spec:
          f:replicas: {}
      subresource: scale
    - manager: Go-http-client
      operation: Update
      apiVersion: apps/v1
      time: '2020-10-10T10:20:07Z'
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:meta.helm.sh/release-name: {}
            f:meta.helm.sh/release-namespace: {}
          f:labels:
            .: {}
            f:app: {}
            f:app.kubernetes.io/component: {}
            f:app.kubernetes.io/managed-by: {}
            f:chart: {}
            f:heritage: {}
            f:release: {}
        f:spec:
          f:progressDeadlineSeconds: {}
          f:revisionHistoryLimit: {}
          f:selector: {}
          f:strategy:
            f:rollingUpdate:
              .: {}
              f:maxSurge: {}
              f:maxUnavailable: {}
            f:type: {}
          f:template:
            f:metadata:
              f:labels:
                .: {}
                f:app: {}
                f:app.kubernetes.io/component: {}
                f:component: {}
                f:release: {}
            f:spec:
              f:containers:
                k:{"name":"nginx-ingress-controller"}:
                  .: {}
                  f:args: {}
                  f:env:
                    .: {}
                    k:{"name":"POD_NAME"}:
                      .: {}
                      f:name: {}
                      f:valueFrom:
                        .: {}
                        f:fieldRef: {}
                    k:{"name":"POD_NAMESPACE"}:
                      .: {}
                      f:name: {}
                      f:valueFrom:
                        .: {}
                        f:fieldRef: {}
                  f:image: {}
                  f:imagePullPolicy: {}
                  f:livenessProbe:
                    .: {}
                    f:failureThreshold: {}
                    f:httpGet:
                      .: {}
                      f:path: {}
                      f:port: {}
                      f:scheme: {}
                    f:initialDelaySeconds: {}
                    f:periodSeconds: {}
                    f:successThreshold: {}
                    f:timeoutSeconds: {}
                  f:name: {}
                  f:ports:
                    .: {}
                    k:{"containerPort":80,"protocol":"TCP"}:
                      .: {}
                      f:containerPort: {}
                      f:name: {}
                      f:protocol: {}
                    k:{"containerPort":443,"protocol":"TCP"}:
                      .: {}
                      f:containerPort: {}
                      f:name: {}
                      f:protocol: {}
                  f:readinessProbe:
                    .: {}
                    f:failureThreshold: {}
                    f:httpGet:
                      .: {}
                      f:path: {}
                      f:port: {}
                      f:scheme: {}
                    f:initialDelaySeconds: {}
                    f:periodSeconds: {}
                    f:successThreshold: {}
                    f:timeoutSeconds: {}
                  f:resources:
                    .: {}
                    f:limits: {}
                    f:requests: {}
                  f:securityContext:
                    .: {}
                    f:allowPrivilegeEscalation: {}
                    f:capabilities:
                      .: {}
                      f:add: {}
                      f:drop: {}
                    f:runAsUser: {}
                  f:terminationMessagePath: {}
                  f:terminationMessagePolicy: {}
              f:dnsPolicy: {}
              f:restartPolicy: {}
              f:schedulerName: {}
              f:securityContext: {}
              f:serviceAccount: {}
              f:serviceAccountName: {}
              f:terminationGracePeriodSeconds: {}
    - manager: kube-controller-manager
      operation: Update
      apiVersion: apps/v1
      time: '2022-01-24T01:23:22Z'
      fieldsType: FieldsV1
      fieldsV1:
        f:status:
          f:conditions:
            .: {}
            k:{"type":"Available"}:
              .: {}
              f:type: {}
            k:{"type":"Progressing"}:
              .: {}
              f:type: {}
    - manager: Mozilla
      operation: Update
      apiVersion: apps/v1
      time: '2022-01-28T23:18:41Z'
      fieldsType: FieldsV1
      fieldsV1:
        f:spec:
          f:template:
            f:spec:
              f:containers:
                k:{"name":"nginx-ingress-controller"}:
                  f:resources:
                    f:limits:
                      f:cpu: {}
                      f:memory: {}
                    f:requests:
                      f:cpu: {}
                      f:memory: {}
    - manager: kube-controller-manager
      operation: Update
      apiVersion: apps/v1
      time: '2022-01-28T23:29:49Z'
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            f:deployment.kubernetes.io/revision: {}
        f:status:
          f:conditions:
            k:{"type":"Available"}:
              f:lastTransitionTime: {}
              f:lastUpdateTime: {}
              f:message: {}
              f:reason: {}
              f:status: {}
            k:{"type":"Progressing"}:
              f:lastTransitionTime: {}
              f:lastUpdateTime: {}
              f:message: {}
              f:reason: {}
              f:status: {}
          f:observedGeneration: {}
          f:replicas: {}
          f:unavailableReplicas: {}
          f:updatedReplicas: {}
      subresource: status
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx-ingress
      app.kubernetes.io/component: controller
      release: nginx-ingress
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: nginx-ingress
        app.kubernetes.io/component: controller
        component: controller
        release: nginx-ingress
    spec:
      containers:
        - name: nginx-ingress-controller
          image: us.gcr.io/k8s-artifacts-prod/ingress-nginx/controller:v0.34.1
          args:
            - /nginx-ingress-controller
            - '--default-backend-service=ingress/nginx-ingress-default-backend'
            - '--election-id=ingress-controller-leader'
            - '--ingress-class=nginx'
            - '--configmap=ingress/nginx-ingress-controller'
          ports:
            - name: http
              containerPort: 80
              protocol: TCP
            - name: https
              containerPort: 443
              protocol: TCP
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.namespace
          resources:
            limits:
              cpu: 300m
              memory: 512Mi
            requests:
              cpu: 200m
              memory: 256Mi
          livenessProbe:
            httpGet:
              path: /healthz
              port: 10254
              scheme: HTTP
            initialDelaySeconds: 10
            timeoutSeconds: 1
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /healthz
              port: 10254
              scheme: HTTP
            initialDelaySeconds: 10
            timeoutSeconds: 1
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 3
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: IfNotPresent
          securityContext:
            capabilities:
              add:
                - NET_BIND_SERVICE
              drop:
                - ALL
            runAsUser: 101
            allowPrivilegeEscalation: true
      restartPolicy: Always
      terminationGracePeriodSeconds: 60
      dnsPolicy: ClusterFirst
      serviceAccountName: nginx-ingress
      serviceAccount: nginx-ingress
      securityContext: {}
      schedulerName: default-scheduler
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%
      maxSurge: 25%
  revisionHistoryLimit: 10
  progressDeadlineSeconds: 600
status:
  observedGeneration: 16
  replicas: 3
  updatedReplicas: 2
  unavailableReplicas: 3
  conditions:
    - type: Available
      status: 'False'
      lastUpdateTime: '2022-01-28T22:58:07Z'
      lastTransitionTime: '2022-01-28T22:58:07Z'
      reason: MinimumReplicasUnavailable
      message: Deployment does not have minimum availability.
    - type: Progressing
      status: 'False'
      lastUpdateTime: '2022-01-28T23:29:49Z'
      lastTransitionTime: '2022-01-28T23:29:49Z'
      reason: ProgressDeadlineExceeded
      message: >-
        ReplicaSet "nginx-ingress-controller-59d9f94677" has timed out
        progressing.

The answer from @Philip Welz is of course correct. Since the v1beta1 Ingress API version was removed in Kubernetes v1.22, upgrading the ingress controller was indeed necessary. But that was not the only issue we faced, so I decided to put together a "very, very short" guide on how we finally ended up with a healthy, running cluster (5 days later), so it may save others the hassle.

1. Upgrade the nginx ingress controller version in the YAML file

Here we simply changed the image version in the YAML file from:

image: us.gcr.io/k8s-artifacts-prod/ingress-nginx/controller:v0.34.1

to:

image: us.gcr.io/k8s-artifacts-prod/ingress-nginx/controller:v1.1.1
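If you prefer not to edit the YAML by hand, the same change can be applied with a single command (a sketch, assuming the deployment, container name, and namespace from the YAML above):

kubectl --namespace ingress set image deployment/nginx-ingress-controller nginx-ingress-controller=us.gcr.io/k8s-artifacts-prod/ingress-nginx/controller:v1.1.1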

After this operation a new pod was born, in v1.1.1. It started nicely and was running healthy. Unfortunately, that did not bring our microservices back online. Now I know this was because some changes had to be made to the existing Ingress YAML files to make them compatible with the new version of the ingress controller. So go straight to step 2 now (two headings below).

Don't do this step now, only if step 2 fails: reinstall the nginx ingress controller

We decided that, in that case, we would reinstall the controller from scratch following the official Microsoft documentation: https://learn.microsoft.com/en-us/azure/aks/ingress-basic?tabs=azure-cli. Note that this will probably change the external IP address of your ingress controller. In our case the easiest way was to delete the entire ingress namespace:

kubectl delete namespace ingress

Unfortunately, this does not delete the ingress class, so additionally:

kubectl delete ingressclass nginx --all-namespaces

Then install the new controller:

helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm install ingress-nginx ingress-nginx/ingress-nginx --create-namespace --namespace ingress 
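Afterwards you can check that the new controller is up and read off the (possibly new) external IP (a quick check; the exact service name depends on the release name used above):

kubectl --namespace ingress get pods
kubectl --namespace ingress get services
# the EXTERNAL-IP column of the LoadBalancer service is the controller's new public IP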

If you reinstalled the nginx ingress controller, or the IP address changed after the upgrade in step 1: update your network security group, load balancer, and domain DNS

In your AKS resource group there should be a resource of type Network security group. It contains inbound and outbound security rules (as far as I understand, it works as a firewall). There should be a default network security group, managed automatically by Kubernetes, and the IP addresses in it should be refreshed automatically.

Unfortunately, we also had an additional custom one. We had to update its rules manually.

In the same resource group there should also be a resource of type Load balancer. In the Frontend IP configuration tab, double-check that the IP address reflects your new IP address. As a bonus, you can verify in the Backend pools tab that the addresses listed there match your internal node IPs.

Finally, don't forget to adjust the DNS records for your domain.
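If you want to double-check the currently allocated public IPs from the command line, something like this should work (a sketch; MC_my-group_my-cluster_westeurope is a hypothetical AKS node resource group name):

az network public-ip list --resource-group MC_my-group_my-cluster_westeurope --query '[].{name:name, ip:ipAddress}' --output table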

2. Upgrade the ingress YAML configuration files to match the syntax changes

It took us a while to figure out a working template, but installing the hello-world applications from the Microsoft tutorial mentioned above actually helped us a lot. We started with this:

kind: Ingress
apiVersion: networking.k8s.io/v1
metadata:
  name: hello-world-ingress
  namespace: services
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/rewrite-target: /$1
    nginx.ingress.kubernetes.io/ssl-redirect: 'false'
    nginx.ingress.kubernetes.io/use-regex: 'true'
spec:
  rules:
    - http:
        paths:
          - path: /hello-world-one(/|$)(.*)
            pathType: Prefix
            backend:
              service:
                name: aks-helloworld-one
                port:
                  number: 80

After introducing the changes step by step, we finally ended up with the following. But I am fairly sure the actual problem was that we had been missing the nginx.ingress.kubernetes.io/use-regex: 'true' entry:

kind: Ingress
apiVersion: networking.k8s.io/v1
metadata:
  name: example-api
  namespace: services
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/configuration-snippet: |
      more_set_headers "X-Forwarded-By: example-api";
    nginx.ingress.kubernetes.io/rewrite-target: /example-api
    nginx.ingress.kubernetes.io/ssl-redirect: 'true'
    nginx.ingress.kubernetes.io/use-regex: 'true'
spec:
  tls:
    - hosts:
        - services.example.com
      secretName: tls-secret
  rules:
    - host: services.example.com
      http:
        paths:
          - path: /example-api
            pathType: ImplementationSpecific
            backend:
              service:
                name: example-api
                port:
                  number: 80
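Before rolling such changes out, the migrated manifests can be validated against the live API server without persisting anything (server-side dry run; example-api.yaml is a placeholder file name):

kubectl apply --dry-run=server -f example-api.yaml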

In case anyone would like to install the hello-world applications for testing, the YAMLs look as follows:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: aks-helloworld-one
spec:
  replicas: 1
  selector:
    matchLabels:
      app: aks-helloworld-one
  template:
    metadata:
      labels:
        app: aks-helloworld-one
    spec:
      containers:
        - name: aks-helloworld-one
          image: mcr.microsoft.com/azuredocs/aks-helloworld:v1
          ports:
            - containerPort: 80
          env:
            - name: TITLE
              value: "Welcome to Azure Kubernetes Service (AKS)"
---
apiVersion: v1
kind: Service
metadata:
  name: aks-helloworld-one
spec:
  type: ClusterIP
  ports:
    - port: 80
  selector:
    app: aks-helloworld-one
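Assuming both manifests are saved as hello-world.yaml, deploying them next to the hello-world ingress from above and testing through the controller could look like this (the external IP is a placeholder):

kubectl apply -f hello-world.yaml --namespace services
curl http://<EXTERNAL-IP>/hello-world-one/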

3. Take care of the other crashed applications

Another application that crashed in the cluster was cert-manager. It was at version 1.0.1, so, first of all, we upgraded it to version 1.1.1:

helm repo add jetstack https://charts.jetstack.io
helm repo update
helm upgrade --namespace cert-manager --version 1.1 cert-manager jetstack/cert-manager

This created a brand new, healthy pod. We were happy and decided to stay on v1.1, as we were a bit scared of the extra steps required when upgrading to higher versions (see the bottom of this page: https://cert-manager.io/docs/installation/upgrading/).

So the cluster was finally fixed now. Yes, right?

4. ...but make sure to check the compatibility charts

Well... now we know that cert-manager is only compatible with Kubernetes v1.22 from version 1.5 onwards. We were unlucky: that very night our SSL certificates crossed the 30-days-before-expiry threshold, so cert-manager decided to renew the certificates! The operation failed and cert-manager crashed. Kubernetes fell back to the "Kubernetes fake certificate", and the web pages went down again because browsers killed the traffic due to the invalid certificate. The fix was to upgrade to 1.5 and to upgrade the CRDs as well:

kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.5.4/cert-manager.crds.yaml
helm upgrade --namespace cert-manager --version 1.5 cert-manager jetstack/cert-manager

After that, a fresh instance of cert-manager successfully refreshed our certificates. The cluster was saved once again.
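To verify that the renewal really went through, the certificate status can be checked (standard cert-manager resources; READY should be True for all entries):

kubectl get certificates --all-namespaces
kubectl describe certificate <certificate-name> --namespace <namespace>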

If you need to force a certificate renewal, have a look at this issue: https://github.com/jetstack/cert-manager/issues/2641

@ajcann suggested adding the renewBefore property to the certificates:

kubectl get certs --no-headers=true | awk '{print $1}' | xargs -n 1 kubectl patch certificate --patch '
- op: replace
  path: /spec/renewBefore
  value: 1440h
' --type=json

Then wait until the certificates are renewed and afterwards remove the property:

kubectl get certs --no-headers=true | awk '{print $1}' | xargs -n 1 kubectl patch certificate --patch '
- op: remove
  path: /spec/renewBefore
' --type=json
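To confirm the new expiry date afterwards, the certificate status can be queried directly (a sketch; I assume status.notAfter holds the renewed expiry timestamp):

kubectl get certificate <certificate-name> -o jsonpath='{.status.notAfter}'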

Kubernetes 1.22 is only supported by NGINX Ingress Controller 1.0.0 and newer => https://github.com/kubernetes/ingress-nginx#supported-versions-table

You need to bump the nginx-ingress-controller Bitnami Helm chart to version 9.0.0 in your Chart.yaml. Then run helm upgrade nginx-ingress-controller bitnami/nginx-ingress-controller.
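If the chart is pulled in as a dependency of your own umbrella chart, the bump would look roughly like this (a sketch of an assumed Chart.yaml layout, followed by the upgrade):

dependencies:
  - name: nginx-ingress-controller
    version: 9.0.0
    repository: https://charts.bitnami.com/bitnami

helm dependency update
helm upgrade nginx-ingress-controller bitnami/nginx-ingress-controller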

You should also update your ingress controller on a regular basis; v0.34.1 in particular is really old, as the ingress is usually the only entry point into the cluster from the outside.
