Azure AKS - pod stays in CrashLoopBackOff state



I am trying to deploy an application from my personal Docker registry to a pod in Azure AKS. I have a Python application that only logs some output:

import sys
import time
import logging

# Log to stdout so that `docker logs` and `kubectl logs` can show the output.
logger = logging.getLogger('main')
logger.setLevel(logging.INFO)
handler = logging.StreamHandler(sys.stdout)
handler.setLevel(logging.DEBUG)
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
handler.setFormatter(formatter)
logger.addHandler(handler)


def main():
    logger.info('This is test')
    time.sleep(5)


# Run forever: the process never exits on its own.
while True:
    try:
        main()
    except Exception:
        logger.critical('Something critical.', exc_info=1)
    logger.info('Sleep for 5 seconds')
    time.sleep(5)

This is my Dockerfile:

FROM python:3.7-alpine
RUN apk update && apk upgrade
ARG APP_DIR=/app
RUN mkdir -p ${APP_DIR}
WORKDIR ${APP_DIR}
COPY requirements.txt .
RUN apk add --no-cache --virtual .build-deps gcc python3-dev musl-dev linux-headers && \
    python3 -m pip install -r requirements.txt --no-cache-dir && \
    apk --purge del .build-deps
COPY app .
ENTRYPOINT [ "python", "-u", "run.py" ]

I can run the container on my local machine; here are some logs:

docker logs -tf my-container
2020-02-07T10:26:57.939062754Z 2020-02-07 10:26:57,938 - main - INFO - This is test
2020-02-07T10:27:02.944500969Z 2020-02-07 10:27:02,943 - main - INFO - Sleep for 5 seconds
2020-02-07T10:27:07.948643749Z 2020-02-07 10:27:07,948 - main - INFO - This is test
2020-02-07T10:27:12.953683767Z 2020-02-07 10:27:12,953 - main - INFO - Sleep for 5 seconds
2020-02-07T10:27:17.955954057Z 2020-02-07 10:27:17,955 - main - INFO - This is test
2020-02-07T10:27:22.960453835Z 2020-02-07 10:27:22,959 - main - INFO - Sleep for 5 seconds
2020-02-07T10:27:27.964402790Z 2020-02-07 10:27:27,963 - main - INFO - This is test
2020-02-07T10:27:32.968647112Z 2020-02-07 10:27:32,967 - main - INFO - Sleep for 5 seconds

I am trying to deploy the pod with this YAML file, using kubectl apply -f onepod.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: my-container
  labels:
    platform: xxx
    event: yyy
    protocol: zzz
spec:
  imagePullSecrets:
    - name: myregistry
  containers:
    - name: my-container
      image: mypersonalregistry/my-container:test

The pod is created, but it stays in the CrashLoopBackOff state, and kubectl logs shows no output at all. I tried kubectl describe pod, but the events contain nothing useful:

Name:         my-container
Namespace:    default
Priority:     0
Node:         aks-agentpool-56095163-vmss000000/10.240.0.4
Start Time:   Fri, 07 Feb 2020 11:41:48 +0100
Labels:       event=yyy
              platform=xxx
              protocol=zzz
Annotations:  kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{},"labels":{"event":"yyy","platform":"xxx","protocol":"zzz"},"name":"my-container...
Status:       Running
IP:           10.244.1.33
IPs:          <none>
Containers:
  my-container:
    Container ID:   docker://c497674f86deadca2ef874f8a94361e26c770314e9cff1729bf20b5943d1a700
    Image:          mypersonalregistry/my-container:test
    Image ID:       docker-pullable://mypersonalregistry/my-container@sha256:c4208f42fea9a99dcb3b5ad8b53bac5e39bc54b8d89a577f85fec1a94535bc39
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Fri, 07 Feb 2020 12:28:10 +0100
      Finished:     Fri, 07 Feb 2020 12:28:10 +0100
    Ready:          False
    Restart Count:  14
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-lv75n (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  default-token-lv75n:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-lv75n
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                    From                                        Message
  ----     ------     ----                   ----                                        -------
  Normal   Scheduled  49m                    default-scheduler                           Successfully assigned default/my-container to aks-agentpool-56095163-vmss000000
  Normal   Pulled     48m (x5 over 49m)      kubelet, aks-agentpool-56095163-vmss000000  Container image "mypersonalregistry/my-container:test" already present on machine
  Normal   Created    48m (x5 over 49m)      kubelet, aks-agentpool-56095163-vmss000000  Created container my-container
  Normal   Started    48m (x5 over 49m)      kubelet, aks-agentpool-56095163-vmss000000  Started container my-container
  Warning  BackOff    4m55s (x210 over 49m)  kubelet, aks-agentpool-56095163-vmss000000  Back-off restarting failed container

How can I find out why it works on my computer but not in the Kubernetes cluster?

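So the problem was with pulling the latest version of the image. More on this here:

The default pull policy is IfNotPresent, which causes the kubelet to skip pulling an image if it already exists.

So the cluster kept running the first version of my-container with the tag test and never downloaded the new version, even though it was in my registry.

The solution is to add this line to the YAML file: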

imagePullPolicy: Always
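
For reference, here is a sketch of where that line could go in the pod manifest from the question. imagePullPolicy is a per-container field, so it sits next to image; everything else is copied unchanged from the question.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-container
  labels:
    platform: xxx
    event: yyy
    protocol: zzz
spec:
  imagePullSecrets:
    - name: myregistry
  containers:
    - name: my-container
      image: mypersonalregistry/my-container:test
      # Ask the kubelet to check the registry every time the container starts,
      # instead of reusing whatever image is cached under the tag "test".
      imagePullPolicy: Always
```

With Always, the kubelet resolves the tag against the registry on each container start and reuses the cached image only if the digest still matches, so a re-pushed test tag gets picked up on the next restart.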

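What you are seeing is 100% expected. The application sleeps for 10 seconds and exits. Kubernetes expects a pod to run indefinitely. If the pod exits for any reason (even with exit code 0), Kubernetes will try to restart it, because the default restartPolicy of a pod is Always. If the pod exits many times, Kubernetes assumes that your pod is not working properly and changes its status to CrashLoopBackOff.

You can try changing the code to run in an infinite loop, and you will see that Kubernetes is happy with that.

If you want to run a task to completion, you probably want to use Kubernetes Jobs. Kubernetes expects Jobs to finish with exit code 0.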
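
As a rough illustration of the Job approach, the manifest below wraps the same container in a Job. It is only a sketch: the name my-container-job is made up for this example, while the image and pull secret are taken from the question. It assumes the container is actually meant to exit when its work is done.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: my-container-job   # hypothetical name, not from the question
spec:
  template:
    spec:
      # A Job's pod template must use Never or OnFailure; Always is not allowed.
      restartPolicy: Never
      imagePullSecrets:
        - name: myregistry
      containers:
        - name: my-container
          image: mypersonalregistry/my-container:test
```

With this shape, a container that exits with code 0 marks the Job as complete instead of being restarted into CrashLoopBackOff.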
