Flask app deployment fails (liveness probe failed) after "gcloud builds submit ..."



I am new to frontend/backend/DevOps, but I need to deploy an application on Google Cloud Platform (GCP) with Kubernetes to provide a service. I started learning by following this tutorial series:

  • https://mickeyabhi1999.medium.com/build-and-deploy-a-web-app-with-react-flask-nginx-postgresql-docker-and-google-kubernetes-e586de159a4d
  • https://medium.com/swlh/build-and-deploy-a-web-app-with-react-flask-nginx-postgresql-docker-and-google-kubernetes-341f3b4de322

The code for this tutorial series is here: https://github.com/abhiChakra/Addition-App

Everything went fine until the last step: using "gcloud builds submit ..." to build

  1. nginx+react service
  2. flask+wsgi service
  3. nginx+react deployment
  4. flask+wsgi deployment

on the GCP cluster.

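Items 1 to 3 went well and their status was "OK". However, the status of the flask+wsgi deployment was "Does not have minimum availability", even after many restarts.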

I ran "kubectl get pods" and saw that the status of the flask pod was "CrashLoopBackOff". I then followed the debugging process suggested here: https://containersolutions.github.io/runbooks/posts/kubernetes/crashloopbackoff/

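I used "kubectl describe pod flask" to investigate the flask pod. I found that the "Exit Code" was 139, along with the messages "Liveness probe failed: Get "http://10.24.0.25:8000/health": read tcp 10.24.0.1:55470->10.24.0.25:8000: read: connection reset by peer" and "Readiness probe failed: Get "http://10.24.0.25:8000/ready": read tcp 10.24.0.1:55848->10.24.0.25:8000: read: connection reset by peer".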
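
A note on reading that exit code: by the usual shell and container convention, an exit status above 128 means the process was terminated by signal number (status - 128). The sketch below decodes it with Python's standard signal module. The helper name describe_exit_code is made up for illustration, and the signal numbering is that of the machine running the snippet (Linux assumed).

```python
import signal


def describe_exit_code(code: int) -> str:
    """Explain an exit status using the 128 + signal-number convention."""
    if code > 128:
        signum = code - 128
        try:
            return f"killed by {signal.Signals(signum).name}"
        except ValueError:
            return f"killed by unknown signal {signum}"
    return f"exited with status {code}"


# 139 - 128 = 11, which is SIGSEGV (segmentation fault) on Linux.
print(describe_exit_code(139))
```

If that reading applies here, the process inside the container crashed on its own rather than being stopped by Kubernetes.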

The complete log:

Name:         flask-676d5dd999-cf6kt
Namespace:    default
Priority:     0
Node:         gke-addition-app-default-pool-89aab4fe-3l1q/10.140.0.3
Start Time:   Thu, 11 Nov 2021 19:06:24 +0800
Labels:       app.kubernetes.io/managed-by=gcp-cloud-build-deploy
              component=flask
              pod-template-hash=676d5dd999
Annotations:  <none>
Status:       Running
IP:           10.24.0.25
IPs:
  IP:           10.24.0.25
Controlled By:  ReplicaSet/flask-676d5dd999
Containers:
  flask:
    Container ID:   containerd://5459b747e1d44046d283a46ec1eebb625be4df712340ff9cf492d5583a4d41d2
    Image:          gcr.io/peerless-garage-330917/addition-app-flask:latest
    Image ID:       gcr.io/peerless-garage-330917/addition-app-flask@sha256:b45d25ffa8a0939825e31dec1a6dfe84f05aaf4a2e9e43d35084783edc76f0de
    Port:           8000/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Fri, 12 Nov 2021 17:24:14 +0800
    Last State:     Terminated
      Reason:       Error
      Exit Code:    139
      Started:      Fri, 12 Nov 2021 17:17:06 +0800
      Finished:     Fri, 12 Nov 2021 17:19:06 +0800
    Ready:          False
    Restart Count:  222
    Limits:
      cpu:  1
    Requests:
      cpu:        400m
    Liveness:     http-get http://:8000/health delay=120s timeout=1s period=5s #success=1 #failure=3
    Readiness:    http-get http://:8000/ready delay=120s timeout=1s period=5s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-s97x5 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  default-token-s97x5:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-s97x5
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                     From     Message
  ----     ------     ----                    ----     -------
  Warning  Unhealthy  9m7s (x217 over 21h)    kubelet  (combined from similar events): Liveness probe failed: Get "http://10.24.0.25:8000/health": read tcp 10.24.0.1:48636->10.24.0.25:8000: read: connection reset by peer
  Warning  BackOff    4m38s (x4404 over 22h)  kubelet  Back-off restarting failed container

As suggested here: https://containersolutions.github.io/runbooks/posts/kubernetes/crashloopbackoff/#step-4 I increased "initialDelaySeconds" to 120, but it still failed.
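
One thing the describe output may hint at is when the container died relative to the probe schedule. The sketch below is only back-of-the-envelope arithmetic on values copied from the log (the "Last State" timestamps and the "Liveness" line). It assumes an idealized kubelet that sends the first probe exactly after the initial delay. Real probe timing includes some jitter and the timestamps have one-second resolution, so treat the result as a hint and not as proof.

```python
from datetime import datetime

FMT = "%a, %d %b %Y %H:%M:%S %z"

# Values copied from the "Last State" section of the describe output.
started = datetime.strptime("Fri, 12 Nov 2021 17:17:06 +0800", FMT)
finished = datetime.strptime("Fri, 12 Nov 2021 17:19:06 +0800", FMT)
uptime_seconds = (finished - started).total_seconds()

# Probe settings from the "Liveness" line: delay=120s period=5s #failure=3.
initial_delay, period, failure_threshold = 120, 5, 3

# Idealized earliest point at which failed liveness probes alone could cause
# a restart: the first probe fires after the initial delay, and
# failure_threshold consecutive failures are needed.
earliest_liveness_restart = initial_delay + (failure_threshold - 1) * period

print(f"container lived {uptime_seconds:.0f}s")
print(f"earliest liveness-triggered restart at about {earliest_liveness_restart}s")
print("died before the liveness threshold:", uptime_seconds < earliest_liveness_restart)
```

Under these assumptions the container stopped at about the moment the first probe request arrived, which would be consistent with the process crashing when it handles its first connection and not with a slow startup that a longer initial delay could fix.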

Since I made sure that everything works on my local laptop, I think there may be some connection or authentication issue.

In more detail, deployment.yaml looks like this:

apiVersion: v1
kind: Service
metadata:
  name: ui
spec:
  type: LoadBalancer
  selector:
    app: react
    tier: ui
  ports:
    - port: 8080
      targetPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: flask
spec:
  type: ClusterIP
  selector:
    component: flask
  ports:
    - port: 8000
      targetPort: 8000
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flask
spec:
  replicas: 1
  selector:
    matchLabels:
      component: flask
  template:
    metadata:
      labels:
        component: flask
    spec:
      containers:
        - name: flask
          image: gcr.io/peerless-garage-330917/addition-app-flask:latest
          imagePullPolicy: "Always"
          resources:
            limits:
              cpu: "1000m"
            requests:
              cpu: "400m"
          livenessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 30
            periodSeconds: 5
          readinessProbe:
            httpGet:
              path: /ready
              port: 8000
            initialDelaySeconds: 30
            periodSeconds: 5
          ports:
            - containerPort: 8000
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ui
spec:
  replicas: 1
  selector:
    matchLabels:
      app: react
      tier: ui
  template:
    metadata:
      labels:
        app: react
        tier: ui
    spec:
      containers:
        - name: ui
          image: gcr.io/peerless-garage-330917/addition-app-nginx:latest
          imagePullPolicy: "Always"
          resources:
            limits:
              cpu: "1000m"
            requests:
              cpu: "400m"
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 5
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 5
          ports:
            - containerPort: 8080

docker-compose.yaml:

# we will be creating these services
services:
  flask:
    # Note that we are building from our current terminal directory where our Dockerfile is located, we use .
    build: .
    # naming our resulting container
    container_name: flask
    # publishing a port so that external services requesting port 8000 on your local machine
    # are mapped to port 8000 on our container
    ports:
      - "8000:8000"
  nginx:
    # Since our Dockerfile for web-server is located in react-app folder, our build context is ./react-app
    build: ./react-app
    container_name: nginx
    ports:
      - "8080:8080"

Nginx Dockerfile:

# first building react project, using node base image
FROM node:10 as build-stage
# setting working dir inside container
WORKDIR /react-app
# required to install packages
COPY package*.json ./
# installing npm packages
RUN npm install
# copying over react source material
COPY src ./src
# copying over further react material
COPY public ./public
# copying over our nginx config file
COPY addition_container_server.conf ./
# creating production build to serve through nginx
RUN npm run build
# starting second, nginx build-stage
FROM nginx:1.15
# removing default nginx config file
RUN rm /etc/nginx/conf.d/default.conf
# copying our nginx config
COPY --from=build-stage /react-app/addition_container_server.conf /etc/nginx/conf.d/
# copying production build from last stage to serve through nginx
COPY --from=build-stage /react-app/build/ /usr/share/nginx/html
# exposing port 8080 on container
EXPOSE 8080
CMD ["nginx", "-g", "daemon off;"]

Nginx server config:

server {
    listen 8080;
    # location of react build files
    root /usr/share/nginx/html/;
    # index html from react build to serve
    index index.html;
    # ONLY KUBERNETES RELEVANT: endpoint for health checkup
    location /health {
        return 200 "health ok";
    }
    # ONLY KUBERNETES RELEVANT: endpoint for readiness checkup
    location /ready {
        return 200 "ready";
    }
    # html file to serve with / endpoint
    location / {
        try_files $uri /index.html;
    }

    # proxying under /api endpoint
    location /api {
        client_max_body_size 10m;
        add_header 'Access-Control-Allow-Origin' http://<NGINX_SERVICE_ENDPOINT>:8080;
        proxy_pass http://flask:8000/;
    }
}

There are two important functions in App.js:

...
insertCalculation(event, calculation){
    /*
        Making a POST request via a fetch call to Flask API with numbers of a
        calculation we want to insert into DB. Making fetch call to web server
        IP with /api/insert_nums which will be reverse proxied via Nginx to the
        Application (Flask) server.
    */
    event.preventDefault();
    fetch('http://<NGINX_SERVICE_ENDPOINT>:8080/api/insert_nums', {method: 'POST',
            mode: 'cors',
            headers: {
                'Content-Type' : 'application/json'
            },
            body: JSON.stringify(calculation)}
    ).then((response) => {
...
getHistory(event){
    /*
        Making a GET request via a fetch call to Flask API to retrieve calculations history.
    */
    event.preventDefault()
    fetch('http://<NGINX_SERVICE_ENDPOINT>:8080/api/data', {method: 'GET',
            mode: 'cors'
        }
    ).then(response => {
...

Flask Dockerfile:

# using base image
FROM python:3.8
# setting working dir inside container
WORKDIR /addition_app_flask
# adding run.py to workdir
ADD run.py .
# adding config.ini to workdir
ADD config.ini .
# adding requirements.txt to workdir
ADD requirements.txt .
# installing flask requirements
RUN pip install -r requirements.txt
# adding in all contents from flask_app folder into a new flask_app folder
ADD ./flask_app ./flask_app
# exposing port 8000 on container
EXPOSE 8000
# serving flask backend through uWSGI server
CMD [ "python", "run.py" ]

run.py:

from gevent.pywsgi import WSGIServer
from flask_app.app import app

# As flask is not a production suitable server, we will use
# a WSGIServer instance to serve our flask application.
if __name__ == '__main__':
    WSGIServer(('0.0.0.0', 8000), app).serve_forever()

app.py:

from flask import Flask, request, jsonify
from flask_app.storage import insert_calculation, get_calculations

app = Flask(__name__)

@app.route('/')
def index():
    return "My Addition App", 200

@app.route('/health')
def health():
    return '', 200

@app.route('/ready')
def ready():
    return '', 200

@app.route('/data', methods=['GET'])
def data():
    '''
    Function used to get calculations history
    from Postgres database and return to fetch call in frontend.
    :return: Json format of either collected calculations or error message
    '''
    calculations_history = []
    try:
        calculations = get_calculations()
        for key, value in calculations.items():
            calculations_history.append(value)

        return jsonify({'calculations': calculations_history}), 200
    except:
        return jsonify({'error': 'error fetching calculations history'}), 500

@app.route('/insert_nums', methods=['POST'])
def insert_nums():
    '''
    Function used to insert a calculation into our postgres
    DB. Operands of operation received from frontend.
    :return: Json format of either success or failure response.
    '''
    insert_nums = request.get_json()
    firstNum, secondNum, answer = insert_nums['firstNum'], insert_nums['secondNum'], insert_nums['answer']
    try:
        insert_calculation(firstNum, secondNum, answer)
        return jsonify({'Response': 'Successfully inserted into DB'}), 200
    except:
        return jsonify({'Response': 'Unable to insert into DB'}), 500

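I can't tell what went wrong. I would also like to know whether there is a better way to debug a cloud deployment like this. In an ordinary program we can set breakpoints and print or log things to locate the code that causes the problem, but in a cloud deployment I have lost my sense of how to debug.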

...Exit Code was 139...

This could mean there is a bug in your Flask application. Instead of trying to do everything in one go, you can start with a minimal spec:

apiVersion: v1
kind: Pod
metadata:
  name: flask
  labels:
    component: flask
spec:
  containers:
    - name: flask
      image: gcr.io/peerless-garage-330917/addition-app-flask:latest
      ports:
        - containerPort: 8000

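See whether your container starts up properly. If it does, try connecting to it with kubectl port-forward <flask pod name> 8000:8000, followed by curl localhost:8000/health. You should keep an eye on your application the whole time with kubectl logs -f <flask pod name>.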
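
The curl check above can also be scripted. The following is a minimal sketch of an HTTP check that roughly mimics what an httpGet probe does: a GET with a short timeout, where any status from 200 to 399 counts as success. The function name http_probe is hypothetical, and the URL assumes the port-forward from the previous step is running on localhost:8000.

```python
import http.client
import urllib.request


def http_probe(url: str, timeout: float = 1.0) -> bool:
    """Return True if a GET to url answers with a 2xx/3xx status in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, http.client.HTTPException) as exc:
        # Covers refused and reset connections, timeouts and HTTP error
        # statuses (urllib raises HTTPError for 4xx/5xx responses).
        print(f"probe failed: {exc!r}")
        return False


if __name__ == "__main__":
    # Assumes "kubectl port-forward <flask pod name> 8000:8000" is active.
    print(http_probe("http://localhost:8000/health"))
```

A "connection reset by peer" on the server side shows up here as a failed probe with a ConnectionResetError-style message, which makes it easy to repeat the check in a loop while watching the pod.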

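Thank you for the reply, @gohm! It is indeed better to isolate the different parts and start from a smaller component. As suggested, I first tried deploying a single flask pod. Then I used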

kubectl port-forward flask 8000:8000

to map the port to my local machine. After running curl localhost:8000/health to access the port, it showed

Forwarding from 127.0.0.1:8000 -> 8000
Forwarding from [::1]:8000 -> 8000
Handling connection for 8000
E1112 18:52:15.874759  300145 portforward.go:400] an error occurred forwarding 8000 -> 8000: error forwarding port 8000 to pod 4870b939f3224f968fd5afa4660a5af7d10e144ee85149d69acff46a772e94b1, uid : failed to execute portforward in network namespace "/var/run/netns/cni-32f718f0-1248-6da4-c726-b2a5bf1918db": read tcp4 127.0.0.1:38662->127.0.0.1:8000: read: connection reset by peer

At this point, running

kubectl logs -f flask

returns an empty response. So there is indeed some problem in the flask application.

This health probe is a very simple function in app.py:

@app.route('/health')
def health():
    return '', 200

How can I tell whether the route is set up incorrectly? Is it because of the WSGIServer in run.py?

from gevent.pywsgi import WSGIServer
from flask_app.app import app

# As flask is not a production suitable server, we will use
# a WSGIServer instance to serve our flask application.
if __name__ == '__main__':
    WSGIServer(('0.0.0.0', 8000), app).serve_forever()
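
One way to separate "the routes are wrong" from "the server crashes" might be to serve the same WSGI application with a different server and see whether the problem follows. The sketch below uses the standard-library wsgiref server as the swapped-in server. The tiny app function is only a stand-in so the example runs on its own. In the real project the object passed in would be the Flask app imported from flask_app.app, which is assumed to work here because Flask applications are WSGI callables. The names serve_and_check and QuietHandler are made up for illustration.

```python
import threading
import urllib.request
from wsgiref.simple_server import WSGIRequestHandler, make_server


def app(environ, start_response):
    """Stand-in WSGI app; the real one would be flask_app.app.app."""
    if environ["PATH_INFO"] in ("/health", "/ready"):
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"ok"]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]


class QuietHandler(WSGIRequestHandler):
    """Request handler that suppresses per-request log lines."""

    def log_message(self, format, *args):
        pass


def serve_and_check(wsgi_app, path="/health"):
    """Serve wsgi_app on a free local port and return the status for path."""
    with make_server("127.0.0.1", 0, wsgi_app, handler_class=QuietHandler) as server:
        thread = threading.Thread(target=server.serve_forever, daemon=True)
        thread.start()
        try:
            url = f"http://127.0.0.1:{server.server_port}{path}"
            with urllib.request.urlopen(url, timeout=2) as response:
                return response.status
        finally:
            server.shutdown()


print(serve_and_check(app))
```

If the real app answers /health under wsgiref inside the container but resets the connection under gevent's WSGIServer, that would point at the server or its native dependencies and away from the route definitions.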

Looking at the Dockerfile, it seems to expose the correct port, 8000. If I run

python run.py

directly on my laptop, I can successfully access localhost:8000. How do I debug this kind of problem?
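
For the empty kubectl logs output, one option that may help is Python's standard-library faulthandler module, which writes the Python traceback to stderr when the interpreter receives a fatal signal such as SIGSEGV (the signal that exit code 139 conventionally corresponds to). The sketch below only demonstrates the mechanism: it starts a child process that enables faulthandler and then crashes on purpose. The ctypes.string_at(0) call is a deliberate stand-in for whatever crashes in the real application. In run.py the equivalent would be calling faulthandler.enable() before the other imports, or setting the PYTHONFAULTHANDLER=1 environment variable in the container.

```python
import subprocess
import sys

# Child program: enable faulthandler, then crash on purpose by reading
# from address 0. In run.py, faulthandler.enable() would go at the very top.
CRASHING_CHILD = """
import ctypes
import faulthandler

faulthandler.enable()
ctypes.string_at(0)  # deliberate segfault, stands in for the real crash
"""


def run_crashing_child() -> subprocess.CompletedProcess:
    """Run the crashing child and capture what it prints before dying."""
    return subprocess.run(
        [sys.executable, "-c", CRASHING_CHILD],
        capture_output=True,
        text=True,
    )


result = run_crashing_child()
print("return code:", result.returncode)
print(result.stderr)
```

With faulthandler enabled, the traceback written just before the process dies should appear in kubectl logs (or kubectl logs --previous for the crashed container) and show which call was executing when the segmentation fault happened.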
