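I'm trying to expose an mlflow model in a GKE cluster through ingress-nginx and a Google Cloud load balancer.
The service configuration for the corresponding deployment looks like this: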
apiVersion: v1
kind: Service
metadata:
  name: model-inference-service
  labels:
    app: inference
spec:
  ports:
    - port: 5555
      targetPort: 5555
  selector:
    app: inference
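When I forward this service to localhost with
kubectl port-forward service/model-inference-service 5555:5555
I can successfully query the model by sending a test image to the API endpoint with my test script (test_inference.py). The request is sent to http://127.0.0.1:5555/invocations. This works as expected, so I assume that the deployment running the pod that serves the model, and the corresponding ClusterIP service model-inference-service, are configured correctly.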
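The test script itself is not shown here. As a rough idea of what such a client might look like, the sketch below defines a hypothetical score_model helper that POSTs a payload to /invocations and raises on an HTTP error status, formatting the message the way the exception in the traceback further down appears to be formatted. The function name, its parameters and the default content type are assumptions, and it uses only the standard library (urllib) instead of requests.

```python
import urllib.error
import urllib.request


def score_model(host: str, port: int, body: bytes,
                content_type: str = "application/json",
                timeout: float = 10.0) -> str:
    """Hypothetical client: POST `body` to http://host:port/invocations.

    Returns the response text on success and raises on an HTTP error status
    (for example the 502 returned by nginx).
    """
    url = f"http://{host}:{port}/invocations"
    req = urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": content_type},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; keep the body for debugging.
        text = err.read().decode("utf-8", errors="replace")
        raise Exception(f"Status Code {err.code}. {text}") from None
```

The same call can then be pointed at the port-forward (score_model("127.0.0.1", 5555, payload)) and at the load balancer (score_model("34.90.4.0", 80, payload)), so that only the network path differs between the two runs.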
Next, I installed ingress-nginx into the cluster with:
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm install my-release ingress-nginx/ingress-nginx
The ingress configuration looks like this (I suspect the error must be here?):
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx
    # nginx.ingress.kubernetes.io/rewrite-target: /invocations
  name: inference-ingress
  namespace: default
  labels:
    app: inference
spec:
  rules:
    - http:
        paths:
          - path: /invocations
            backend:
              serviceName: model-inference-service
              servicePort: 5555
The ingress controller pod is running:
my-release-ingress-nginx-controller-6758cc8f45-fwtw7 1/1 Running 0 3h33m
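In the GCP console I can see that the load balancer was created successfully as well, and I can get its IP.
When I send a request to the REST API endpoint with the same test script as before (when the service was forwarded to localhost), but now using the load balancer's IP, I get a 502 Bad Gateway error.
The URL is now: http://34.90.4.0:80/invocations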
Traceback (most recent call last):
  File "test_inference.py", line 80, in <module>
    run()
  File "//anaconda3/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "//anaconda3/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "//anaconda3/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "//anaconda3/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "test_inference.py", line 76, in run
    print(score_model(data_path, host, port).text)
  File "test_inference.py", line 54, in score_model
    status_code=response.status_code, text=response.text
Exception: Status Code 502. <html>
<head><title>502 Bad Gateway</title></head>
<body>
<center><h1>502 Bad Gateway</h1></center>
<hr><center>nginx/1.19.1</center>
</body>
</html>
When I open the same URL in a browser, it shows:
502 Bad Gateway
nginx/1.19.1
Ingress controller logs:
2020/08/26 16:06:45 [warn] 86#86: *42282 a client request body is buffered to a temporary file /tmp/client-body/0000000009, client: 10.10.0.30, server: _, request: "POST /invocations HTTP/1.1", host: "34.90.4.0"
2020/08/26 16:06:45 [error] 86#86: *42282 connect() failed (111: Connection refused) while connecting to upstream, client: 10.10.0.30, server: _, request: "POST /invocations HTTP/1.1", upstream: "http://10.52.3.7:5555/invocations", host: "34.90.4.0"
2020/08/26 16:06:45 [error] 86#86: *42282 connect() failed (111: Connection refused) while connecting to upstream, client: 10.10.0.30, server: _, request: "POST /invocations HTTP/1.1", upstream: "http://10.52.3.7:5555/invocations", host: "34.90.4.0"
2020/08/26 16:06:45 [error] 86#86: *42282 connect() failed (111: Connection refused) while connecting to upstream, client: 10.10.0.30, server: _, request: "POST /invocations HTTP/1.1", upstream: "http://10.52.3.7:5555/invocations", host: "34.90.4.0"
10.10.0.30 - - [26/Aug/2020:16:06:45 +0000] "POST /invocations HTTP/1.1" 502 157 "-" "python-requests/2.24.0" 86151 0.738 [default-model-inference-service-5555] [] 10.52.3.7:5555, 10.52.3.7:5555, 10.52.3.7:5555 0, 0, 0 0.000, 0.001, 0.000 502, 502, 502 0d86e360427c0a81c287da4ff5e907bc
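These log lines show nginx trying the upstream 10.52.3.7:5555 three times and getting "111: Connection refused" each time; that address is presumably the pod endpoint behind model-inference-service, since ingress-nginx proxies to service endpoints by default. A refused connection, unlike a timeout, generally means the address answered but nothing accepted connections on that port. A small probe like the one below, run from a pod inside the cluster, can tell the two cases apart. The probe_tcp name and its return labels are illustrative and not part of any existing tool.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Try one TCP connection and classify the outcome.

    Returns "open", "refused", "timeout", or "error: <reason>".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # An RST came back: the address answered, but nothing accepted the
        # connection on that port (what nginx logs as "111: Connection refused").
        return "refused"
    except socket.timeout:
        # No reply within `timeout`: typically routing, firewall or policy issues.
        return "timeout"
    except OSError as err:
        # Anything else, e.g. "No route to host".
        return f"error: {err.strerror or err}"
```

For example, call probe_tcp("10.52.3.7", 5555) from a test pod, using the current pod IP from kubectl get pods -o wide. If it returns "refused" while kubectl port-forward keeps working, that would point at the address the model server listens on inside the pod rather than at the Ingress, because port-forward reaches the port from within the pod's own network namespace.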
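To check whether the ingress and the load balancer work in principle, I replaced the Docker image of the real REST API I want to expose with a Docker image that simply returns "hello world" on port 5050 and path /. I changed the ports and paths in the service and ingress manifests shown above accordingly (from /invocations to /), and I did get "hello world" when opening the load balancer's IP in the browser.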
Does anyone see what I might be doing wrong? Many thanks!
Best regards,
F
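The configuration you shared looks fine. Something in the cluster environment must be causing this behavior. Check whether pod-to-pod communication works: start a test pod on the same node as the NGINX ingress controller and curl the target service from that pod. See whether you run into any DNS or network issues. Also try changing the Host header when calling the service, to see whether it is sensitive to it.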
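If the test pod's image happens to include Python (an assumption; curl works just as well), the checks above can be scripted with the standard library. This sketch resolves the service's cluster DNS name and then sends the same request with different Host headers. The helper names resolve and status_with_host are hypothetical, the default body b"{}" is only a placeholder, and the DNS name in the usage note assumes the usual <service>.<namespace>.svc.cluster.local form with the default cluster domain.

```python
from __future__ import annotations

import socket
import urllib.error
import urllib.request


def resolve(name: str) -> list[str]:
    """Return the IPv4 addresses `name` resolves to, or [] if resolution fails."""
    try:
        infos = socket.getaddrinfo(name, None, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


def status_with_host(url: str, host_header: str | None, body: bytes = b"{}",
                     timeout: float = 5.0) -> int:
    """POST `body` to `url`, optionally overriding the Host header.

    Returns the HTTP status code. Connection-level failures (refused,
    timeout) propagate as urllib.error.URLError.
    """
    headers = {"Content-Type": "application/json"}
    if host_header is not None:
        # urllib only sets its own Host header when none is supplied.
        headers["Host"] = host_header
    req = urllib.request.Request(url, data=body, headers=headers, method="POST")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
```

From the test pod, one would run resolve("model-inference-service.default.svc.cluster.local"), then status_with_host("http://model-inference-service.default.svc.cluster.local:5555/invocations", None) and the same call with host_header="34.90.4.0". An empty list from resolve points at DNS, a URLError instead of a status code points at networking, and a status that changes with the Host header means the backend is sensitive to it.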