Deploying a custom container in Vertex AI



I am trying to deploy my custom container to a Vertex AI endpoint for predictions. The contents of the application are as follows:

  1. Flask - app.py
import pandas as pd
from flask import Flask, jsonify, request
import tensorflow
import pre_process
import post_process

app = Flask(__name__)

@app.route('/predict', methods=['POST'])
def predict():
    req = request.json.get('instances')

    input_data = req[0]['email']
    # preprocessing
    text = pre_process.preprocess(input_data)
    vector = pre_process.preprocess_tokenizing(text)
    model = tensorflow.keras.models.load_model('model')
    # predict
    prediction = model.predict(vector)
    # postprocessing
    value = post_process.postprocess(list(prediction[0]))

    return jsonify({'output': {'doc_class': value}})

if __name__ == '__main__':
    app.run(host='0.0.0.0')
  • Dockerfile
    FROM python:3.7
    WORKDIR /app
    COPY . /app
    RUN pip install --trusted-host pypi.python.org -r requirements.txt 
    
    CMD ["gunicorn", "--bind", "0.0.0.0:5000", "app:app"]
    EXPOSE 5050
    
  • pre_process.py
    # imports
    import pandas as pd
    import pickle
    import re
    import tensorflow as tf
    from tensorflow.keras.preprocessing.sequence import pad_sequences

    def preprocess(text):
        """Apply the preprocessing steps below and return the cleaned text."""

        # Replace "Subject:" lines in the original text with a space.
        # flags must be passed by keyword; the fourth positional argument is count.
        text = re.sub(r"(Subject:).+", " ", text, flags=re.I)

        # Delete all the sentences that start with "Write to:" or "From:".
        text = re.sub(r"((Write to:)|(From:)).+", "", text, flags=re.I)

        # Delete all the tags like "< anyword >".
        text = re.sub(r"<[^><]+>", "", text)

        # Delete all the data present in brackets.
        text = re.sub(r"\([^()]+\)", "", text)

        # Remove all newlines ('\n'), tabs ('\t') and "-".
        text = re.sub(r"[\n\t\-]+", "", text)

        # Remove all the words that end with ":".
        text = re.sub(r"(\w+:)", "", text)

        # Decontractions: replace contracted forms with full words.
        lines = re.sub(r"n't", " not", text)
        lines = re.sub(r"'re", " are", lines)
        lines = re.sub(r"'s", " is", lines)
        lines = re.sub(r"'d", " would", lines)
        lines = re.sub(r"'ll", " will", lines)
        lines = re.sub(r"'t", " not", lines)
        lines = re.sub(r"'ve", " have", lines)
        lines = re.sub(r"'m", " am", lines)
        text = lines

        # Replace numbers with spaces.
        text = re.sub(r"\d+", " ", text)

        # Remove _ from words starting and/or ending with _.
        text = re.sub(r"(\s_)|(_\s)", " ", text)

        # Remove a 1- or 2-letter word before _.
        text = re.sub(r"\w{1,2}_", "", text)

        # Convert all letters to lowercase and drop words whose length is
        # greater than or equal to 15 or less than or equal to 2.
        text = text.lower()

        text = " ".join([i for i in text.split() if len(i) < 15 and len(i) > 2])

        # Replace all characters except A-Z, a-z, 0-9 and _ with a space.
        preprocessed_text = re.sub(r"\W+", " ", text)
        return preprocessed_text

    def preprocess_tokenizing(text):
        # Load the pickled tokenizer, then encode and pad the text.
        with open('tokenizer.pkl', 'rb') as f:
            tokenizer = pickle.load(f)
        max_length = 1019
        tokenizer.fit_on_texts([text])
        encoded_docs = tokenizer.texts_to_sequences([text])
        text_padded = pad_sequences(encoded_docs, maxlen=max_length, padding='post')

        return text_padded
    
  • post_process.py
    def postprocess(vector):
        index = vector.index(max(vector))
        classes = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]
        return classes[index]
    
  • requirements.txt
    gunicorn
    pandas==1.3.3
    numpy==1.19.5
    flask
    flask-cors
    h5py==3.1.0
    scikit-learn==0.24.2
    tensorflow==2.6.0
    
  • model

  • tokenizer.pkl
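
One detail in pre_process.py that may be worth double-checking: the fourth positional parameter of `re.sub` is `count`, not `flags`, so a flag such as `re.I` only takes effect when it is passed by keyword. The snippet below is a small standard-library sketch of the difference; the sample text is made up for illustration.

```python
import re

# Hypothetical sample input, not taken from the real email data.
text = "SUBJECT: quarterly report\nbody"

# Positional: re.I (numeric value 2) is interpreted as count,
# so matching stays case-sensitive and "SUBJECT:" is not matched.
positional = re.sub("(Subject:).+", " ", text, re.I)

# Keyword: the IGNORECASE flag is applied, so the subject line is replaced.
keyword = re.sub("(Subject:).+", " ", text, flags=re.I)

print(repr(positional))
print(repr(keyword))
```

Newer Python versions also emit a deprecation warning for the positional form, which is another reason to prefer `flags=`.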

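I followed the gcloud console commands from this blog on Vertex AI deployment to containerize the model and deploy it to an endpoint. However, the deployment runs for a long time and ultimately fails.

The container runs as expected on localhost, but it will not deploy to a Vertex AI endpoint. I cannot tell whether the problem lies in the Flask app.py, in the Dockerfile, or somewhere else.

I was able to solve this by adding a health route to the HTTP server. I added the following code to my Flask app: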

    @app.route('/healthz')
    def healthz():
        return "OK"
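
Vertex AI sends HTTP GET requests to the container's health route and expects a 200 OK response before it treats the container as ready, which would explain why the deployment kept failing until the route existed. To reproduce that probe against a locally running container before deploying, a minimal sketch using only the Python standard library might look like the following. The helper name `is_healthy`, the default route `/healthz`, and the timeout are assumptions for illustration; the route checked should match whatever health route was configured when the model was uploaded.

```python
import urllib.error
import urllib.request


def is_healthy(base_url, route="/healthz", timeout=5.0):
    """Return True if a GET to base_url + route answers with HTTP 200."""
    try:
        with urllib.request.urlopen(base_url + route, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Covers non-2xx responses (HTTPError), refused connections and timeouts.
        return False
```

With the container started locally and gunicorn bound to port 5000 as in the Dockerfile, calling `is_healthy("http://localhost:5000")` should return `True` once the health route is in place, and `False` if the route is missing or the server is not reachable.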
    
