GCP AI Platform: creating a custom predictor model version (trained PyTorch model + torchvision.transf

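I am currently trying to deploy a custom model to AI Platform by following https://cloud.google.com/ai-platform/prediction/docs/deploying-models#gcloud_1. The model combines a pre-trained PyTorch model with torchvision.transforms. I keep getting the error below, which appears to be related to the 500 MB limit on custom prediction routines.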

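ERROR: (gcloud.beta.ai-platform.versions.create) Create Version failed. Bad model detected with error: Model requires more memory than allowed. Please try to decrease the model size and re-deploy. If you continue to experience errors, please contact support.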

setup.py:

    from pathlib import Path

    from setuptools import setup

    base = Path(__file__).parent
    # Read one requirement per line, skipping blank lines.
    with open(base / "requirements.txt") as f:
        REQUIRED_PACKAGES = [line.strip() for line in f if line.strip()]
    print(f"\nPackages: {REQUIRED_PACKAGES}\n\n")
    # [torch==1.3.0, torchvision==0.4.1, ImageHash==4.2.0,
    #  Pillow==6.2.1, pyvis==0.1.8.2] installs 800 MB worth of files

    setup(
        description="Extract features of a image",
        author=,  # value elided in the original post
        name='test',
        version='0.1',
        install_requires=REQUIRED_PACKAGES,
        project_urls={
            'Documentation': 'https://cloud.google.com/ai-platform/prediction/docs/custom-prediction-routines#tensorflow',
            'Deploy': 'https://cloud.google.com/ai-platform/prediction/docs/deploying-models#gcloud_1',
            'Ai_platform troubleshooting': 'https://cloud.google.com/ai-platform/training/docs/troubleshooting',
            'Say Thanks!': 'https://medium.com/searce/deploy-your-own-custom-model-on-gcps-ai-platform-7e42a5721b43',
            'google Torch wheels': "http://storage.googleapis.com/cloud-ai-pytorch/readme.txt",
            'Torch & torchvision wheels': "https://download.pytorch.org/whl/torch_stable.html",
        },
        python_requires='~=3.7',
        scripts=['predictor.py', 'preproc.py'],
    )

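Steps: I tried adding `torch` and `torchvision` directly to the `REQUIRED_PACKAGES` list in setup.py so that PyTorch and torchvision are installed as dependencies at deployment time. My guess is that AI Platform internally downloads the PyPI package for PyTorch, which is over 500 MB, and that this is what makes the deployment fail. If I deploy the model with only `torch`, it seems to work (and then, of course, fails with an error that the `torchvision` library cannot be found).

File sizes:
- pytorch (torch-1.3.1+cpu-cp37-cp37m-linux_x86_64.whl, about 111 MB)
- torchvision (torchvision-0.4.1+cpu-cp37-cp37m-linux_x86_64.whl, about 46 MB), both downloaded from https://download.pytorch.org/whl/torch_stable.html and stored on GCS
- The predictor package archive (.tar.gz format), which is the output of setup.py (5 KB)
- A trained PyTorch model (44 MB)

In total, the model dependencies should come to less than 250 MB, yet I still get this error. I have also tried the torch and torchvision wheels from the Google mirror at http://storage.googleapis.com/cloud-ai-pytorch/readme.txt, but the same memory issue persists. AI Platform is fairly new to us, so any input from people with more experience would be appreciated.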
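
As a quick sanity check on the sizes listed above, the arithmetic can be written out. This is only a sketch: the figures are the approximate ones quoted in this post, the 500 MB value is the custom prediction routine limit mentioned earlier, and the variable names are made up for illustration.

```python
# Approximate artifact sizes in MB, as quoted in this post.
artifact_sizes_mb = {
    "torch-1.3.1+cpu wheel": 111,
    "torchvision-0.4.1+cpu wheel": 46,
    "predictor package (.tar.gz)": 0.005,  # about 5 KB
    "trained PyTorch model": 44,
}

LIMIT_MB = 500  # custom prediction routine limit mentioned above

total_mb = sum(artifact_sizes_mb.values())
print(f"total upload size: {total_mb:.3f} MB (limit {LIMIT_MB} MB)")
```

The uploaded files add up to roughly 201 MB, which is why the error is surprising. This counts only the files as uploaded, not the space the packages take once installed; the comment in setup.py mentions about 800 MB of installed files for the full requirements list.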
More information:

GCP CLI input:

Environment variables:

    BUCKET_NAME="something"
    MODEL_DIR=<>
    VERSION_NAME='v6'
    MODEL_NAME="something_model"
    STAGING_BUCKET=$MODEL_DIR<>
    # TORCH_PACKAGE=$MODEL_DIR"package/torch-1.3.1+cpu-cp37-cp37m-linux_x86_64.whl"
    # TORCHVISION_PACKAGE=$MODEL_DIR"package/torchvision-0.4.1+cpu-cp37-cp37m-linux_x86_64.whl"
    TORCH_PACKAGE=<>
    TORCHVISION_PACKAGE=<>
    CUSTOM_CODE_PATH=$STAGING_BUCKET"imt_ai_predict-0.1.tar.gz"
    PREDICTOR_CLASS="predictor.MyPredictor"
    REGION=<>
    MACHINE_TYPE='mls1-c4-m2'

    gcloud beta ai-platform versions create $VERSION_NAME \
      --model=$MODEL_NAME \
      --origin=$MODEL_DIR \
      --runtime-version=2.3 \
      --python-version=3.7 \
      --machine-type=$MACHINE_TYPE \
      --package-uris=$CUSTOM_CODE_PATH,$TORCH_PACKAGE,$TORCHVISION_PACKAGE \
      --prediction-class=$PREDICTOR_CLASS

GCP CLI output:

     [1] global
     [2] asia-east1
     [3] asia-northeast1
     [4] asia-southeast1
     [5] australia-southeast1
     [6] europe-west1
     [7] europe-west2
     [8] europe-west3
     [9] europe-west4
     [10] northamerica-northeast1
     [11] us-central1
     [12] us-east1
     [13] us-east4
     [14] us-west1
     [15] cancel
    Please enter your numeric choice:  1

    To make this the default region, run `gcloud config set ai_platform/region global`.

    Using endpoint [https://ml.googleapis.com/]
    Creating version (this might take a few minutes)......failed.
    ERROR: (gcloud.beta.ai-platform.versions.create) Create Version failed. Bad model detected with error: Model requires more memory than allowed. Please try to decrease the model size and re-deploy. If you continue to experience errors, please contact support.

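What I found: there are articles by people who struggled with the PyTorch packages in the same way and got deployment working by hosting the torch wheels on GCS (https://medium.com/searce/deploy-your-own-custom-model-on-gcps-ai-platform-7e42a5721b43). I tried the same approach with torch and torchvision, but have had no luck so far, and I am waiting for a response from cloudml-feedback@google.com. Any help with getting a custom predictor based on torch and torchvision running on AI Platform would be great.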

I got this fixed. I stuck with the 4 GB CPU MLS1 machine and a custom prediction routine (<500 MB).

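Install the libraries through the `install_requires` parameter in setup.py, but instead of giving only the package name and version for torch, point it at the correct torch wheel (ideally under 100 MB).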

    REQUIRED_PACKAGES = [line.strip() for line in open(base / "requirements.txt")] + [
        'torchvision==0.5.0',
        'torch @ https://download.pytorch.org/whl/cpu/torch-1.4.0%2Bcpu-cp37-cp37m-linux_x86_64.whl',
    ]
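
If requirements.txt still contains an older `torch` or `torchvision` pin (the question's setup.py comment lists `torch==1.3.0`), it would clash with the wheel URL. One way around that is to filter those pins out before appending the explicit entries. The following is a minimal sketch, not the code from this answer: `build_requirements` and `TORCH_WHEEL_URL` are hypothetical names, and it assumes requirements.txt holds simple `name==version` lines.

```python
import re

# Wheel URL quoted in the answer above.
TORCH_WHEEL_URL = (
    "https://download.pytorch.org/whl/cpu/"
    "torch-1.4.0%2Bcpu-cp37-cp37m-linux_x86_64.whl"
)


def build_requirements(lines, torch_wheel_url=TORCH_WHEEL_URL):
    """Hypothetical helper: drop torch/torchvision pins, then add explicit ones."""
    kept = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # The project name is everything before the first specifier character.
        name = re.split(r"[=<>!~\[@; ]", line, maxsplit=1)[0].lower()
        if name in {"torch", "torchvision"}:
            continue  # replaced by the explicit entries below
        kept.append(line)
    return kept + ["torchvision==0.5.0", f"torch @ {torch_wheel_url}"]


if __name__ == "__main__":
    sample = ["torch==1.3.0", "torchvision==0.4.1", "ImageHash==4.2.0", "", "Pillow==6.2.1"]
    print(build_requirements(sample))
```

In setup.py the result would be passed as `install_requires=build_requirements(open(base / "requirements.txt"))`.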

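I reduced the number of preprocessing steps, since not all of them would fit. Serialize the data you send as JSON and parse the JSON you receive, in both preproc.py and predictor.py.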

    import json
    json.dumps(your_data)  # your_data: the data to send to the predictor class
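
To make the JSON hand-off concrete, here is a minimal sketch of the round trip. The function names `encode_for_predictor` and `decode_in_predictor` and the payload fields are hypothetical; the answer only says that the data is serialized with `json` on one side and parsed on the other.

```python
import json


def encode_for_predictor(features, image_id):
    """Hypothetical preproc.py side: turn preprocessed data into a JSON string."""
    payload = {"image_id": image_id, "features": list(features)}
    return json.dumps(payload)


def decode_in_predictor(message):
    """Hypothetical predictor.py side: parse the JSON string back into Python objects."""
    payload = json.loads(message)
    return payload["image_id"], payload["features"]


if __name__ == "__main__":
    message = encode_for_predictor([0.1, 0.5, 0.9], image_id="img-001")
    print(decode_in_predictor(message))
```

Tensors and NumPy arrays are not JSON serializable, so they need to be converted to plain lists (for example with `.tolist()`) before being passed to `json.dumps`.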

Import only the functions you actually need from each library.

    from torch import zeros, load
    # your code

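(Important)

I have not yet tested different serialization formats for the trained model; which one (torch.save, pickle, joblib, etc.) is used may also make a difference to memory use.

For those whose organization is a GCP partner, I found this link; such organizations may be able to request more quota (my guess is from 500 MB up to around 2 GB). Since my problem was solved and other issues came up, I did not need to go down this route. https://cloud.google.com/ai-platform/training/docs/quotas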
