I have trained and built a Fastai (v1) model and exported it as a .pkl file. Now I want to deploy this model in Amazon SageMaker for inference.
I followed the steps in the SageMaker documentation for [PyTorch models](https://sagemaker.readthedocs.io/en/stable/frameworks/pytorch/using_pytorch.html#write-an-inference-script).

Folder structure:
```
sagemaker/
    export.pkl
    code/
        inference.py
        requirements.txt
```
requirements.txt:

```
spacy==2.3.4
torch==1.4.0
torchvision==0.5.0
fastai==1.0.60
numpy
```
Command I used to create the archive:

```
cd sagemaker/
tar -czvf /tmp/model.tar.gz ./export.pkl ./code
```
This generates a model.tar.gz file, which I then uploaded to an S3 bucket.
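The layout inside model.tar.gz matters: the traceback further down shows the container extracting it to `/opt/ml/model`, with the script expected at `code/inference.py`. A quick local sanity check of the archive layout can be sketched in Python (the temporary folder and empty files here are stand-ins for the real project):

```python
import os
import tarfile
import tempfile

# Stand-in for the local "sagemaker/" folder from the question.
workdir = tempfile.mkdtemp()
os.makedirs(os.path.join(workdir, "code"))
for name in ("export.pkl", "code/inference.py", "code/requirements.txt"):
    open(os.path.join(workdir, name), "w").close()

# Equivalent of: cd sagemaker/ && tar -czvf /tmp/model.tar.gz ./export.pkl ./code
archive = os.path.join(workdir, "model.tar.gz")
with tarfile.open(archive, "w:gz") as tar:
    tar.add(os.path.join(workdir, "export.pkl"), arcname="export.pkl")
    tar.add(os.path.join(workdir, "code"), arcname="code")

# Inspect the archive: export.pkl at the root, the script and
# requirements.txt under code/.
with tarfile.open(archive, "r:gz") as tar:
    print(sorted(tar.getnames()))
```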
To deploy it, I used the SageMaker Python SDK:
```python
from sagemaker.pytorch import PyTorchModel

role = "sagemaker-role-arn"
model_path = "s3 key for the model.tar.gz file that i created above"

pytorch_model = PyTorchModel(
    model_data=model_path,
    role=role,
    entry_point='inference.py',
    framework_version="1.4.0",
    py_version="py3",
)
predictor = pytorch_model.deploy(instance_type='ml.c5.large', initial_instance_count=1)
```
After running the code above, I can see that the model is created and deployed in SageMaker, but I get an error when I run inference:
```
botocore.errorfactory.ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (500) from primary with message "No module named 'fastai'
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/sagemaker_inference/transformer.py", line 110, in transform
    self.validate_and_initialize(model_dir=model_dir)
  File "/opt/conda/lib/python3.6/site-packages/sagemaker_inference/transformer.py", line 157, in validate_and_initialize
    self._validate_user_module_and_set_functions()
  File "/opt/conda/lib/python3.6/site-packages/sagemaker_inference/transformer.py", line 170, in _validate_user_module_and_set_functions
    user_module = importlib.import_module(user_module_name)
  File "/opt/conda/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/opt/ml/model/code/inference.py", line 2, in <module>
    from fastai.basic_train import load_learner, DatasetType, Path
ModuleNotFoundError: No module named 'fastai'
```
Clearly the fastai module was not installed. Why is that, and what am I doing wrong here?
To troubleshoot issues like this, you should check the endpoint's CloudWatch logs. Check the logs first to see whether requirements.txt was found and installed, or whether there were any dependency errors.
To package the model and the inference script, the recommendation is to have two files: `model.tar.gz` with the model files, and `sourcedir.tar.gz` with the inference code. Use the SageMaker environment variable `SAGEMAKER_SUBMIT_DIRECTORY` to point to the file's S3 location, `s3://bucket/prefix/sourcedir.tar.gz`, and `SAGEMAKER_PROGRAM` to point to the file name, `inference.py`.
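A minimal sketch of this packaging step, assuming a local folder with the inference code (the bucket and prefix in the environment variables are placeholders, and the S3 upload itself is omitted):

```python
import os
import tarfile
import tempfile

# Stand-in for the local folder holding the inference code and its dependencies.
src = tempfile.mkdtemp()
for name in ("inference.py", "requirements.txt"):
    open(os.path.join(src, name), "w").close()

# sourcedir.tar.gz holds the script and requirements.txt at the archive root.
sourcedir = os.path.join(src, "sourcedir.tar.gz")
with tarfile.open(sourcedir, "w:gz") as tar:
    tar.add(os.path.join(src, "inference.py"), arcname="inference.py")
    tar.add(os.path.join(src, "requirements.txt"), arcname="requirements.txt")

# After uploading sourcedir.tar.gz to S3, point the container at it through
# these environment variables (bucket/prefix below are placeholders).
env = {
    "SAGEMAKER_SUBMIT_DIRECTORY": "s3://my-bucket/my-prefix/sourcedir.tar.gz",
    "SAGEMAKER_PROGRAM": "inference.py",
}

with tarfile.open(sourcedir, "r:gz") as tar:
    print(sorted(tar.getnames()))
```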
Note: when you use `source_dir` in `PyTorchModel`, the SDK packages `source_dir`, uploads it to S3, and sets `SAGEMAKER_SUBMIT_DIRECTORY` for you.