No module named 'fastai' when trying to deploy a fastai model on SageMaker



I have trained and built a fastai (v1) model and exported it as a .pkl file. Now I want to deploy this model in Amazon SageMaker for inference.

I followed the SageMaker documentation for PyTorch models: [https://sagemaker.readthedocs.io/en/stable/frameworks/pytorch/using_pytorch.html#write-an-inference-script][1]

Steps taken

Folder structure:

    sagemaker/
        export.pkl
        code/
            inference.py
            requirements.txt

requirements.txt:

    spacy==2.3.4
    torch==1.4.0
    torchvision==0.5.0
    fastai==1.0.60
    numpy

The command I used to create the archive:

    cd sagemaker/ && tar -czvf /tmp/model.tar.gz ./export.pkl ./code

This generates a model.tar.gz file, which I uploaded to an S3 bucket.

To deploy it, I used the Python SageMaker SDK:


from sagemaker.pytorch import PyTorchModel

role = "sagemaker-role-arn"
model_path = "s3 key for the model.tar.gz file that i created above"
pytorch_model = PyTorchModel(
    model_data=model_path,
    role=role,
    entry_point="inference.py",
    framework_version="1.4.0",
    py_version="py3",
)

predictor = pytorch_model.deploy(instance_type="ml.c5.large", initial_instance_count=1)

After executing the code above, I can see that the model is created and deployed in SageMaker, but I eventually run into an error when invoking inference:


botocore.errorfactory.ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (500) from primary with message "No module named 'fastai'
Traceback (most recent call last):
File "/opt/conda/lib/python3.6/site-packages/sagemaker_inference/transformer.py", line 110, in transform
self.validate_and_initialize(model_dir=model_dir)
File "/opt/conda/lib/python3.6/site-packages/sagemaker_inference/transformer.py", line 157, in validate_and_initialize
self._validate_user_module_and_set_functions()
File "/opt/conda/lib/python3.6/site-packages/sagemaker_inference/transformer.py", line 170, in _validate_user_module_and_set_functions
user_module = importlib.import_module(user_module_name)
File "/opt/conda/lib/python3.6/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 994, in _gcd_import
File "<frozen importlib._bootstrap>", line 971, in _find_and_load
File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 678, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/opt/ml/model/code/inference.py", line 2, in <module>
from fastai.basic_train import load_learner, DatasetType, Path
ModuleNotFoundError: No module named 'fastai'

Apparently the fastai module is not being installed. What is the reason for this, and what am I doing wrong here?

To troubleshoot issues like this, you should check the endpoint's CloudWatch logs.

Check the logs first to see whether requirements.txt was found and installed, or whether any dependency errors occurred.
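SageMaker writes endpoint container logs to a log group with a fixed naming convention, so the recent events can also be pulled programmatically with boto3. A minimal sketch; the endpoint name and region below are placeholders, and the call requires AWS credentials with CloudWatch Logs read access:

```python
def endpoint_log_group(endpoint_name):
    # SageMaker's naming convention for endpoint container log groups.
    return f"/aws/sagemaker/Endpoints/{endpoint_name}"

def recent_endpoint_logs(endpoint_name, region="us-east-1", limit=50):
    """Return recent log messages for a SageMaker endpoint."""
    import boto3  # imported lazily so endpoint_log_group works without it
    logs = boto3.client("logs", region_name=region)
    resp = logs.filter_log_events(
        logGroupName=endpoint_log_group(endpoint_name), limit=limit
    )
    return [event["message"] for event in resp["events"]]
```

Grepping those messages for "requirements.txt" or "pip" shows whether the dependency install ran at all.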

To package the model and the inference script, it is recommended to have two files:

  1. model.tar.gz containing the model file(s).
  2. sourcedir.tar.gz, with the SageMaker environment variable SAGEMAKER_SUBMIT_DIRECTORY pointing to its location on S3 (s3://bucket/prefix/sourcedir.tar.gz). You can use SAGEMAKER_PROGRAM to point to the file name inference.py.
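The two archives can be built with the standard library. A sketch assuming the folder layout from the question; the function name and output paths are illustrative:

```python
import tarfile
from pathlib import Path

def package_for_sagemaker(model_file, code_dir, out_dir):
    """Build model.tar.gz and sourcedir.tar.gz as two separate archives."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    model_tar = out / "model.tar.gz"
    source_tar = out / "sourcedir.tar.gz"
    # model.tar.gz: only the serialized model artifact
    with tarfile.open(model_tar, "w:gz") as tar:
        tar.add(model_file, arcname=Path(model_file).name)
    # sourcedir.tar.gz: inference.py and requirements.txt at the archive root
    with tarfile.open(source_tar, "w:gz") as tar:
        for path in sorted(Path(code_dir).iterdir()):
            tar.add(path, arcname=path.name)
    return model_tar, source_tar
```

After uploading sourcedir.tar.gz to S3, point SAGEMAKER_SUBMIT_DIRECTORY at its S3 location and SAGEMAKER_PROGRAM at inference.py, e.g. via the model's environment variables.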

Note: when you use source_dir in PyTorchModel, the SDK packages source_dir, uploads it to S3, and sets SAGEMAKER_SUBMIT_DIRECTORY for you.
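With that in mind, the deployment code from the question can pass source_dir instead of baking code/ into model.tar.gz. A sketch with placeholder values; the S3 path and role ARN below are not real, and the SDK import is deferred so the settings can be inspected without it installed:

```python
# Placeholder values: replace the S3 path and role ARN with your own.
MODEL_KWARGS = dict(
    model_data="s3://my-bucket/model.tar.gz",  # tarball holding only export.pkl
    role="sagemaker-role-arn",
    entry_point="inference.py",  # relative to source_dir
    source_dir="code",           # local dir with inference.py + requirements.txt
    framework_version="1.4.0",
    py_version="py3",
)

def deploy_model():
    """Deploy via the SageMaker SDK (requires AWS credentials)."""
    from sagemaker.pytorch import PyTorchModel
    model = PyTorchModel(**MODEL_KWARGS)
    return model.deploy(instance_type="ml.c5.large", initial_instance_count=1)
```

Because source_dir is handed to the SDK, requirements.txt in that directory should be picked up and installed on the endpoint, which is what the missing-fastai error points to.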
