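I am trying to create a script that loads a saved CatBoost model from a cloud storage bucket and uses it to make predictions. However, I cannot get the file to load. CatBoost raises an error saying the model file doesn't exist, even though I copied the path directly from the UI.
I am using Google Cloud Platform. The script lives in an AI Platform JupyterLab notebook, in the same project as the bucket that stores the model. The feature set I am predicting on is stored in the same bucket as the model, and I can successfully read the feature set file into a data frame (X_eval).
I have tried both the URI ("gs://…") and the authenticated URL ("https://..."), and both throw the same error.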
#Specify model path
path = 'gs://bucket_id/model-name'
# Load model
from_file = CatBoostClassifier()
model = from_file.load_model(path)
model.predict(X_eval)
---------------------------------------------------------------------------
CatBoostError Traceback (most recent call last)
<ipython-input-9-f7a6068f5718> in <module>
70
71 if __name__ == "__main__":
---> 72 main('data','context')
<ipython-input-9-f7a6068f5718> in main(data, context)
42 # Load model
43 from_file = CatBoostClassifier()
---> 44 from_file.load_model(path)
45
46 model.predict(X_eval)
/opt/conda/lib/python3.7/site-packages/catboost/core.py in load_model(self, fname, format, stream, blob)
2655
2656 if fname is not None:
-> 2657 self._load_model(fname, format)
2658 elif stream is not None:
2659 self._load_from_stream(stream)
/opt/conda/lib/python3.7/site-packages/catboost/core.py in _load_model(self, model_file, format)
1345 raise CatBoostError("Invalid fname type={}: must be str().".format(type(model_file)))
1346
-> 1347 self._object._load_model(model_file, format)
1348 self._set_trained_model_attributes()
1349 for key, value in iteritems(self._get_params()):
_catboost.pyx in _catboost._CatBoost._load_model()
_catboost.pyx in _catboost._CatBoost._load_model()
CatBoostError: catboost/libs/model/model_import_interface.h:19: Model file doesn't exist: gs://bucket_id/model-name
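If I upload the same model file to the local file system (that is, the file system of the VM running the JupyterLab notebook), the model loads successfully. For example, this works: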
#Specify model path
path = 'model-name'
# Load model
from_file = CatBoostClassifier()
model = from_file.load_model(path)
model.predict(X_eval)
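Since loading from a local path works, one possible workaround is to copy the object from the bucket to a temporary local file and pass that path to load_model. The following is a minimal sketch, assuming the google-cloud-storage and catboost packages are installed and the notebook's default credentials can read the bucket. The helper names local_model_path and load_catboost_via_tempfile are hypothetical and not part of either library.

```python
import os
import tempfile


def local_model_path(tmp_dir, blob_name):
    """Build a local file path for a bucket object (hypothetical helper)."""
    # Object names use "/" as a separator; keep only the last component.
    file_name = blob_name.rstrip("/").rsplit("/", 1)[-1] or "model"
    return os.path.join(tmp_dir, file_name)


def load_catboost_via_tempfile(bucket_name, blob_name):
    """Download a model object to a temporary file and load it by path."""
    # Imported lazily so this snippet can be defined without the packages.
    from catboost import CatBoostClassifier
    from google.cloud import storage

    blob = storage.Client().bucket(bucket_name).blob(blob_name)
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = local_model_path(tmp_dir, blob_name)
        blob.download_to_filename(path)  # writes the object to a local path
        model = CatBoostClassifier()
        model.load_model(path)  # same call that works for local files
    return model
```

With the names from the question, this would be called as load_catboost_via_tempfile("bucket_id", "model-name"), and the returned model used for model.predict(X_eval). The temporary directory is removed once the model has been loaded into memory.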
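There is a better way that does not seem to be documented: download the object as bytes and pass them to load_model through its blob argument.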
import catboost as cb
from google.cloud import storage
storage_client = storage.Client()
bucket_name = "catboost-models" # put your bucket name here
blob_name = "mymodel" # put the blob name from the bucket here
blob = storage_client.bucket( bucket_name ).blob( blob_name ).download_as_bytes()
model = cb.CatBoostClassifier()
model.load_model( blob = blob )
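If you would rather keep passing a "gs://" URI around, as in the question, the same idea can be wrapped in a small function. This is only a sketch under assumptions: split_gcs_uri and load_catboost_from_gcs are hypothetical helper names, and the code relies on the same download_as_bytes and load_model(blob=...) calls used above.

```python
def split_gcs_uri(uri):
    """Split 'gs://bucket/path/to/object' into (bucket_name, blob_name)."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError("Expected a gs:// URI, got: {}".format(uri))
    bucket_name, _, blob_name = uri[len(prefix):].partition("/")
    if not bucket_name or not blob_name:
        raise ValueError("URI must contain a bucket and an object name: {}".format(uri))
    return bucket_name, blob_name


def load_catboost_from_gcs(uri):
    """Load a CatBoost classifier from a gs:// URI without touching local disk."""
    # Imported lazily so the URI helper works without these packages.
    from catboost import CatBoostClassifier
    from google.cloud import storage

    bucket_name, blob_name = split_gcs_uri(uri)
    data = storage.Client().bucket(bucket_name).blob(blob_name).download_as_bytes()
    model = CatBoostClassifier()
    model.load_model(blob=data)  # load from the in-memory bytes
    return model
```

For the path in the question, split_gcs_uri("gs://bucket_id/model-name") gives the bucket "bucket_id" and the object "model-name", so load_catboost_from_gcs("gs://bucket_id/model-name") would fetch that object and load it.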
I solved this using Ture Friese's answer to the following question: How to load a model saved in a joblib file from a Google Cloud Storage bucket
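This involves using BytesIO to download the file into an in-memory file object, then loading the model from that file object and using it to make predictions on the data frame X_eval:
from io import BytesIO
from catboost import CatBoostClassifier
from google.cloud import storage
storage_client = storage.Client()
# Storage variables
model_bucket_id = "your-bucket-id"  # Replace with your bucket ID
model_bucket = storage_client.get_bucket(model_bucket_id)
model_name = "your-model-file"  # Replace with the file name of the model
# Select bucket file
blob = model_bucket.blob(model_name)
# Download blob into an in-memory file object
# (download_to_file takes a file object; download_to_filename expects a path)
model_file = BytesIO()
blob.download_to_file(model_file)
model_file.seek(0)  # Rewind so the model is read from the start
# Load model from in-memory file object
from_file = CatBoostClassifier()
model = from_file.load_model(stream=model_file)
model.predict(X_eval)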