Which SageMaker model servers support server-side batching, and how do you enable it?



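MMS, TFServing, and TorchServe support server-side batching: the server can natively and asynchronously batch consecutive requests, while the client still sees synchronous calls with a batch size of 1. How can these features be enabled on a SageMaker endpoint?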
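To make the idea concrete, here is a toy, stdlib-only sketch of dynamic batching with asyncio. Each caller awaits a single result, while a background worker groups queued requests into batches of at most `max_batch_size`, waiting no longer than `max_delay` seconds after the first request of a batch. The class name `DynamicBatcher` and its parameters are hypothetical; this illustrates the technique and is not how MMS, TFServing, or TorchServe implement it internally.

```python
import asyncio
from typing import Any, Callable, List, Tuple


class DynamicBatcher:
    """Toy server-side batcher (hypothetical, for illustration only)."""

    def __init__(self, handler: Callable[[List[Any]], List[Any]],
                 max_batch_size: int, max_delay: float) -> None:
        self.handler = handler              # runs once per batch, returns one result per item
        self.max_batch_size = max_batch_size
        self.max_delay = max_delay          # seconds to wait for a batch to fill
        self.queue: "asyncio.Queue[Tuple[Any, asyncio.Future]]" = asyncio.Queue()
        self.batch_sizes: List[int] = []    # sizes of the batches actually formed

    async def submit(self, item: Any) -> Any:
        # From the client's point of view this is a plain batch-size-1 call.
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((item, fut))
        return await fut

    async def run(self) -> None:
        loop = asyncio.get_running_loop()
        while True:
            batch = [await self.queue.get()]
            deadline = loop.time() + self.max_delay
            # Keep collecting until the batch is full or the delay has expired.
            while len(batch) < self.max_batch_size:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            self.batch_sizes.append(len(batch))
            # One handler call for the whole batch (no error handling in this toy).
            results = self.handler([item for item, _ in batch])
            for (_, fut), result in zip(batch, results):
                fut.set_result(result)


async def main() -> Tuple[List[int], List[int]]:
    batcher = DynamicBatcher(lambda xs: [x * 2 for x in xs],
                             max_batch_size=3, max_delay=0.05)
    worker = asyncio.create_task(batcher.run())
    results = await asyncio.gather(*(batcher.submit(i) for i in range(7)))
    worker.cancel()
    return list(results), batcher.batch_sizes


if __name__ == "__main__":
    results, sizes = asyncio.run(main())
    print(results)  # [0, 2, 4, 6, 8, 10, 12]
    print(sizes)    # batch sizes formed by the worker
```

The two knobs in this sketch play the same role as the batch size and maximum batch delay settings that the real servers expose: a larger batch or a longer delay improves throughput at the cost of per-request latency.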

In each SageMaker container, these features are controlled through environment variables.

For TorchServe:

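from sagemaker.pytorch.model import PyTorchModel

# model_artifact (S3 URI of model.tar.gz), role (IAM role) and image_uri
# are assumed to be defined earlier.
env_variables_dict = {
    "SAGEMAKER_TS_BATCH_SIZE": "3",            # max number of requests per batch
    "SAGEMAKER_TS_MAX_BATCH_DELAY": "100000",  # max time (ms) to wait for a batch to fill
}

pytorch_model = PyTorchModel(
    model_data=model_artifact,
    role=role,
    image_uri=image_uri,
    source_dir="code",
    framework_version="1.9",
    entry_point="inference.py",
    env=env_variables_dict,  # exported to the container as environment variables
)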
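With batching enabled, TorchServe passes the handler a list of up to batch-size requests and expects a list of responses of the same length, in the same order. The plain-Python sketch below shows only that contract; `dummy_model` is a hypothetical stand-in, the JSON payload format is an assumption, and this is not the SageMaker toolkit's own handler. How the `model_fn`/`input_fn`/`predict_fn` functions in `inference.py` map onto it depends on the toolkit version (the blog post linked below walks through a working setup).

```python
import json
from typing import Any, Dict, List


def dummy_model(batch: List[List[float]]) -> List[float]:
    # Hypothetical stand-in for a real model: sums each input vector.
    return [sum(x) for x in batch]


def handle(data: List[Dict[str, Any]], context: Any = None) -> List[float]:
    """TorchServe-style batch handler sketch.

    With SAGEMAKER_TS_BATCH_SIZE=3, `data` holds between 1 and 3 requests.
    Each request carries its payload under "body" or "data" (assumed JSON here).
    """
    inputs = []
    for request in data:
        payload = request.get("body")
        if payload is None:
            payload = request.get("data")
        if isinstance(payload, (bytes, bytearray)):
            payload = json.loads(payload)
        inputs.append(payload)

    outputs = dummy_model(inputs)  # a single forward pass for the whole batch

    # TorchServe requires exactly one response per request, in request order.
    if len(outputs) != len(data):
        raise RuntimeError("handler must return one response per request")
    return outputs


print(handle([{"body": b"[1, 2]"}, {"data": [3.0, 4.0]}]))  # [3, 7.0]
```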

Environment variables read by the toolkit: https://github.com/aws/sagemaker-pytorch-inference-toolkit/blob/27b667fa27259dcea92b97e3dcc903057587deb6/src/sagemaker_pytorch_serving_container/ts_parameters.py

Blog post with more details: https://aws.amazon.com/blogs/machine-learning/optimize-your-inference-jobs-using-dynamic-batch-inference-with-torchserve-on-amazon-sagemaker/


For TFServing, see the batching documentation: https://github.com/aws/sagemaker-tensorflow-serving-container/blob/1bd309b7be5040d5515a3081fd5714e444b2ab91/README.md#enabling-batching
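As with TorchServe, the TFServing container reads its batching settings from environment variables. The variable names below are assumed from the "Enabling Batching" section of the linked README, so verify the exact names and defaults there. The helper `tfs_batching_env` and the example values are hypothetical. The one firm constraint it encodes is that SageMaker model environment values are strings.

```python
from typing import Dict


def tfs_batching_env(max_batch_size: int, batch_timeout_micros: int,
                     num_batch_threads: int) -> Dict[str, str]:
    """Build an env dict for the TFServing container (hypothetical helper).

    Variable names are assumed from the sagemaker-tensorflow-serving-container
    README; SageMaker requires every value to be a string.
    """
    return {
        "SAGEMAKER_TFS_ENABLE_BATCHING": "true",
        "SAGEMAKER_TFS_MAX_BATCH_SIZE": str(max_batch_size),
        "SAGEMAKER_TFS_BATCH_TIMEOUT_MICROS": str(batch_timeout_micros),
        "SAGEMAKER_TFS_NUM_BATCH_THREADS": str(num_batch_threads),
    }


env = tfs_batching_env(max_batch_size=8, batch_timeout_micros=10000, num_batch_threads=4)
print(env["SAGEMAKER_TFS_MAX_BATCH_SIZE"])  # 8
# Passed via the env argument, as in the TorchServe example, e.g.
# TensorFlowModel(model_data=..., role=..., framework_version=..., env=env)
```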


MMS batching is currently not supported in SageMaker.
