How do I make predictions from a SageMaker endpoint using a pandas DataFrame?



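I'm trying to use a model I created with Autopilot in SageMaker Studio, but I keep getting different errors. Ultimately I want to keep it simple: take a pandas DataFrame and use it to predict the output. Here is what I have so far, followed by the error I get.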

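import sagemaker, boto3, os
import pandas as pd  # needed for pd.read_csv below

bucket = sagemaker.Session().default_bucket()
# Attach to the already-deployed endpoint by name
model = sagemaker.predictor.Predictor('Predict-Low', sagemaker_session=sagemaker.Session())
df = pd.read_csv('s3://sagemaker-studio-xxx/Sagemaker Data Predict Low.csv')
# Separate the target column from the features
y = df['Low']
del df['Low']
y_hat = model.predict(df)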
---------------------------------------------------------------------------
ParamValidationError                      Traceback (most recent call last)
<ipython-input-43-18ff980cf441> in <module>
----> 1 y_hat = model.predict(df)
/opt/conda/lib/python3.7/site-packages/sagemaker/predictor.py in predict(self, data, initial_args, target_model, target_variant, inference_id)
134             data, initial_args, target_model, target_variant, inference_id
135         )
--> 136         response = self.sagemaker_session.sagemaker_runtime_client.invoke_endpoint(**request_args)
137         return self._handle_response(response)
138 
/opt/conda/lib/python3.7/site-packages/botocore/client.py in _api_call(self, *args, **kwargs)
384                     "%s() only accepts keyword arguments." % py_operation_name)
385             # The "self" in this scope is referring to the BaseClient.
--> 386             return self._make_api_call(operation_name, kwargs)
387 
388         _api_call.__name__ = str(py_operation_name)
/opt/conda/lib/python3.7/site-packages/botocore/client.py in _make_api_call(self, operation_name, api_params)
676         }
677         request_dict = self._convert_to_request_dict(
--> 678             api_params, operation_model, context=request_context)
679 
680         service_id = self._service_model.service_id.hyphenize()
/opt/conda/lib/python3.7/site-packages/botocore/client.py in _convert_to_request_dict(self, api_params, operation_model, context)
724             api_params, operation_model, context)
725         request_dict = self._serializer.serialize_to_request(
--> 726             api_params, operation_model)
727         if not self._client_config.inject_host_prefix:
728             request_dict.pop('host_prefix', None)
/opt/conda/lib/python3.7/site-packages/botocore/validate.py in serialize_to_request(self, parameters, operation_model)
317                                                     operation_model.input_shape)
318             if report.has_errors():
--> 319                 raise ParamValidationError(report=report.generate_report())
320         return self._serializer.serialize_to_request(parameters,
321                                                      operation_model)
ParamValidationError: Parameter validation failed:
Invalid type for parameter Body

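It looks to me like it needs a byte string to make the prediction, so that is what I tried: I converted the DataFrame to a byte string, but I still got an error. Does anyone know what I'm doing wrong?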

By the way, all of this was done in SageMaker Studio. Here is the data.

Date         Company   High    Low  Open  Close  Volume  Adj Close  
0    7/13/2020    LIFE  4.380  3.880  4.21   3.88   62400       3.88   
1    7/14/2020    LIFE  4.210  3.721  3.95   4.16   80800       4.16   
2    7/15/2020    LIFE  4.550  4.053  4.17   4.50  212500       4.50   
3    7/16/2020    LIFE  4.550  4.350  4.40   4.51   44600       4.51   
4    7/17/2020    LIFE  5.170  4.410  4.54   5.09  257700       5.09   
..         ...     ...    ...    ...   ...    ...     ...        ...   
255  7/16/2021    LIFE  4.590  4.440  4.46   4.50  156300       4.50   
256  7/19/2021    LIFE  4.490  4.220  4.36   4.22  211700       4.22   
257  7/20/2021    LIFE  4.546  4.230  4.23   4.47  212500       4.47   
258  7/21/2021    LIFE  4.800  4.369  4.45   4.48  487500       4.48   
259  7/22/2021    LIFE  4.510  4.260  4.44   4.45  235200       4.45   
Sector                                          Specifics  
0    Health Care  Biotechnology: Biological Products (No Diagnos...   
1    Health Care  Biotechnology: Biological Products (No Diagnos...   
2    Health Care  Biotechnology: Biological Products (No Diagnos...   
3    Health Care  Biotechnology: Biological Products (No Diagnos...   
4    Health Care  Biotechnology: Biological Products (No Diagnos...   
..           ...                                                ...   
255  Health Care  Biotechnology: Biological Products (No Diagnos...   
256  Health Care  Biotechnology: Biological Products (No Diagnos...   
257  Health Care  Biotechnology: Biological Products (No Diagnos...   
258  Health Care  Biotechnology: Biological Products (No Diagnos...   
259  Health Care  Biotechnology: Biological Products (No Diagnos...   
Open Difference from Yesterday  Yesterday Open to Low  
0                              0.00                  0.000   
1                             -0.26                  0.330   
2                              0.22                  0.229   
3                              0.23                  0.117   
4                              0.14                  0.050   
..                              ...                    ...   
255                            0.01                  0.080   
256                           -0.10                  0.020   
257                           -0.13                  0.140   
258                            0.22                  0.000   
259                           -0.01                  0.081   
Yesterday Open to High  Yesterday Open to Adj Close  
0                     0.000                         0.00  
1                     0.170                        -0.33  
2                     0.260                         0.21  
3                     0.380                         0.33  
4                     0.150                         0.11  
..                      ...                          ...  
255                   0.100                         0.00  
256                   0.130                         0.04  
257                   0.130                        -0.14  
258                   0.316                         0.24  
259                   0.350                         0.03 

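I found out that you need to specify a serializer for the model in order to make predictions. Without one, the Predictor passes the DataFrame through unchanged as the request Body, which is what triggers the ParamValidationError. Adding this code before model.predict(...) did the trick.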

from sagemaker.serializers import CSVSerializer
model.serializer = CSVSerializer()
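
For a more explicit variant, here is a minimal sketch that renders the DataFrame as header-less CSV text before sending it, which is the pattern I have seen in AWS's Autopilot example notebooks. The helper names frame_to_csv_payload and predict_frame and the sample columns are my own, hypothetical choices. I am also assuming the endpoint expects text/csv rows without a header or index, in the same column order as the training data minus the target.

```python
import pandas as pd


def frame_to_csv_payload(frame: pd.DataFrame) -> str:
    """Render a DataFrame as CSV text with no header row and no index column."""
    return frame.to_csv(header=False, index=False)


def predict_frame(endpoint_name: str, frame: pd.DataFrame) -> pd.DataFrame:
    """Send every row of `frame` to the endpoint and return the parsed response.

    Hypothetical helper: assumes the endpoint accepts and returns text/csv.
    """
    # Imported here so the payload helper above can be tried without the SDK.
    from sagemaker.predictor import Predictor
    from sagemaker.serializers import CSVSerializer
    from sagemaker.deserializers import CSVDeserializer

    predictor = Predictor(
        endpoint_name,
        serializer=CSVSerializer(),      # sets the text/csv content type
        deserializer=CSVDeserializer(),  # parses the CSV response into rows
    )
    rows = predictor.predict(frame_to_csv_payload(frame))
    return pd.DataFrame(rows)


# Small made-up frame just to inspect the payload; no endpoint call happens here.
sample = pd.DataFrame(
    {"High": [4.38, 4.21], "Open": [4.21, 3.95], "Volume": [62400, 80800]}
)
payload = frame_to_csv_payload(sample)
print(payload)
```

With a deployed endpoint, the call would be something like y_hat = predict_frame('Predict-Low', df). The CSV deserializer returns values as strings, so the result may need a cast such as .astype(float) before comparing it with y.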
