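In SageMaker, I am able to load and deploy a model from S3. When the response is deserialized for prediction, I get "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd7 in position 2: invalid continuation byte" at results = predictor.predict(test_X).
I followed this SageMaker example: https://github.com/awslabs/amazon-sagemaker-examples/blob/master/introduction_to_applying_machine_learning/linear_time_series_forecast/linear_time_series_forecast.ipynb. I was able to train, validate, and deploy the model, and to store the model in S3.
After that, I wanted to import the model from S3 into SageMaker and test with the imported model. Loading and deploying the model works, but when I predict on the test values I get a UnicodeDecodeError.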
import numpy as np
import pandas as pd
import sagemaker
from sagemaker import get_execution_role
from sagemaker.predictor import csv_serializer, json_deserializer

role = get_execution_role()
sagemaker_session = sagemaker.Session()
# Note: model_data is not used below; the model is built from model_file, which is
# assumed to be defined elsewhere as the S3 URI of the model artifact.
model_data = sagemaker.session.s3_input(model_file_location_in_s3, distribution='FullyReplicated', content_type='application/x-sagemaker-model', s3_data_type='S3Prefix')
sagemaker_model = sagemaker.LinearLearnerModel(model_data=model_file,
                                               role=role,
                                               sagemaker_session=sagemaker_session)
predictor = sagemaker_model.deploy(initial_instance_count=1, instance_type='ml.t2.medium')
#loading test data
gas = pd.read_csv('gasoline.csv', header=None, names=['thousands_barrels'],encoding='utf-8')
gas['thousands_barrels_lag1'] = gas['thousands_barrels'].shift(1)
gas['thousands_barrels_lag2'] = gas['thousands_barrels'].shift(2)
gas['thousands_barrels_lag3'] = gas['thousands_barrels'].shift(3)
gas['thousands_barrels_lag4'] = gas['thousands_barrels'].shift(4)
gas['trend'] = np.arange(len(gas))
gas['log_trend'] = np.log1p(np.arange(len(gas)))
gas['sq_trend'] = np.arange(len(gas)) ** 2
weeks = pd.get_dummies(np.array(list(range(52)) * 15)[:len(gas)], prefix='week')
gas = pd.concat([gas, weeks], axis=1)
gas = gas.iloc[4:, ]
split_train = int(len(gas) * 0.6)
split_test = int(len(gas) * 0.3)
test_y = gas['thousands_barrels'][split_test:]
test_X = gas.drop('thousands_barrels', axis=1).iloc[split_test:, ].values  # .as_matrix() was removed in pandas 1.0
predictor.content_type = 'text/csv'
predictor.serializer = csv_serializer
predictor.deserializer = json_deserializer
results = predictor.predict(test_X)
one_step = np.array([r['score'] for r in results['predictions']])
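When the model is trained and deployed in the same session (as in the example), the program works fine, but when the model is loaded from S3 it throws this error.
The test data is a numeric array.
The deserializer does not seem to match the content of the response.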
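One way such a mismatch can produce exactly this error message: a JSON deserializer decodes the response body as UTF-8 text before parsing it, and a binary body will generally not be valid UTF-8. The sketch below is an illustration only. It assumes the response is RecordIO-framed and therefore starts with the RecordIO magic number 0xCED7230A written in little-endian byte order; try_json_decode is a hypothetical stand-in for the JSON deserializer, not SDK code.

```python
import json
import struct

# Assumption: a RecordIO frame starts with the magic number 0xCED7230A,
# stored little-endian, so the first four bytes are 0a 23 d7 ce.
RECORDIO_MAGIC = 0xCED7230A
payload_start = struct.pack("<I", RECORDIO_MAGIC)


def try_json_decode(raw):
    """Hypothetical stand-in for a JSON deserializer: decode UTF-8, then parse.

    Returns (parsed, None) on success or (None, error) if decoding fails.
    """
    try:
        return json.loads(raw.decode("utf-8")), None
    except UnicodeDecodeError as err:
        return None, err


parsed, error = try_json_decode(payload_start)
print(error)
# 'utf-8' codec can't decode byte 0xd7 in position 2: invalid continuation byte
```

Byte 0xd7 at position 2 looks like the start of a two-byte UTF-8 sequence, but the byte after it (0xce) is not a valid continuation byte, which matches the message in the question.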
To investigate, write a custom deserializer that only prints some details:
def debug_deserializer(data, content_type):
    print(content_type)
    print(data)
and apply it like this:
predictor.deserializer = debug_deserializer
For example, this can produce output like the following:
application/x-recordio-protobuf
<botocore.response.StreamingBody object at 0x7fd3544883c8>
None
This tells you that the content type is application/x-recordio-protobuf. Then write a custom deserializer for that content type, for example:
from sagemaker.amazon.common import RecordDeserializer
def recordio_protobuf_deserialize(data, content_type):
    rec_des = RecordDeserializer()
    return rec_des.deserialize(data, content_type)
and apply it like this:
predictor.deserializer = recordio_protobuf_deserialize