Amazon SageMaker如何预测新数据



我在Amazon SageMaker中构建了模型,代码附在下面。现在我希望能够将新数据上传到s3,并根据该模型进行预测,而不必每次都重新计算。

sess = sagemaker.Session()
bucket = "innogy-bda-germany-dev-landing-dc3-retailpl"
prefix = "sagemaker/xgboost-upsell"
role = get_execution_role()
container = sagemaker.image_uris.retrieve("xgboost", boto3.Session().region_name, "latest")
display(container)
train_path = 's3://innogy-bda-germany-dev-landing-dc3-retailpl/UPSELL/LIST/train.csv'
test_path = 's3://innogy-bda-germany-dev-landing-dc3-retailpl/UPSELL/LIST/validation.csv'
s3_input_train = sagemaker.TrainingInput(s3_data=train_path, content_type='csv')
s3_input_test = sagemaker.TrainingInput(s3_data=test_path, content_type='csv')
sess = sagemaker.Session()
xgb = sagemaker.estimator.Estimator(
container,
role,
instance_count=1,
instance_type="ml.m5.4xlarge",
output_path="s3://innogy-bda-germany-dev-landing-dc3-retailpl/UPSELL/LIST/output",
sagemaker_session=sess,
)
xgb.set_hyperparameters(
alpha= 1.340343927865692,
colsample_bytree= 0.525162855476281,
eta= 0.06451533130134757,
gamma= 0.9683995477068462,
max_depth= 10,
min_child_weight= 3.851108988963441,
num_round= 987,
subsample= 0.8725573749114485,
silent=0,
objective="binary:logistic",
early_stopping_rounds=50,
)
xgb.fit({"train": s3_input_train, "validation": s3_input_validation})

我要求一个代码示例,如何从s3提取这个模型到一个新的笔记本电脑,现在用它来预测新的数据。

另外,我想知道为什么在sagemaker中使用内置的xgboost模型时不扔掉目标变量,因为在对新集合进行预测时,我将不知道目标。

train_data, validation_data, test_data = np.split(df_smote.sample(frac=1, random_state=1729),[int(0.7 * len(df_smote)), int(0.9 * len(df_smote))],)

您需要按照这里列出的步骤将模型部署到EC2实例,这样您就可以进行批处理或按需预测。https://docs.aws.amazon.com/sagemaker/latest/dg/ex1-model-deployment.html

最新更新