Script to launch, monitor, and define SageMaker Processing jobs from a local machine



I'm looking into this, and it makes a lot of sense. Let's focus on this piece of code:

from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.sklearn.processing import SKLearnProcessor

# sklearn_processor is an SKLearnProcessor configured beforehand, e.g.:
sklearn_processor = SKLearnProcessor(
    framework_version="0.20.0",
    role=role,  # your SageMaker execution role ARN
    instance_type="ml.m5.xlarge",
    instance_count=1,
)

sklearn_processor.run(
    code="preprocessing.py",
    inputs=[
        ProcessingInput(
            source="s3://your-bucket/path/to/your/data",
            destination="/opt/ml/processing/input",
        ),
    ],
    outputs=[
        ProcessingOutput(output_name="train_data", source="/opt/ml/processing/train"),
        ProcessingOutput(output_name="test_data", source="/opt/ml/processing/test"),
    ],
    arguments=["--train-test-split-ratio", "0.2"],
)
preprocessing_job_description = sklearn_processor.jobs[-1].describe()

Here, preprocessing.py has to be in the cloud. I'm curious whether it's also possible to put the script in an S3 bucket and trigger the job remotely. I can do this easily with hyperparameter optimization, although there I use an OOTB training image, which doesn't require a dedicated script.

In that case, I can fire off the job like this:

import boto3
from time import gmtime, strftime

# tuning_job_config and training_job_definition are defined elsewhere
tuning_job_name = "amazing-hpo-job-" + strftime("%d-%H-%M-%S", gmtime())
smclient = boto3.Session().client("sagemaker")
smclient.create_hyper_parameter_tuning_job(
    HyperParameterTuningJobName=tuning_job_name,
    HyperParameterTuningJobConfig=tuning_job_config,
    TrainingJobDefinition=training_job_definition,
)

and then monitor the job's progress:

from pprint import pprint

smclient = boto3.Session().client("sagemaker")
tuning_job_result = smclient.describe_hyper_parameter_tuning_job(
    HyperParameterTuningJobName=tuning_job_name
)

status = tuning_job_result["HyperParameterTuningJobStatus"]
if status != "Completed":
    print("Reminder: the tuning job has not been completed.")

job_count = tuning_job_result["TrainingJobStatusCounters"]["Completed"]
print("%d training jobs have completed" % job_count)

objective = tuning_job_result["HyperParameterTuningJobConfig"]["HyperParameterTuningJobObjective"]
is_minimize = objective["Type"] != "Maximize"
objective_name = objective["MetricName"]

if tuning_job_result.get("BestTrainingJob", None):
    print("Best model found so far:")
    pprint(tuning_job_result["BestTrainingJob"])
else:
    print("No training jobs have reported results yet.")

I think launching and monitoring a SageMaker Processing job from a local machine should be just as feasible as for an HPO job (for the monitoring half, I'd expect something like the sketch below), but what about the scripts? Ideally, I'd like to develop and test them locally and run them remotely. Hopefully that makes sense?
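To illustrate the monitoring half, a minimal sketch in the same style as the tuning-job code above; the job name is a placeholder:

import boto3

smclient = boto3.Session().client("sagemaker")

# Analogous to describe_hyper_parameter_tuning_job above
job_description = smclient.describe_processing_job(
    ProcessingJobName="my-preprocessing-job"  # placeholder name
)

# One of: InProgress, Completed, Failed, Stopping, Stopped
status = job_description["ProcessingJobStatus"]
print("Processing job status: %s" % status)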

I'm not sure I understand the comparison with the tuning job.

Based on what you've described, preprocessing.py is actually stored locally in this example. The SageMaker SDK uploads it to S3 so that the remote Processing job can access it. I suggest launching the job and then looking at the inputs in the SageMaker console.
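That said, if you prefer to keep the script in S3 yourself and trigger the job with plain boto3 (mirroring your create_hyper_parameter_tuning_job call), the low-level create_processing_job API accepts the code as a regular ProcessingInput. A minimal sketch, where the job name, bucket, image URI, and role ARN are placeholders you would substitute:

import boto3

smclient = boto3.Session().client("sagemaker")

# The script was uploaded to S3 beforehand, e.g. with `aws s3 cp`.
smclient.create_processing_job(
    ProcessingJobName="my-preprocessing-job",  # placeholder
    ProcessingInputs=[
        {
            "InputName": "code",
            "S3Input": {
                "S3Uri": "s3://your-bucket/code/preprocessing.py",  # placeholder
                "LocalPath": "/opt/ml/processing/input/code",
                "S3DataType": "S3Prefix",
                "S3InputMode": "File",
            },
        },
    ],
    ProcessingResources={
        "ClusterConfig": {
            "InstanceCount": 1,
            "InstanceType": "ml.m5.xlarge",
            "VolumeSizeInGB": 30,
        }
    },
    AppSpecification={
        "ImageUri": "<processing image URI>",  # placeholder
        "ContainerEntrypoint": [
            "python3",
            "/opt/ml/processing/input/code/preprocessing.py",
        ],
    },
    RoleArn="<your SageMaker execution role ARN>",  # placeholder
)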

If you want to test the Processing job locally, you can do that with local mode. This essentially mimics the job on your machine, which helps with debugging the script before launching a remote Processing job. Note that Docker is required to use local mode.

Local mode example code:

from sagemaker.local import LocalSession
from sagemaker.processing import ScriptProcessor, ProcessingInput, ProcessingOutput

sagemaker_session = LocalSession()
sagemaker_session.config = {'local': {'local_code': True}}

# For local training a dummy role will be sufficient
role = 'arn:aws:iam::111111111111:role/service-role/AmazonSageMaker-ExecutionRole-20200101T000001'

processor = ScriptProcessor(command=['python3'],
                            image_uri='sagemaker-scikit-learn-processing-local',
                            role=role,
                            instance_count=1,
                            instance_type='local')

processor.run(code='processing_script.py',
              inputs=[ProcessingInput(
                  source='./input_data/',
                  destination='/opt/ml/processing/input_data/')],
              outputs=[ProcessingOutput(
                  output_name='word_count_data',
                  source='/opt/ml/processing/processed_data/')],
              arguments=['job-type', 'word-count'])

preprocessing_job_description = processor.jobs[-1].describe()
output_config = preprocessing_job_description['ProcessingOutputConfig']
print(output_config)

for output in output_config['Outputs']:
    if output['OutputName'] == 'word_count_data':
        word_count_data_file = output['S3Output']['S3Uri']
        print('Output file is located on: {}'.format(word_count_data_file))
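Once the script behaves as expected in local mode, the same run() call can be pointed at a real processing image URI and a remote instance type (e.g. ml.m5.xlarge); the SDK then uploads the script to S3 for the remote job, as described above.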
