SageMaker training job gives an error when locating files from S3 at the Docker image path



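I am trying to use scikit_bring_your_own/container/decision_trees/train. Running it in train mode from the AWS CLI works without problems, but when I try to replicate it by creating a SageMaker training job, I run into a problem loading data from S3 into the Docker image path.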

In the CLI command, we specify the location the input should be read from: docker run -v $(pwd)/test_dir:/opt/ml --rm ${image} train

In the training job, I specified the S3 bucket location and the output path for the model artifacts.

The error is raised from this exception in the train script ("container/decision_trees/train"):
raise ValueError(('There are no files in {}.\n' +
'This usually indicates that the channel ({}) was incorrectly specified,\n' +
'the data specification in S3 was incorrectly specified or the role specified\n' +
'does not have permission to access the data.').format(training_path, channel_name))
Traceback (most recent call last):
File "/opt/program/train", line 55, in train
'does not have permission to access the data.').format(training_path, channel_name)) 

So I don't understand whether some adjustment is needed or whether some access permission is missing.

Please help.

If you set the following InputDataConfig in the CreateTrainingJob API:
"InputDataConfig": [ 
  { 
     "ChannelName": "train",
     "DataSource": { 
        "S3DataSource": { 
           "S3DataDistributionType": "FullyReplicated",
           "S3DataType": "S3Prefix",
           "S3Uri": "s3://<bucket>/a.csv"
        }
     },
     "InputMode": "File"
  },
  { 
     "ChannelName": "eval",
     "DataSource": { 
        "S3DataSource": { 
           "S3DataDistributionType": "FullyReplicated",
           "S3DataType": "S3Prefix",
           "S3Uri": "s3://<bucket>/b.csv"
        }
     },
     "InputMode": "File"
  }
]

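SageMaker downloads the data above from S3 into the /opt/ml/input/data/<channel_name> directory in the Docker container. In this case, the algorithm container should be able to find the input data under: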
/opt/ml/input/data/train/a.csv
/opt/ml/input/data/eval/b.csv
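
The mapping from a channel definition to a path inside the container can be sketched in a few lines of Python. This is only an illustration, assuming File input mode and an S3Uri that names a single object, as in the config above; the helper name expected_container_path and the bucket name example-bucket are hypothetical and not part of any SageMaker API.

```python
import posixpath
from urllib.parse import urlparse

# Root directory where SageMaker places File-mode input channels.
INPUT_ROOT = "/opt/ml/input/data"


def expected_container_path(channel):
    """Return the path where a single-object channel should appear.

    Hypothetical helper: assumes the S3Uri points at exactly one object,
    so only the object's file name is kept under the channel directory.
    """
    uri = channel["DataSource"]["S3DataSource"]["S3Uri"]
    key = urlparse(uri).path                 # e.g. "/a.csv"
    filename = posixpath.basename(key)       # e.g. "a.csv"
    return posixpath.join(INPUT_ROOT, channel["ChannelName"], filename)


if __name__ == "__main__":
    # "example-bucket" is a placeholder bucket name.
    channel = {
        "ChannelName": "train",
        "DataSource": {"S3DataSource": {"S3Uri": "s3://example-bucket/a.csv"}},
    }
    print(expected_container_path(channel))
```

Note that the directory name comes from ChannelName, so the channel name in the training job has to match the channel name the train script reads from.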

You can find more details at https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-algorithms-training-algo.html
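
To see which channels and files the container actually received, a small diagnostic like the one below could be called at the start of the train script. It is a sketch rather than part of the sample container: the function name list_channel_files is hypothetical, and it only assumes the /opt/ml/input/data layout described above.

```python
import os


def list_channel_files(input_root="/opt/ml/input/data"):
    """Return {channel_name: sorted relative file paths} for each channel directory."""
    found = {}
    if not os.path.isdir(input_root):
        # No input directory at all: no channels were mounted or downloaded.
        return found
    for channel in sorted(os.listdir(input_root)):
        channel_dir = os.path.join(input_root, channel)
        if not os.path.isdir(channel_dir):
            continue
        files = []
        # Walk recursively, since a prefix can contain nested keys.
        for dirpath, _dirnames, filenames in os.walk(channel_dir):
            for name in filenames:
                full_path = os.path.join(dirpath, name)
                files.append(os.path.relpath(full_path, channel_dir))
        found[channel] = sorted(files)
    return found


if __name__ == "__main__":
    for channel, files in list_channel_files().items():
        print(channel, files)
```

If the channel the script expects is missing from the output, or its file list is empty, the likely causes are the ones named in the error message: a mismatched channel name, a wrong S3 location, or a role without permission to read the data.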
