Python call to boto3.client.create_data_source_from_s3



I am trying to use the AWS Machine Learning batch process from a Python project. I am using boto3. I am getting this failure message in the response:

Error trying to parse schema: 'Can not deserialize instance of boolean out of START_ARRAY token\n at [Source: java.io.StringReader@60618eb4; line: 1, column: 2] (through reference chain: com.amazon.eml.dp.recordset.SchemaPojo["dataFileContainsHeader"])'

The .csv file I am using is fine. I know this because the same file works when I go through the console process.

Here is my code; it is a function within a Django model that holds the URL of the file to be processed (input_file):

    def create_data_source_from_s3(self):
        attributes = []
        attribute = { "fieldName": "Var1", "fieldType": "CATEGORICAL" }
        attributes.append(attribute)
        attribute = { "fieldName": "Var2", "fieldType": "CATEGORICAL" }
        attributes.append(attribute)
        attribute = { "fieldName": "Var3", "fieldType": "NUMERIC" }
        attributes.append(attribute)
        attribute = { "fieldName": "Var4", "fieldType": "CATEGORICAL" }
        attributes.append(attribute)
        attribute = { "fieldName": "Var5", "fieldType": "CATEGORICAL" }
        attributes.append(attribute)
        attribute = { "fieldName": "Var6", "fieldType": "CATEGORICAL" }
        attributes.append(attribute)
        dataSchema = {}
        dataSchema['version'] = '1.0'
        dataSchema['dataFormat'] = 'CSV'
        dataSchema['attributes'] = attributes
        dataSchema["targetFieldName"] = "Var6"
        dataSchema["dataFileContainsHeader"] = True,
        json_data = json.dumps(dataSchema)
        client = boto3.client('machinelearning', region_name=settings.region, aws_access_key_id=settings.aws_access_key_id, aws_secret_access_key=settings.aws_secret_access_key)
        #create a datasource
        return client.create_data_source_from_s3(
            DataSourceId=self.input_file.name,
            DataSourceName=self.input_file.name,
            DataSpec={
                'DataLocationS3': 's3://' + settings.AWS_S3_BUCKET_NAME + '/' + self.input_file.name,
                'DataSchema': json_data,
            },
            ComputeStatistics=True
            )

Any ideas what I am doing wrong?

Remove the trailing comma from this line:

  dataSchema["dataFileContainsHeader"] = True,

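The trailing comma makes Python treat the right-hand side as a one-element tuple, so your dataSchema actually contains (True,) rather than True. json.dumps serializes a tuple as a JSON array, so your output looks like this: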

{"dataFileContainsHeader": [true], "attributes": [{"fieldName": "Var1", "fieldType": "CATEGORICAL"}, {"fieldName": "Var2", "fieldType": "CATEGORICAL"}, {"fieldName": "Var3", "fieldType": "NUMERIC"}, {"fieldName": "Var4", "fieldType": "CATEGORICAL"}, {"fieldName": "Var5", "fieldType": "CATEGORICAL"}, {"fieldName": "Var6", "fieldType": "CATEGORICAL"}], "version": "1.0", "dataFormat": "CSV", "targetFieldName": "Var6"}

AWS, by contrast, expects something like this:

"dataFileContainsHeader": true
