Error InvalidInputDatatype: Input of type 'Unknown' is not supported in Azure (azureml.train.automl)



I have a pandas DataFrame created by:

TB_HISTORICO_MODELO = pd.read_sql("""select DAT_INICIO_SEMANA_PLAN
,COD_NEGOCIO
,VENDA
,LUCRO
,MODULADO
,RUPTURA
,QTD_ESTOQUE_MEDIO
,PECAS from TB""", cursor)
TB_HISTORICO_MODELO["DAT_INICIO_SEMANA_PLAN"] = pd.to_datetime(TB_HISTORICO_MODELO["DAT_INICIO_SEMANA_PLAN"])
dataset = TB_HISTORICO_MODELO[TB_HISTORICO_MODELO['COD_NEGOCIO']=='A101'].drop(columns=['COD_NEGOCIO']).reset_index(drop=True)

Everything looks fine.

>>> dataset.dtypes
DAT_INICIO_SEMANA_PLAN    datetime64[ns]
VENDA                            float64
LUCRO                            float64
MODULADO                           int64
RUPTURA                            int64
QTD_ESTOQUE_MEDIO                  int64
PECAS                            float64
dtype: object

But when I run this:

#%% Create the AutoML Config file and run the experiment on Azure
import logging  # needed for verbosity=logging.INFO below
from azureml.train.automl import AutoMLConfig
time_series_settings = {
'time_column_name': 'DAT_INICIO_SEMANA_PLAN',
'max_horizon': 14,
'country_or_region': 'BR',
'target_lags': 'auto'
}
automl_config = AutoMLConfig(task='forecasting',
primary_metric='normalized_root_mean_squared_error',
blocked_models=['ExtremeRandomTrees'],
experiment_timeout_minutes=30,
training_data=dataset,
label_column_name='VENDA',
compute_target = compute_cluster,
enable_early_stopping=True,
n_cross_validations=3,
# max_concurrent_iterations=4,
# max_cores_per_iteration=-1,
verbosity=logging.INFO,
**time_series_settings)
remote_run = Experimento.submit(automl_config, show_output=True)

I get this message:

>>> remote_run = Experimento.submit(automl_config, show_output=True)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/fnord/venv/lib64/python3.6/site-packages/azureml/core/experiment.py", line 219, in submit
run = submit_func(config, self.workspace, self.name, **kwargs)
File "/home/fnord/venv/lib64/python3.6/site-packages/azureml/train/automl/automlconfig.py", line 92, in _automl_static_submit
automl_config_object._validate_config_settings(workspace)
File "/home/fnord/venv/lib64/python3.6/site-packages/azureml/train/automl/automlconfig.py", line 1775, in _validate_config_settings
supported_types=", ".join(SupportedInputDatatypes.REMOTE_RUN_SCENARIO)
azureml.train.automl.exceptions.ConfigException: ConfigException:
Message: Input of type 'Unknown' is not supported. Supported types: [azureml.data.tabular_dataset.TabularDataset, azureml.pipeline.core.pipeline_output_dataset.PipelineOutputTabularDataset]
InnerException: None
ErrorResponse 
{
"error": {
"code": "UserError",
"message": "Input of type 'Unknown' is not supported. Supported types: [azureml.data.tabular_dataset.TabularDataset, azureml.pipeline.core.pipeline_output_dataset.PipelineOutputTabularDataset]",
"details_uri": "https://aka.ms/AutoMLConfig",
"target": "training_data",
"inner_error": {
"code": "BadArgument",
"inner_error": {
"code": "ArgumentInvalid",
"inner_error": {
"code": "InvalidInputDatatype"
}
}
}
}
}

What is wrong?

Documentation:
https://learn.microsoft.com/en-us/azure/machine-learning/how-to-configure-auto-train
https://learn.microsoft.com/pt-br/python/api/azureml-train-automl-client/azureml.train.automl.automlconfig.automlconfig

The Configure AutoML documentation says:

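For remote experiments, training data must be accessible from the remote compute. AutoML only accepts Azure Machine Learning TabularDatasets when working on a remote compute.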

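It looks like your dataset object is a pandas DataFrame, when it should be an Azure ML TabularDataset, which is the type the error message lists as supported for training_data. See the documentation on creating datasets.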
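One possible way to do the conversion is to write the DataFrame to a CSV file, upload it to the workspace's default datastore, and reference the uploaded file as a TabularDataset. This is a minimal sketch, not the only route: the helper name dataframe_to_tabular_dataset, the target folder "automl-input" and the file name "train.csv" are made up for illustration, and I am assuming a v1 azureml-core SDK where Datastore.upload and Dataset.Tabular.from_delimited_files are available.

```python
import os
import tempfile


def dataframe_to_tabular_dataset(df, workspace,
                                 target_path="automl-input",
                                 file_name="train.csv"):
    """Upload a pandas DataFrame as CSV and return it as a TabularDataset.

    Hypothetical helper: the name and the default paths are illustrative only.
    """
    # Imported here so the helper can be defined without the SDK installed.
    from azureml.core import Dataset

    datastore = workspace.get_default_datastore()
    with tempfile.TemporaryDirectory() as tmp_dir:
        # Write the frame to a local CSV; index=False keeps the row index out of the data.
        df.to_csv(os.path.join(tmp_dir, file_name), index=False)
        # Copy the CSV into the datastore so the remote compute can read it.
        datastore.upload(src_dir=tmp_dir, target_path=target_path, overwrite=True)

    # Reference the uploaded file as a TabularDataset, the type remote AutoML accepts.
    return Dataset.Tabular.from_delimited_files(
        path=[(datastore, f"{target_path}/{file_name}")]
    )
```

With a workspace object at hand (for example Experimento.workspace from your script), you would call train_data = dataframe_to_tabular_dataset(dataset, Experimento.workspace) and pass training_data=train_data to AutoMLConfig instead of the DataFrame. Since the data goes through CSV, it is worth checking that DAT_INICIO_SEMANA_PLAN is still read back as a datetime column before submitting.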
