Error: tabular dataset is treated as an image dataset (Vertex AI Pipelines: custom training)



I am running custom training on tabular data with Vertex AI Pipelines.

  1. I ran the Python code shown below.
  2. I created and ran the pipeline with the generated JSON (see the submission sketch right after this list).
  3. The following error occurred when training started.
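
A minimal sketch of how step 2 was done, assuming the AIPlatformClient that the script below already imports (kfp==1.6.2); the region value is illustrative and pipeline_root is already set by the @dsl.pipeline decorator:

# Sketch of step 2: submitting the compiled pipeline spec to Vertex AI Pipelines.
# Assumes the same kfp.v2.google.client.AIPlatformClient imported in the script below;
# project and region values are illustrative.
from kfp.v2.google.client import AIPlatformClient

api_client = AIPlatformClient(project_id="my-project", region="us-central1")
api_client.create_run_from_job_spec(
    job_spec_path="test-pipeline.json",  # produced by compiler.Compiler().compile(...)
)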

Why is the tabular dataset being treated as an image dataset? What is going wrong?

Environment

  • Python 3.7.3
    • kfp==1.6.2
    • kfp-pipeline-spec==0.1.7
    • kfp-server-api==1.6.0

Error message

Traceback (most recent call last):
  File "/opt/python3.7/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/python3.7/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/opt/python3.7/lib/python3.7/site-packages/google_cloud_pipeline_components/aiplatform/remote_runner.py", line 284, in <module>
    main()
  File "/opt/python3.7/lib/python3.7/site-packages/google_cloud_pipeline_components/aiplatform/remote_runner.py", line 280, in main
    print(runner(args.cls_name, args.method_name, executor_input, kwargs))
  File "/opt/python3.7/lib/python3.7/site-packages/google_cloud_pipeline_components/aiplatform/remote_runner.py", line 236, in runner
    prepare_parameters(serialized_args[METHOD_KEY], method, is_init=False)
  File "/opt/python3.7/lib/python3.7/site-packages/google_cloud_pipeline_components/aiplatform/remote_runner.py", line 205, in prepare_parameters
    value = cast(value, param_type)
  File "/opt/python3.7/lib/python3.7/site-packages/google_cloud_pipeline_components/aiplatform/remote_runner.py", line 176, in cast
    return annotation_type(value)
  File "/opt/python3.7/lib/python3.7/site-packages/google/cloud/aiplatform/datasets/dataset.py", line 82, in __init__
    self._validate_metadata_schema_uri()
  File "/opt/python3.7/lib/python3.7/site-packages/google/cloud/aiplatform/datasets/dataset.py", line 100, in _validate_metadata_schema_uri
    f"{self.__class__.__name__} class can not be used to retrieve "
ValueError: ImageDataset class can not be used to retrieve dataset resource projects/nnnnnnnnnnnn/locations/us-central1/datasets/3781554739456507904, check the dataset type
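
For context on where the message comes from: the aiplatform SDK's dataset classes each accept only resources whose metadata schema URI they support, and the remote runner here is instantiating the tabular resource as an ImageDataset. Below is a minimal, self-contained sketch of that check, paraphrased from the traceback rather than taken from the actual library source; the class name and schema strings are illustrative.

# Illustrative sketch of the failing check (not the real aiplatform source).
IMAGE_SCHEMA = "schema/dataset/metadata/image_1.0.0.yaml"    # placeholder values,
TABULAR_SCHEMA = "schema/dataset/metadata/tables_1.0.0.yaml"  # not the real URIs

class ImageDatasetSketch:
    # Each dataset wrapper declares the metadata schema URIs it can retrieve.
    _supported_metadata_schema_uris = (IMAGE_SCHEMA,)

    def __init__(self, resource_name, metadata_schema_uri):
        self.resource_name = resource_name
        self.metadata_schema_uri = metadata_schema_uri
        self._validate_metadata_schema_uri()

    def _validate_metadata_schema_uri(self):
        if self.metadata_schema_uri not in self._supported_metadata_schema_uris:
            raise ValueError(
                f"{self.__class__.__name__} class can not be used to retrieve "
                f"dataset resource {self.resource_name}, check the dataset type"
            )

# Retrieving a tabular resource through the image wrapper reproduces the message:
try:
    ImageDatasetSketch(
        "projects/nnnnnnnnnnnn/locations/us-central1/datasets/3781554739456507904",
        TABULAR_SCHEMA,
    )
except ValueError as err:
    print(err)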

Python code:

import datetime
from kfp.v2 import dsl, compiler
from kfp.v2.google.client import AIPlatformClient
import google_cloud_pipeline_components.aiplatform as gcc_ai

PROJECT = "my-project"
PIPELINE_NAME = "test-pipeline"
PIPELINE_ROOT_PATH = f"gs://test-pipeline-20210525/{PIPELINE_NAME}"


@dsl.pipeline(
    name=PIPELINE_NAME,
    pipeline_root=PIPELINE_ROOT_PATH
)
def test_pipeline(
    display_name: str = f"{PIPELINE_NAME}-2021MMDD-nn"
):
    dataset_create_op = gcc_ai.TabularDatasetCreateOp(
        project=PROJECT, display_name=display_name,
        gcs_source="gs://used_apartment/datasource/train.csv"
    )
    training_job_run_op = gcc_ai.CustomContainerTrainingJobRunOp(
        project=PROJECT, display_name=display_name,
        container_uri="us-central1-docker.pkg.dev/my-project/dataops-rc2021/custom-train:latest",
        staging_bucket="vertex_ai_staging_rc2021",
        base_output_dir="gs://used_apartment/cstm_img_scrf/artifact",
        model_serving_container_image_uri="us-central1-docker.pkg.dev/my-project/dataops-rc2021/custom-pred:latest",
        model_serving_container_predict_route="/",
        model_serving_container_health_route="/health",
        model_serving_container_ports=[8080],
        training_fraction_split=0.8,
        validation_fraction_split=0.1,
        test_fraction_split=0.1,
        dataset=dataset_create_op.outputs["dataset"]
    )


def run_pipeline(event=None, context=None):
    # Compile the pipeline using the kfp.v2.compiler.Compiler
    compiler.Compiler().compile(
        pipeline_func=test_pipeline,
        package_path="test-pipeline.json"
    )


if __name__ == '__main__':
    run_pipeline()

This appears to be a bug in the CustomContainerTrainingJobRunOp component code. We were able to reproduce the error.

I have filed a tracking bug: https://github.com/kubeflow/pipelines/issues/5885
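
For what it is worth, one mechanism that would be consistent with the traceback (speculative; not confirmed in the issue thread): remote_runner.py casts the serialized dataset resource name back to the component parameter's annotated type (the annotation_type(value) frame above), so if a Union annotation such as Union[ImageDataset, TabularDataset, ...] were reduced to its first member, ImageDataset would always be chosen. A hypothetical sketch of that failure mode:

# Hypothetical sketch of how a Union annotation can collapse to its first
# member (illustrative only; the real remote_runner resolution may differ).
from typing import Union

class ImageDataset: ...
class TabularDataset: ...

def resolve_annotation(annotation):
    # Union[ImageDataset, TabularDataset] has __args__ == (ImageDataset, TabularDataset);
    # naively taking the first member always selects ImageDataset.
    args = getattr(annotation, "__args__", None)
    return args[0] if args else annotation

dataset_annotation = Union[ImageDataset, TabularDataset]
print(resolve_annotation(dataset_annotation))  # <class '__main__.ImageDataset'>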

Latest update