This is a follow-up to this question. I'm using Kedro v0.18.2. I'm trying to use TemplatedConfigLoader, so I created a globals.yml under conf/base that looks like this:
paths:
  base_path: s3://my_project
datasets:
  pdf: base.PDFDataSet
  png: pillow.ImageDataSet
  csv: pandas.CSVDataSet
  excel: pandas.ExcelDataSet
data_folders:
  raw: 01_raw
  intermediate: 02_intermediate
  primary: 03_primary
  feature: 04_feature
  model_input: 05_model_input
  models: 06_models
  model_output: 07_model_output
  reporting: 08_reporting
I followed the documentation and uncommented part of settings.py:
"""Project settings. There is no need to edit this file unless you want to change values
from the Kedro defaults. For further information, including these default values, see
https://kedro.readthedocs.io/en/stable/kedro_project_setup/settings.html."""
# Instantiated project hooks.
# from certifai.hooks import ProjectHooks
# HOOKS = (ProjectHooks(),)
# Installed plugins for which to disable hook auto-registration.
# DISABLE_HOOKS_FOR_PLUGINS = ("kedro-viz",)
# Class that manages storing KedroSession data.
# from kedro.framework.session.store import ShelveStore
# SESSION_STORE_CLASS = ShelveStore
# Keyword arguments to pass to the `SESSION_STORE_CLASS` constructor.
# SESSION_STORE_ARGS = {
#     "path": "./sessions"
# }
# Class that manages Kedro's library components.
# from kedro.framework.context import KedroContext
# CONTEXT_CLASS = KedroContext
# Directory that holds configuration.
# CONF_SOURCE = "conf"
# Class that manages how configuration is loaded.
from kedro.config import TemplatedConfigLoader
CONFIG_LOADER_CLASS = TemplatedConfigLoader
CONFIG_LOADER_ARGS = {
    "globals_pattern": "*globals.yml",
}
# Class that manages the Data Catalog.
# from kedro.io import DataCatalog
# DATA_CATALOG_CLASS = DataCatalog
catalog.yml looks like this:
_label_images: &label_images
  type: PartitionedDataSet
  path: ${paths.base_path}/data/${data_folders.raw}/label_images
  dataset: ${datasets.png}

label_images_png:
  <<: *label_images
  filename_suffix: .png

label_images_jpg:
  <<: *label_images
  filename_suffix: .jpg

label_images_jpeg:
  <<: *label_images
  filename_suffix: .jpeg

label_images_pdf:
  <<: *label_images
  dataset: base.PDFDataSet
  filename_suffix: .pdf

my_project_label_extracts:
  type: PartitionedDataSet
  path: s3://my_project/data/01_raw/label_extracts
  dataset: pandas.ExcelDataSet
My test script is as follows:
from kedro.config import ConfigLoader
from kedro.framework.project import settings
from pathlib import Path
from kedro.extras.datasets import pillow
project_path = Path(__file__).parent.parent.parent
conf_path = str(project_path / settings.CONF_SOURCE)
conf_loader = ConfigLoader(conf_source=conf_path, env="base")
conf_catalog = conf_loader.get("catalog*", "catalog*/**")
images_dataset = pillow.ImageDataSet.from_config("label_images_png", conf_catalog["label_images_png"])
images_loader = images_dataset.load()
images_loader["00337180800086"]().show()
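With hard-coded values in catalog.yml, the script runs and displays the image. With the templated configuration, however, it does not work. Am I missing something?
P.S. Apologies if this question is a duplicate.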
The first error I noticed is in this entry of your catalog:
_label_images: &label_images
  type: PartitionedDataSet
  path: ${paths.base_path}/data/${data_folders.raw}/label_images
  dataset: ${datasets.png}
You are missing the type key for the dataset. The correct entry should be:
_label_images: &label_images
  type: PartitionedDataSet
  path: ${paths.base_path}/data/${data_folders.raw}/label_images
  dataset:
    type: ${datasets.png}
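To make the substitution concrete, here is a rough, standard-library-only sketch of how the `${...}` placeholders in that entry could be resolved against the values in globals.yml. It is not Kedro's actual implementation; the names `PLACEHOLDER`, `lookup`, `resolve`, `GLOBALS` and `ENTRY` are made up for illustration, and the dictionaries are hand-copied from the YAML above rather than loaded from disk.

```python
import re
from typing import Any, Dict

# Matches ${some.dotted.key}, the placeholder style used in catalog.yml.
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def lookup(globals_dict: Dict[str, Any], dotted_key: str) -> Any:
    """Walk a nested dict using a dotted key such as 'paths.base_path'."""
    value: Any = globals_dict
    for part in dotted_key.split("."):
        value = value[part]
    return value


def resolve(node: Any, globals_dict: Dict[str, Any]) -> Any:
    """Recursively replace ${...} placeholders in strings, dicts and lists."""
    if isinstance(node, dict):
        return {key: resolve(value, globals_dict) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve(value, globals_dict) for value in node]
    if isinstance(node, str):
        return PLACEHOLDER.sub(
            lambda match: str(lookup(globals_dict, match.group(1))), node
        )
    return node


# Hand-copied subset of globals.yml (illustrative only).
GLOBALS = {
    "paths": {"base_path": "s3://my_project"},
    "datasets": {"png": "pillow.ImageDataSet"},
    "data_folders": {"raw": "01_raw"},
}

# The corrected catalog entry, with the anchor already merged in.
ENTRY = {
    "type": "PartitionedDataSet",
    "path": "${paths.base_path}/data/${data_folders.raw}/label_images",
    "dataset": {"type": "${datasets.png}"},
    "filename_suffix": ".png",
}

resolved = resolve(ENTRY, GLOBALS)
print(resolved["path"])             # s3://my_project/data/01_raw/label_images
print(resolved["dataset"]["type"])  # pillow.ImageDataSet
```

The point of the sketch is only that each placeholder is a dotted path into globals.yml, so a typo in either file shows up as a missing key.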
If you now run the script with TemplatedConfigLoader, you should no longer get the above error:
from kedro.config import ConfigLoader, TemplatedConfigLoader
from kedro.framework.project import settings
from pathlib import Path
from kedro.extras.datasets import pillow
project_path = Path(__file__).parent.parent.parent
conf_path = str(project_path / settings.CONF_SOURCE)
conf_loader = TemplatedConfigLoader(conf_source=conf_path, env="base", globals_pattern="*globals.yml")
conf_catalog = conf_loader.get("catalog*", "catalog*/**")
images_dataset = pillow.ImageDataSet.from_config("label_images_png", conf_catalog["label_images_png"])
images_loader = images_dataset.load()
images_loader["00337180800086"]().show()
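If you want to double-check that templating actually happened before building the dataset, one option is to scan the loaded catalog dictionary for leftover `${...}` strings. The helper below is a hypothetical sketch using only the standard library; `find_unresolved` is not a Kedro function, and `untemplated` is a hand-written stand-in for what `conf_catalog` might look like when a loader that does no substitution is used.

```python
import re
from typing import Any, Iterator, Tuple

# Any string still containing ${...} is treated as unresolved.
PLACEHOLDER = re.compile(r"\$\{[^}]+\}")


def find_unresolved(node: Any, path: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (location, value) for every string that still contains ${...}."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else str(key)
            yield from find_unresolved(value, child)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_unresolved(value, f"{path}[{index}]")
    elif isinstance(node, str) and PLACEHOLDER.search(node):
        yield path, node


# Hypothetical stand-in for conf_catalog without template substitution.
untemplated = {
    "label_images_png": {
        "type": "PartitionedDataSet",
        "path": "${paths.base_path}/data/${data_folders.raw}/label_images",
        "dataset": {"type": "${datasets.png}"},
        "filename_suffix": ".png",
    }
}

leftovers = list(find_unresolved(untemplated))
for location, value in leftovers:
    print(f"{location}: {value}")
```

In the real script you would pass `conf_catalog` instead of `untemplated`; an empty result suggests every placeholder was substituted.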
For easier communication, you may want to join the Kedro Discord channel so that we can reply to you in real time: https://discord.gg/akJDeVaxnB