Why can't I use tfdv.infer_schema() to generate a schema?



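I get "TypeError: statistics is of type StatsOptions, should be a DatasetFeatureStatisticsList proto." when I try to generate a schema with tfdv.infer_schema(). I am filtering the relevant features with the feature_allowlist argument of the tfdv.StatsOptions class, but I can't get a schema out of the result. Can anyone help?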

features_remove= {"region","fiscal_week"}
columns= [col for col in df.columns if col not in features_remove]
stat_Options= tfdv.StatsOptions(feature_allowlist=columns)
print(stat_Options.feature_allowlist)

schema= tfdv.infer_schema(stat_Options)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-53-e61b2454028e> in <module>
----> 1 schema= tfdv.infer_schema(stat_Options)
2 schema
C:\ProgramData\Anaconda3\lib\site-packages\tensorflow_data_validation\api\validation_api.py in infer_schema(statistics, infer_feature_shape, max_string_domain_size, schema_transformations)
95   """
96   if not isinstance(statistics, statistics_pb2.DatasetFeatureStatisticsList):
---> 97     raise TypeError(
98         'statistics is of type %s, should be '
99         'a DatasetFeatureStatisticsList proto.' % type(statistics).__name__)
TypeError: statistics is of type StatsOptions, should be a DatasetFeatureStatisticsList proto.

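The reason is simple: tfdv.infer_schema() must be given a statistics_pb2.DatasetFeatureStatisticsList object, not a StatsOptions object. StatsOptions only configures how the statistics are computed, so you have to generate the statistics first and then infer the schema from them.

You should do it like this: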

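import tensorflow_data_validation as tfdv

# Columns to exclude from the statistics and the inferred schema
features_remove = {"region", "fiscal_week"}
columns = [col for col in df.columns if col not in features_remove]
stat_Options = tfdv.StatsOptions(feature_allowlist=columns)
print(stat_Options.feature_allowlist)
# Compute statistics first; this returns a DatasetFeatureStatisticsList proto
stats = tfdv.generate_statistics_from_dataframe(df, stats_options=stat_Options)
# infer_schema() expects the statistics proto, not the StatsOptions object
schema = tfdv.infer_schema(stats)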
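
If you need this in more than one place, the same two-step flow (options, then statistics, then schema) can be wrapped in a small helper. This is only a sketch: the function names `allowlist_columns` and `infer_schema_without` are made up for illustration and are not part of TFDV, and it assumes a TFDV version whose StatsOptions accepts `feature_allowlist`, as in the question.

```python
def allowlist_columns(all_columns, features_remove):
    """Return the column names to keep, in their original order.

    Hypothetical helper, not a TFDV function.
    """
    return [col for col in all_columns if col not in features_remove]


def infer_schema_without(df, features_remove):
    """Infer a schema from a DataFrame while ignoring some columns.

    Hypothetical helper; TFDV is imported here so that the pure-Python
    helper above can be used and tested without TFDV installed.
    """
    import tensorflow_data_validation as tfdv

    options = tfdv.StatsOptions(
        feature_allowlist=allowlist_columns(df.columns, features_remove)
    )
    # Step 1: StatsOptions only configures the computation; this call
    # produces the DatasetFeatureStatisticsList proto.
    stats = tfdv.generate_statistics_from_dataframe(df, stats_options=options)
    # Step 2: the schema is inferred from the statistics proto.
    return tfdv.infer_schema(stats)
```

With the DataFrame from the question, this would be called as `schema = infer_schema_without(df, {"region", "fiscal_week"})`.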
