I can validate a DataFrame index with DataFrameSchema like this:
import pandas as pd
import pandera as pa
from pandera import Column, DataFrameSchema, Check, Index

schema = DataFrameSchema(
    columns={
        "column1": pa.Column(int),
    },
    index=pa.Index(int, name="index_name"),
)
# raises the error as expected
schema.validate(
    pd.DataFrame({"column1": [1, 2, 3]}, index=pd.Index([1, 2, 3], name="index_incorrect_name"))
)
Is there a way to do the same thing with SchemaModel?
You can do the following:
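import pandas as pd
import pandera as pa
from pandera.typing import Index, Series

class Schema(pa.SchemaModel):
    # check_name=True makes the attribute name ("idx") the expected index name
    idx: Index[int] = pa.Field(ge=0, check_name=True)
    column1: Series[int]

df = pd.DataFrame({"column1": [1, 2, 3]}, index=pd.Index([1, 2, 3], name="index_incorrect_name"))
# fails validation because the index is not named "idx"
Schema.validate(df)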
Found the answer on GitHub:
You can type-annotate the index with pa.typing.Index.
class Schema(pa.SchemaModel):
    column1: pa.typing.Series[int]
    index_name: pa.typing.Index[int] = pa.Field(check_name=True)
To see how to validate a MultiIndex, check: https://pandera.readthedocs.io/en/stable/schema_models.html#multiindex