Pydantic error when reading data from JSON



I am writing code that loads data from a JSON file and parses it with Pydantic.

Here is the Python code:

import json
import pydantic
from typing import Optional, List

class Car(pydantic.BaseModel):
    manufacturer: str
    model: str
    date_of_manufacture: str
    date_of_sale: str
    number_plate: str
    price: float
    type_of_fuel: Optional[str]
    location_of_sale: Optional[str]

def load_data() -> None:
    with open("./data.json") as file:
        data = json.load(file)
    cars: List[Car] = [Car(**item) for item in data]
    print(cars[0])

if __name__ == "__main__":
    load_data()

The JSON data:

[
    {
        "manufacturer": "BMW",
        "model": "i8",
        "date_of_manufacture": "14/06/2021",
        "date_of_sale": "19/11/2022",
        "number_plate": "ND21WHP",
        "price": "100,000",
        "type_of_fuel": "electric",
        "location_of_sale": "Leicester, England"
    },
    {
        "manufacturer": "Audi",
        "model": "TT RS",
        "date_of_manufacture": "22/02/2019",
        "date_of_sale": "12/08/2021",
        "number_plate": "LR69FOW",
        "price": "67,000",
        "type_of_fuel": "petrol",
        "location_of_sale": "Manchester, England"
    }
]

The error I get is:

File "pydantic\main.py", line 342, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for Car
price
  value is not a valid float (type=type_error.float)

I have already tried appending .00 to the end of the price string, but I get the same error.

The problem is that Pydantic's default validator for float simply tries to coerce the string value to float (as @Paul mentioned), and float("100,000") raises a ValueError.
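The failure can be reproduced without Pydantic by calling float() on the raw strings directly. This is only a minimal sketch; the helper name try_float is made up for illustration and is not part of Pydantic.

```python
def try_float(raw: str) -> object:
    """Return float(raw), or a short error description if conversion fails.

    try_float is a hypothetical helper used only for this demonstration.
    """
    try:
        return float(raw)
    except ValueError as exc:
        return f"ValueError: {exc}"


print(try_float("100000"))      # plain digits convert without trouble
print(try_float("100,000"))     # the comma makes float() raise ValueError
print(try_float("100,000.00"))  # appending .00 does not help either
```

This also suggests why appending .00 changed nothing in the question: the comma is still in the string.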

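I'm surprised nobody has suggested this, but if you don't control the source JSON data, you can easily work around the problem by writing your own small validator that reformats the string properly (or parses the number yourself):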

from typing import Optional

from pydantic import BaseModel, validator

class Car(BaseModel):
    manufacturer: str
    model: str
    date_of_manufacture: str
    date_of_sale: str
    number_plate: str
    price: float
    type_of_fuel: Optional[str]
    location_of_sale: Optional[str]

    # Runs before the built-in float validation because of pre=True.
    @validator("price", pre=True)
    def adjust_number_format(cls, v: object) -> object:
        if isinstance(v, str):
            # Strip thousands separators, e.g. "100,000" -> "100000".
            return v.replace(",", "")
        return v

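pre=True is important here so that the value is adjusted before the default field validator receives it. I deliberately wrote it this way to show that you don't need to convert the str to float yourself, but of course you can do that as well: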

...
    @validator("price", pre=True)
    def parse_number(cls, v: object) -> object:
        if isinstance(v, str):
            return float(v.replace(",", ""))
        return v

Both approaches work and require no changes to the JSON document.
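The body of either validator is plain Python, so its logic can be checked in isolation, without Pydantic. A minimal sketch follows; the standalone function name strip_commas is hypothetical and simply mirrors the string-cleaning validator shown earlier.

```python
def strip_commas(v: object) -> object:
    """Remove thousands separators from strings; pass other values through.

    Hypothetical standalone copy of the validator body, for illustration only.
    """
    if isinstance(v, str):
        return v.replace(",", "")
    return v


cleaned = strip_commas("100,000")
print(repr(cleaned))          # still a str; coercion to float happens later
print(float(cleaned))         # the cleaned string is now acceptable to float()
print(strip_commas(67000.0))  # non-string input is returned unchanged
```

Returning non-string values untouched means the same validator should keep working if the JSON later supplies real numbers instead of strings.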


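Finally, if you have (or expect to have) several number-like fields and know that any of them may arrive as oddly formatted strings like this, you can generalize the validator as follows (using a different class for demonstration purposes):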

from pydantic import BaseModel, validator
from pydantic.fields import ModelField

class Car2(BaseModel):
    model: str
    price: float
    year: int
    numbers: list[float]

    @validator("*", pre=True, each_item=True)
    def format_number_string(cls, v: object, field: ModelField) -> object:
        if issubclass(field.type_, (float, int)) and isinstance(v, str):
            return v.replace(",", "")
        return v

if __name__ == "__main__":
    car = Car2.parse_obj({
        "model": "foo",
        "price": "100,000",
        "year": "2,010",
        "numbers": ["1", "3.14", "10,000"]
    })
    print(car)  # model='foo' price=100000.0 year=2010 numbers=[1.0, 3.14, 10000.0]

You can also replace the thousands-separator comma , with _ and keep the value as a string.
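This works because Python's built-in float() and int() accept underscores between digits (Python 3.6 and later), so a string such as "100_000" should pass the same coercion that rejects "100,000". A small sketch using only the standard library; the variable names are illustrative:

```python
raw_price = "100,000"

# Swap the comma for an underscore; the value is still a str at this point.
underscored = raw_price.replace(",", "_")

# float() and int() treat single underscores between digits as separators.
price = float(underscored)
year = int("2_010")

print(underscored, price, year)
```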

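Pydantic handles the str-to-float conversion.

You need to remove the quotes around the number (and the comma), because a quoted value is read as a string.

"price": "100,000" should be: "price": 100000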