How do I assign a value of a different type to a Pydantic model field after validation?



I have a CSV file that contains YouTube URLs and their timestamps.

https://www.youtube.com/watch?v=dsnLcaNhXd6o,0:13-0:20;0:25-0:31;0:36-0:40
https://www.youtube.com/watch?v=d8InLcaNhXd6o,0:43-0:52;0:56-1:07
https://www.youtube.com/watch?v=Inji8LcaNhXd6o,0:13-0:20;0:25-0:31;0:36-0:40;0:43-0:52;0:56-1:07;1:15-1:25;1:28-1:40

I need to convert the CSV file into pydantic objects so that I can validate it and pass it on for further processing.

import csv

# csv_file is the path to the CSV file shown above
with open(csv_file, mode="r", newline="") as file:
    csvFile = csv.reader(file)
    csvList = list(enumerate(csvFile))

I have the following Pydantic models:

from typing import List
from pydantic import BaseModel

class TimeStamp(BaseModel):
    start_min: int
    start_sec: int
    end_min: int
    end_sec: int

class VideoDetail(BaseModel):
    row_index: int
    url: str
    timestamps: List[TimeStamp]

class VideoList(BaseModel):
    entry: List[VideoDetail]

Now I need to pass csvList to the VideoList model, run some validation, and get a VideoList object back.

First, list(enumerate(csvFile)) returns a list of tuples of the form (row index, row), where each row is itself a list of column values.

Example:

csvList = list(enumerate(csvFile))
print(csvList)

Output:

[
(0, ['https://www.youtube.com/watch?v=dsnLcaNhXd6o', '0:13-0:20;0:25-0:31;0:36-0:40']),
(1, ['https://www.youtube.com/watch?v=d8InLcaNhXd6o', '0:43-0:52;0:56-1:07']),
(2, ['https://www.youtube.com/watch?v=Inji8LcaNhXd6o', '0:13-0:20;0:25-0:31;0:36-0:40;0:43-0:52;0:56-1:07;1:15-1:25;1:28-1:40'])
]

Now, when I pass csvList to the VideoList model, the timestamps arrive as a single string. How can I turn that string into a list of TimeStamp objects?

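I tried adding a validator to the timestamps field of the VideoDetail model that splits the string into a list of timestamps and returns it. That does not work: an error is raised because the type of timestamps does not match.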

Basically, you have to split the timestamp string into its parts and feed those parts to the individual fields of the pydantic model.

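I use a validator function to do this. Passing pre=True to the validator makes it run before pydantic's own type validation, so it receives the raw string rather than failing on the type check. Inside the validator function: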

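  1. The timestamp string (e.g. 0:43-0:52;0:56-1:07) is first split on ; to get a list of time-range strings.
  2. Each of those strings (e.g. 0:43-0:52) is then split on - to get its start time and end time.
  3. Finally, each start and end time (e.g. 0:43) is split on :, each part is converted to an integer, and the resulting TimeStamp is appended to the list.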
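The three steps above can be sketched as a small standalone helper that needs nothing outside the standard library. The name parse_timestamps and the choice to return plain dicts are illustrative assumptions, not part of the original models:

```python
from typing import Dict, List


def parse_timestamps(raw: str) -> List[Dict[str, int]]:
    """Turn '0:43-0:52;0:56-1:07' into one dict per time range."""
    ranges = []
    for chunk in raw.split(";"):        # step 1: one chunk per time range
        start, end = chunk.split("-")   # step 2: start time and end time
        # step 3: split minutes from seconds and convert both to int
        start_min, start_sec = (int(part) for part in start.split(":"))
        end_min, end_sec = (int(part) for part in end.split(":"))
        ranges.append({
            "start_min": start_min,
            "start_sec": start_sec,
            "end_min": end_min,
            "end_sec": end_sec,
        })
    return ranges


print(parse_timestamps("0:43-0:52;0:56-1:07"))
```

The keys match the field names of TimeStamp, so each dict could be unpacked into the model with TimeStamp(**item). A malformed chunk such as 0:43 without an end time raises ValueError in step 2.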

(I used dictionaries instead of tuples for the rows in csvList. You could use tuples instead.)

from typing import List
from pydantic import BaseModel, validator

class TimeStamp(BaseModel):
    start_min: int
    start_sec: int
    end_min: int
    end_sec: int

class VideoDetail(BaseModel):
    row_index: int
    url: str
    timestamps: List[TimeStamp]

    # pydantic v1-style API; in pydantic v2 the equivalent is
    # @field_validator("timestamps", mode="before")
    @validator("timestamps", pre=True)
    def createTimestamps(cls, value):
        # Anything that is not a raw string is left for pydantic to validate as-is
        if not isinstance(value, str):
            return value
        timestampslist = []
        for eachTimestamp in value.split(";"):
            start_time_str, end_time_str = eachTimestamp.split("-")
            start_min, start_sec = start_time_str.split(":")
            end_min, end_sec = end_time_str.split(":")
            t = TimeStamp(start_min=int(start_min),
                          start_sec=int(start_sec),
                          end_min=int(end_min),
                          end_sec=int(end_sec))
            timestampslist.append(t)
        return timestampslist

class VideoList(BaseModel):
    entry: List[VideoDetail]

csvList = [
    {"row_index": 0, "url": "https://www.youtube.com/watch?v=dsnLcaNhXd6o", "timestamps": "0:13-0:20;0:25-0:31;0:36-0:40"},
    {"row_index": 1, "url": "https://www.youtube.com/watch?v=wcsnLcad6d", "timestamps": "0:13-0:20;0:25-0:31;0:36-0:40"},
    {"row_index": 2, "url": "https://www.youtube.com/watch?v=LcdshXe6o", "timestamps": "0:13-0:20;0:25-0:31;0:36-0:40"},
]
vs = VideoList(entry=csvList)
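
To get from the (row index, row) tuples produced by enumerate(csv.reader(...)) to dictionaries like the ones above, something along these lines might work. It is a minimal sketch that assumes every row has exactly two columns; the in-memory sample stands in for the real file, and the name rows is illustrative:

```python
import csv
import io

# Stand-in for the real file; in practice this would be open(csv_file, newline="")
sample = io.StringIO(
    "https://www.youtube.com/watch?v=dsnLcaNhXd6o,0:13-0:20;0:25-0:31;0:36-0:40\n"
    "https://www.youtube.com/watch?v=d8InLcaNhXd6o,0:43-0:52;0:56-1:07\n"
)

# Each item from enumerate() is (index, [url, timestamps]); unpack it into a dict
# whose keys match the field names of VideoDetail.
rows = [
    {"row_index": index, "url": url, "timestamps": timestamps}
    for index, (url, timestamps) in enumerate(csv.reader(sample))
]

print(rows[1])
```

The resulting list can then be passed as VideoList(entry=rows). A row with more or fewer than two columns would raise ValueError during unpacking, so a real file may need a length check first.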