I have a CSV file that looks like this:
Detection,Imagename,Frame_Identifier,TL_x,TL_y,BR_x,BR_y,detection_Confidence,Target_Length,Species,Confidence
0,201503.20150619.181140817.204628.jpg,0,272,142.375,382.5,340,0.475837,0,fish,0.475837
1,201503.20150619.181141498.204632.jpg,3,267.75,6.375,422.875,80.75,0.189145,0,fish,0.189145
2,201503.20150619.181141662.204633.jpg,4,820.25,78.625,973.25,382.5,0.615788,0,fish,0.615788
3,201503.20150619.181141662.204633.jpg,4,1257,75,1280,116,0.307278,0,fish,0.307278
4,201503.20150619.181141834.204634.jpg,5,194,281,233,336,0.586944,0,fish,0.586944
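I load it as a pandas.DataFrame named imageannotation.
I want to extract a dictionary whose keys are the image names (Imagename; note that one image name can appear in several rows) and whose values are another dictionary with 2 keys, ['bbox', 'species'], where bbox is a list made of the TL_x, TL_y, BR_x, BR_y values.
I can do this with the following code: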
test = {
    i: {
        "bbox": imageannotation[imageannotation["Imagename"] == i][
            ["TL_x", "TL_y", "BR_x", "BR_y"]
        ].values,
        "species": imageannotation[imageannotation["Imagename"] == i][
            ["Species"]
        ].values,
    }
    for i in imageannotation["Imagename"].unique()
}
The result looks like this:
mydict = {
    '201503.20150619.181140817.204628': {
        'bbox': array([[272. , 142.375, 382.5 , 340. ]]),
        'species': array([['fish']], dtype=object)},
    '201503.20150619.181141498.204632': {
        'bbox': array([[267.75 , 6.375, 422.875, 80.75 ]]),
        'species': array([['fish']], dtype=object)},
    '201503.20150619.181141662.204633': {
        'bbox': array([[ 820.25 , 78.625, 973.25 , 382.5 ],
                       [1257. , 75. , 1280. , 116. ]]),
        'species': array([['fish'],
                          ['fish']], dtype=object)},
    '201503.20150619.181141834.204634': {
        'bbox': array([[194., 281., 233., 336.],
                       [766., 271., 789., 293.]]),
        'species': array([['fish'],
                          ['fish']], dtype=object)}}
This is what I want, but it can become very slow on large files.
Q: Is there a better way to do this?
My final goal is to add a new column to the DataFrame imagemetadata, which contains the Imagename field with unique values. I do that last step with:
for i in mydict:
    imagemetadata.loc[imagemetadata.Imagename == i, "annotation"] = [mydict[i]]
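(Answer revised after re-reading the question.)
This seems to be what you're after: group the annotations by Imagename, build a dictionary of lists from them, and map that onto the other DataFrame.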
import io
import pandas as pd
imageannotation = pd.read_csv(
io.StringIO(
"""
Detection,Imagename,Frame_Identifier,TL_x,TL_y,BR_x,BR_y,detection_Confidence,Target_Length,Species,Confidence
0,201503.20150619.181140817.204628.jpg,0,272,142.375,382.5,340,0.475837,0,fish,0.475837
1,201503.20150619.181141498.204632.jpg,3,267.75,6.375,422.875,80.75,0.189145,0,fish,0.189145
2,201503.20150619.181141662.204633.jpg,4,820.25,78.625,973.25,382.5,0.615788,0,fish,0.615788
3,201503.20150619.181141662.204633.jpg,4,1257,75,1280,116,0.307278,0,fish,0.307278
4,201503.20150619.181141834.204634.jpg,5,194,281,233,336,0.586944,0,fish,0.586944
"""
)
)
# (Pretend this comes from a separate file)
imagemetadata = pd.DataFrame({"Imagename": imageannotation.Imagename.unique()})
def make_annotation(r):
    return {
        "bbox": [r.TL_x, r.TL_y, r.BR_x, r.BR_y],
        "species": r.Species,
    }
annotations_by_image = (
imageannotation.groupby("Imagename")
.apply(lambda r: r.apply(make_annotation, axis=1).to_list())
.to_dict()
)
imagemetadata["annotation"] = imagemetadata.Imagename.map(annotations_by_image)
print(imagemetadata)
Output:
Imagename annotation
0 201503.20150619.181140817.204628.jpg [{'bbox': [272.0, 142.375, 382.5, 340.0], 'spe...
1 201503.20150619.181141498.204632.jpg [{'bbox': [267.75, 6.375, 422.875, 80.75], 'sp...
2 201503.20150619.181141662.204633.jpg [{'bbox': [820.25, 78.625, 973.25, 382.5], 'sp...
3 201503.20150619.181141834.204634.jpg [{'bbox': [194.0, 281.0, 233.0, 336.0], 'speci...
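One thing to check when imagemetadata really comes from a separate file: Series.map returns NaN for any Imagename that has no rows in imageannotation, for example an image with no detections. A minimal sketch, using a hypothetical mapping shaped like annotations_by_image and a made-up image name b.jpg, that replaces the NaN with an empty list (fillna does not accept a list as its fill value, hence the explicit check):

```python
import pandas as pd

# Hypothetical mapping, shaped like annotations_by_image in the answer
annotations_by_image = {"a.jpg": [{"bbox": [1.0, 2.0, 3.0, 4.0], "species": "fish"}]}

# "b.jpg" is a made-up image that has no detections
imagemetadata = pd.DataFrame({"Imagename": ["a.jpg", "b.jpg"]})
imagemetadata["annotation"] = imagemetadata["Imagename"].map(annotations_by_image)
print(imagemetadata["annotation"].isna().tolist())  # [False, True]

# Swap missing entries for empty lists so every cell has the same type
imagemetadata["annotation"] = imagemetadata["annotation"].apply(
    lambda a: a if isinstance(a, list) else []
)
```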
If you want imagemetadata to have one row per annotation when an image has several:
imagemetadata = imagemetadata.explode("annotation").reset_index(drop=True)
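To see what explode does to the annotation column, here is a minimal sketch on a hypothetical two-image frame shaped like the output above (the names a.jpg and b.jpg and the boxes are made up). Each list element gets its own row and the original index label is repeated, which is why reset_index(drop=True) follows:

```python
import pandas as pd

# Hypothetical frame: a.jpg has two annotations, b.jpg has one
imagemetadata = pd.DataFrame(
    {
        "Imagename": ["a.jpg", "b.jpg"],
        "annotation": [
            [{"bbox": [1, 2, 3, 4], "species": "fish"}, {"bbox": [5, 6, 7, 8], "species": "fish"}],
            [{"bbox": [0, 0, 1, 1], "species": "fish"}],
        ],
    }
)

# explode() gives each list element its own row, repeating the index label
exploded = imagemetadata.explode("annotation")
print(exploded.index.tolist())  # [0, 0, 1]

# reset_index(drop=True) replaces the repeated labels with a fresh 0..n-1 index
exploded = exploded.reset_index(drop=True)
print(exploded.index.tolist())  # [0, 1, 2]
```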
Revised again:
For a dictionary of lists instead of a list of dictionaries, it is simpler:
# Generate a bbox column
imageannotation["bbox"] = imageannotation.apply(lambda x: [x.TL_x, x.TL_y, x.BR_x, x.BR_y], axis=1)
# Get the columns we want as a dict
annotations_by_image = imageannotation.groupby("Imagename").agg({"bbox": list, "Species": list}).to_dict("index")
# Apply the annotations to the other df
imagemetadata["annotation"] = imagemetadata.Imagename.map(annotations_by_image)
print(imagemetadata)
Output:
Imagename annotation
0 201503.20150619.181140817.204628.jpg {'bbox': [[272.0, 142.375, 382.5, 340.0]], 'Sp...
1 201503.20150619.181141498.204632.jpg {'bbox': [[267.75, 6.375, 422.875, 80.75]], 'S...
2 201503.20150619.181141662.204633.jpg {'bbox': [[820.25, 78.625, 973.25, 382.5], [12...
3 201503.20150619.181141834.204634.jpg {'bbox': [[194.0, 281.0, 233.0, 336.0]], 'Spec...
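Since the question is about large files, two further adjustments may help; this is a sketch, not benchmarked. First, the bbox column can be built from the underlying 2-D array in one call instead of a Python lambda per row. Second, if you would rather keep the NumPy-array format from the question, a single groupby pass replaces the per-name boolean masks, each of which scans the whole frame. The names bbox_cols and arrays_by_image are illustrative, and the sample rows are trimmed to the columns used. Unlike the question's output, the keys keep the .jpg extension, as in the CSV:

```python
import io

import pandas as pd

# Sample rows from the question, trimmed to the columns used here
csv_text = """Imagename,TL_x,TL_y,BR_x,BR_y,Species
201503.20150619.181140817.204628.jpg,272,142.375,382.5,340,fish
201503.20150619.181141498.204632.jpg,267.75,6.375,422.875,80.75,fish
201503.20150619.181141662.204633.jpg,820.25,78.625,973.25,382.5,fish
201503.20150619.181141662.204633.jpg,1257,75,1280,116,fish
201503.20150619.181141834.204634.jpg,194,281,233,336,fish
"""
imageannotation = pd.read_csv(io.StringIO(csv_text))
bbox_cols = ["TL_x", "TL_y", "BR_x", "BR_y"]

# 1) Dict of lists: one list of 4 floats per row, built from the value array
#    rather than by calling a lambda on every row with apply(axis=1)
imageannotation["bbox"] = imageannotation[bbox_cols].to_numpy().tolist()
annotations_by_image = (
    imageannotation.groupby("Imagename", sort=False)
    .agg({"bbox": list, "Species": list})
    .to_dict("index")
)

# 2) NumPy arrays, as in the question: each group already holds every row
#    for one Imagename, so no boolean mask over the full frame is needed
arrays_by_image = {
    name: {
        "bbox": group[bbox_cols].to_numpy(),
        "species": group[["Species"]].to_numpy(),
    }
    for name, group in imageannotation.groupby("Imagename", sort=False)
}
```

Either dictionary can then be attached with imagemetadata.Imagename.map(...), as in the answer.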