My CSV file:
FILE_INFO, CATEGORY, AREA, BOX, NAME
"{'id': 1, 'width': 4032, 'height': 3024, 'file_name': 'pic1.jpeg', 'license': 0, 'flickr_url': '', 'coco_url': '', 'date_captured': 0}",PRODUCT,2247.8981,"[2283.54, 934.13, 27.37, 82.13]","{'subcategory': 'BOTTLE', 'occluded': False}"
"{'id': 2, 'width': 4032, 'height': 3024, 'file_name': 'pic2.jpeg', 'license': 0, 'flickr_url': '', 'coco_url': '', 'date_captured': 0}",PRODUCT,2450.7795,"[2239.91, 1284.21, 33.15, 73.93]","{'subcategory': 'BOTTLE', 'occluded': False}"
"{'id': 3, 'width': 4032, 'height': 3024, 'file_name': 'pic3.jpeg', 'license': 0, 'flickr_url': '', 'coco_url': '', 'date_captured': 0}",INDUSTRIAL litter,2548.956,"[2316.07, 301.5, 68.3, 37.32]","{'subcategory': 'BOTTLE', 'occluded': False}"
"{'id': 4, 'width': 4032, 'height': 3024, 'file_name': 'pic4.jpeg', 'license': 0, 'flickr_url': '', 'coco_url': '', 'date_captured': 0}",INDUSTRIAL litter,1465.0172,"[3394.37, 1083.97, 26.99, 54.28]","{'subcategory': 'PAPER', 'occluded': False}"
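How can I parse FILE_INFO so that the table keeps only the file_name and none of the other information? Same for the NAME column: I only want to get the subcategory from it. The other columns of the table are fine.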
You can iterate over the values in a for loop and use JSON to extract the data you need.
So inside the for loop you can do this:
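import json

for row in rows:  # rows: the FILE_INFO strings, one per CSV row
    # swap single quotes for double quotes so the string is valid JSON
    file_name = json.loads(row.replace("'", '"'))['file_name']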
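A self-contained version of that loop, as a sketch: the answer does not say where `rows` comes from, so this assumes the file is read with Python's `csv` module, and it uses two rows from the question held in an in-memory string. The quote swap only works here because FILE_INFO contains no apostrophes inside its values and no Python-only literals such as `False`.

```python
import csv
import io
import json

# Two sample rows from the question's CSV, held in memory for the demo
CSV_TEXT = '''FILE_INFO, CATEGORY, AREA, BOX, NAME
"{'id': 1, 'width': 4032, 'height': 3024, 'file_name': 'pic1.jpeg', 'license': 0, 'flickr_url': '', 'coco_url': '', 'date_captured': 0}",PRODUCT,2247.8981,"[2283.54, 934.13, 27.37, 82.13]","{'subcategory': 'BOTTLE', 'occluded': False}"
"{'id': 4, 'width': 4032, 'height': 3024, 'file_name': 'pic4.jpeg', 'license': 0, 'flickr_url': '', 'coco_url': '', 'date_captured': 0}",INDUSTRIAL litter,1465.0172,"[3394.37, 1083.97, 26.99, 54.28]","{'subcategory': 'PAPER', 'occluded': False}"
'''

# skipinitialspace=True strips the blanks after the commas in the header row
reader = csv.DictReader(io.StringIO(CSV_TEXT), skipinitialspace=True)

file_names = []
for row in reader:
    # FILE_INFO uses single quotes; swapping them for double quotes makes it valid JSON
    info = json.loads(row["FILE_INFO"].replace("'", '"'))
    file_names.append(info["file_name"])

print(file_names)  # ['pic1.jpeg', 'pic4.jpeg']
```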
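The main obstacle is that the JSON-like columns are not well-formed input for json.loads(), so:
- replace single quotes with double quotes to make the JSON well-formed
- json.loads() to convert each string to a dict
- apply(pd.Series) to expand each dict into columns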
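The three steps can be tried on single cells copied from the question. This sketch also shows why the trick is applied only to FILE_INFO: NAME contains the Python literal `False`, which is not valid JSON (JSON spells it `false`), so `json.loads()` rejects it even after the quote swap. `ast.literal_eval` from the standard library is one alternative (not part of the original answer) that parses such Python-literal strings directly.

```python
import ast
import json

import pandas as pd

file_info = "{'id': 1, 'width': 4032, 'height': 3024, 'file_name': 'pic1.jpeg', 'license': 0, 'flickr_url': '', 'coco_url': '', 'date_captured': 0}"
name = "{'subcategory': 'BOTTLE', 'occluded': False}"

# Steps 1 and 2: single -> double quotes, then json.loads() returns a dict
info = json.loads(file_info.replace("'", '"'))

# Step 3: pd.Series turns the dict keys into an index (columns once applied per row)
print(pd.Series(info)["file_name"])  # pic1.jpeg

# The same trick fails on NAME because False is not a JSON token
name_is_json = True
try:
    json.loads(name.replace("'", '"'))
except json.JSONDecodeError:
    name_is_json = False
    print("NAME is not valid JSON")

# ast.literal_eval accepts Python literals: single quotes and False included
subcategory = ast.literal_eval(name)["subcategory"]
print(subcategory)  # BOTTLE
```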
Now you have a DataFrame that is easy to navigate.
import io
import json

import pandas as pd

# skipinitialspace=True removes the spaces after the commas in the header row
df = pd.read_csv(io.StringIO('''FILE_INFO, CATEGORY, AREA, BOX, NAME
"{'id': 1, 'width': 4032, 'height': 3024, 'file_name': 'pic1.jpeg', 'license': 0, 'flickr_url': '', 'coco_url': '', 'date_captured': 0}",PRODUCT,2247.8981,"[2283.54, 934.13, 27.37, 82.13]","{'subcategory': 'BOTTLE', 'occluded': False}"
"{'id': 2, 'width': 4032, 'height': 3024, 'file_name': 'pic2.jpeg', 'license': 0, 'flickr_url': '', 'coco_url': '', 'date_captured': 0}",PRODUCT,2450.7795,"[2239.91, 1284.21, 33.15, 73.93]","{'subcategory': 'BOTTLE', 'occluded': False}"
"{'id': 3, 'width': 4032, 'height': 3024, 'file_name': 'pic3.jpeg', 'license': 0, 'flickr_url': '', 'coco_url': '', 'date_captured': 0}",INDUSTRIAL litter,2548.956,"[2316.07, 301.5, 68.3, 37.32]","{'subcategory': 'BOTTLE', 'occluded': False}"
"{'id': 4, 'width': 4032, 'height': 3024, 'file_name': 'pic4.jpeg', 'license': 0, 'flickr_url': '', 'coco_url': '', 'date_captured': 0}",INDUSTRIAL litter,1465.0172,"[3394.37, 1083.97, 26.99, 54.28]","{'subcategory': 'PAPER', 'occluded': False}"'''), skipinitialspace=True)
# Parse each FILE_INFO string into a dict and expand its keys into separate columns
df = df.drop(columns="FILE_INFO").join(df.FILE_INFO.apply(lambda x: json.loads(x.replace("'", '"'))).apply(pd.Series))
  CATEGORY           AREA     BOX                               NAME                                          ...  date_captured
0 PRODUCT            2247.90  [2283.54, 934.13, 27.37, 82.13]   {'subcategory': 'BOTTLE', 'occluded': False}  ...  0
1 PRODUCT            2450.78  [2239.91, 1284.21, 33.15, 73.93]  {'subcategory': 'BOTTLE', 'occluded': False}  ...  0
2 INDUSTRIAL litter  2548.96  [2316.07, 301.5, 68.3, 37.32]     {'subcategory': 'BOTTLE', 'occluded': False}  ...  0
3 INDUSTRIAL litter  1465.02  [3394.37, 1083.97, 26.99, 54.28]  {'subcategory': 'PAPER', 'occluded': False}   ...  0
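The question also asks for only the subcategory from NAME, which the code above leaves as a dict. A minimal sketch that handles both columns, assuming the CSV text from the question; it swaps `json.loads` for `ast.literal_eval` (a different technique from the answer's), because NAME contains `False`, and it overwrites each column in place with the single value asked for instead of expanding every key.

```python
import ast
import io

import pandas as pd

CSV_TEXT = '''FILE_INFO, CATEGORY, AREA, BOX, NAME
"{'id': 1, 'width': 4032, 'height': 3024, 'file_name': 'pic1.jpeg', 'license': 0, 'flickr_url': '', 'coco_url': '', 'date_captured': 0}",PRODUCT,2247.8981,"[2283.54, 934.13, 27.37, 82.13]","{'subcategory': 'BOTTLE', 'occluded': False}"
"{'id': 2, 'width': 4032, 'height': 3024, 'file_name': 'pic2.jpeg', 'license': 0, 'flickr_url': '', 'coco_url': '', 'date_captured': 0}",PRODUCT,2450.7795,"[2239.91, 1284.21, 33.15, 73.93]","{'subcategory': 'BOTTLE', 'occluded': False}"
"{'id': 3, 'width': 4032, 'height': 3024, 'file_name': 'pic3.jpeg', 'license': 0, 'flickr_url': '', 'coco_url': '', 'date_captured': 0}",INDUSTRIAL litter,2548.956,"[2316.07, 301.5, 68.3, 37.32]","{'subcategory': 'BOTTLE', 'occluded': False}"
"{'id': 4, 'width': 4032, 'height': 3024, 'file_name': 'pic4.jpeg', 'license': 0, 'flickr_url': '', 'coco_url': '', 'date_captured': 0}",INDUSTRIAL litter,1465.0172,"[3394.37, 1083.97, 26.99, 54.28]","{'subcategory': 'PAPER', 'occluded': False}"
'''

# skipinitialspace=True drops the spaces after the commas in the header row
df = pd.read_csv(io.StringIO(CSV_TEXT), skipinitialspace=True)

# Both columns hold Python-literal dicts, so literal_eval parses them without any quote swapping
df["FILE_INFO"] = df["FILE_INFO"].apply(lambda s: ast.literal_eval(s)["file_name"])
df["NAME"] = df["NAME"].apply(lambda s: ast.literal_eval(s)["subcategory"])

print(df)
```

Unlike the quote-swap approach, `ast.literal_eval` does not break on apostrophes inside values, and it only evaluates literals, so it does not execute arbitrary code from the file.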