Splitting a string that contains a nested dictionary and converting it to a DataFrame



I have the following data problem.

The first rows of the CSV:

{'grade1': '47.614465', 'grade2': '-122.32174', 'grade3': '{"addr": "123 AV MOUNTIAN", "town": "HAMBOURG", "dep": "GR", "code": ""}'}
{'grade1': '47.61699416', 'grade2': '-122.320405', 'grade3': '{"addr": "5555 WALL STREET", "town": "NY", "dep": "NY", "code": "98122"}'}
{'grade1': '47.61676902', 'grade2': '-122.3215492', 'grade3': '{"addr": "6776  SPAU - 65 ", "town": "GHAN", "dep": "IU", "code": "122"}'}

After importing my CSV file, I get this DataFrame:

Grade
0   {'grade1': '47.614465', 'grade2': '-122.32174', 'grade3': '{"addr": "123 AV MOUNTIAN", "town": "HAMBOURG", "dep": "GR", "code": ""}'}
1   {'grade1': '47.61699416', 'grade2': '-122.320405', 'grade3': '{"addr": "5555 WALL STREET", "town": "NY", "dep": "NY", "code": "98122"}'}
2   {'grade1': '47.61676902', 'grade2': '-122.3215492', 'grade3': '{"addr": "6776 SPAU - 65 ", "town": "GHAN", "dep": "IU", "code": "122"}'}

There is only one column, and its dtype is object.

I need to convert it into a DataFrame with this output:

grade1        grade2         addr              town         dep       code
47.614465     -122.32174    123 AV MOUNTIAN    HAMBOURG       GR            
47.61699416   -122.320405   5555 WALL STREET      NY          NY        98122 

I tried the following code:

import pandas as pd

dic_loc = []
cordinates = []
address = []
for key, value in df['Grade'][:3].items():
    print(key, value, type(value), pd.Series(value), type(pd.Series(value)))
    dic_loc.append(value)  # each value is still a string, not a dict

The result is:

{'grade1': '47.614465', 'grade2': '-122.32174', 'grade3': '{"addr": "123 AV MOUNTIAN", "town": "HAMBOURG", "dep": "GR", "code": ""}'} <class 'str'> 0    

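The question is: how do I iterate over this string and convert it to a DataFrame?

Any ideas are welcome; help is really appreciated.

I think this can be fixed as follows: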

import json
import pandas

def fix_line(line):
    # First convert the string to proper JSON: swap single quotes for double
    # quotes, then strip the quotes wrapping the nested object.
    json_string = line.replace("'", '"').replace('"{', '{').replace('}"', '}')
    # Convert the JSON string to a dict
    d = json.loads(json_string)
    # Flatten the dict into a tuple
    return (float(d['grade1']), float(d['grade2']), d['grade3']['addr'],
            d['grade3']['town'], d['grade3']['dep'], d['grade3']['code'])

# Create a DataFrame from a list of tuples
df = pandas.DataFrame.from_records([fix_line(line) for line in df['Grade']],
                                   columns=['grade1', 'grade2', 'addr', 'town', 'dep', 'code'])
print(df)
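
One caveat with the quote-replacement approach is that it would break if an address ever contained an apostrophe. As an alternative, here is a minimal sketch that parses the outer string as a Python literal with ast.literal_eval and the nested value as JSON with json.loads. It assumes every cell looks like the sample rows in the question (with the missing quote before GR in the first row restored); the helper name parse_row and the variable result are made up for illustration.

```python
import ast
import json

import pandas as pd

# Sample rows copied from the question; the quote before GR in the first
# row is restored here so that the nested value is valid JSON (assumption).
rows = [
    """{'grade1': '47.614465', 'grade2': '-122.32174', 'grade3': '{"addr": "123 AV MOUNTIAN", "town": "HAMBOURG", "dep": "GR", "code": ""}'}""",
    """{'grade1': '47.61699416', 'grade2': '-122.320405', 'grade3': '{"addr": "5555 WALL STREET", "town": "NY", "dep": "NY", "code": "98122"}'}""",
    """{'grade1': '47.61676902', 'grade2': '-122.3215492', 'grade3': '{"addr": "6776 SPAU - 65 ", "town": "GHAN", "dep": "IU", "code": "122"}'}""",
]
df = pd.DataFrame({"Grade": rows})


def parse_row(text):
    """Hypothetical helper: turn one cell into a flat dict."""
    outer = ast.literal_eval(text)       # outer dict uses Python-style single quotes
    inner = json.loads(outer["grade3"])  # nested value is a JSON string
    return {
        "grade1": float(outer["grade1"]),
        "grade2": float(outer["grade2"]),
        **inner,                         # addr, town, dep, code
    }


# A list of dicts becomes one row per dict, one column per key.
result = pd.DataFrame([parse_row(text) for text in df["Grade"]])
print(result)
```

Because each layer is parsed with a real parser instead of string replacement, quotes inside the address values are left untouched. A malformed cell would raise an exception instead of being silently mangled, which may or may not be what you want.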
