I want to create a nested dictionary from DataFrame values:
Input:
dfdict={'country': {0: 'USA', 1: 'USA', 2: 'USA', 3: 'USA'},
        'state': {0: 'California', 1: 'California', 2: 'Texas', 3: 'Texas'},
        'city': {0: 'San Francisco', 1: 'Los Angeles', 2: 'Dallas', 3: 'Houston'},
        'attribute a': {0: 87, 1: 57, 2: 1, 3: 138},
        'attribute b': {0: 19, 1: 13, 2: 134, 3: 101},
        'attribute c': {0: 39, 1: 118, 2: 82, 3: 29}}
df=pd.DataFrame(dfdict)
   country       state           city  attribute a  attribute b  attribute c
0      USA  California  San Francisco           87           19           39
1      USA  California    Los Angeles           57           13          118
2      USA       Texas         Dallas            1          134           82
3      USA       Texas        Houston          138          101           29
Expected output:
defdict = {
    "USA": {
        "California": {
            "San Francisco": {"attribute a": 87, "attribute b": 19, "attribute c": 39},
            "Los Angeles": {"attribute a": 57, "attribute b": 13, "attribute c": 118}
        },
        "Texas": {
            "Dallas": {"attribute a": 1, "attribute b": 134, "attribute c": 82},
            "Houston": {"attribute a": 138, "attribute b": 101, "attribute c": 29}
        }
    }
}
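Unfortunately, every attempt I have made returns an error, even when starting with the simplest case, such as:
dictp=df[["country","state"]].apply(lambda x: {a:b for a,b in x}, axis=1)
What is the correct way to do this?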
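Here is a three-level nested for loop that does what you want, or can at least serve as a starting point for further optimization. I put .tolist() in the innermost loop in case a city has multiple entries.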
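outs = {}
for i, c in df.groupby('country'):          # i: country name, c: its rows
    if outs.get(i) is None:
        outs[i] = {}
    for j, s in c.groupby('state'):         # j: state name, s: its rows
        if outs[i].get(j) is None:
            outs[i][j] = {}
        for k, city in s.groupby('city'):   # k: city name, city: its rows
            # Every column (grouping keys included) is stored as a list
            outs[i][j][k] = {
                col: city[col].tolist() for col in city.columns
            }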
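As a follow-up to the loop above: it stores every column, including country, state and city, as lists. If each city is known to appear only once, a variant along the following lines keeps only the attribute columns and stores single values instead of lists. This is just a sketch; the names keys and attr_cols and the one-row-per-city assumption are mine, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'country': ['USA', 'USA', 'USA', 'USA'],
    'state': ['California', 'California', 'Texas', 'Texas'],
    'city': ['San Francisco', 'Los Angeles', 'Dallas', 'Houston'],
    'attribute a': [87, 57, 1, 138],
    'attribute b': [19, 13, 134, 101],
    'attribute c': [39, 118, 82, 29],
})

keys = ['country', 'state', 'city']                    # grouping columns
attr_cols = [c for c in df.columns if c not in keys]   # everything else

outs = {}
for country, c in df.groupby('country'):
    outs[country] = {}
    for state, s in c.groupby('state'):
        outs[country][state] = {}
        for city, rows in s.groupby('city'):
            # Assumption: exactly one row per city, so take the first row only
            outs[country][state][city] = rows[attr_cols].iloc[0].to_dict()
```

If a city can appear more than once, this variant silently drops the extra rows, so the .tolist() version is the safer choice in that case.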
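Assuming the data contains no duplicates (since the expected output contains only dictionaries), you can get the desired output with groupby and iterrows as follows: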
import pandas as pd

# loading df
dfdict={'country': {0: 'USA', 1: 'USA', 2: 'USA', 3: 'USA'},
        'state': {0: 'California', 1: 'California', 2: 'Texas', 3: 'Texas'},
        'city': {0: 'San Francisco', 1: 'Los Angeles', 2: 'Dallas', 3: 'Houston'},
        'attribute a': {0: 87, 1: 57, 2: 1, 3: 138},
        'attribute b': {0: 19, 1: 13, 2: 134, 3: 101},
        'attribute c': {0: 39, 1: 118, 2: 82, 3: 29}}
df=pd.DataFrame(dfdict)

# Group on country > state > city and use sum() to get one row per
# (country, state, city) combination
groups = df.groupby(['country','state','city']).sum()
d = {}
# Simple iteration over the grouped rows; each row's name is its index tuple
for r,grp in groups.iterrows():
    key = grp.name
    country = key[0]
    state = key[1]
    city = key[2]
    attribute_a = grp['attribute a']
    attribute_b = grp['attribute b']
    attribute_c = grp['attribute c']
    # Create each nesting level the first time it is seen
    if country not in d:
        d[country] = {}
    if state not in d[country]:
        d[country][state] = {}
    if city not in d[country][state]:
        d[country][state][city] = {
            'attribute a': attribute_a,
            'attribute b': attribute_b,
            'attribute c': attribute_c
        }
Output:
d =
{
    "USA":{
        "California":{
            "Los Angeles":{
                "attribute a":57,
                "attribute b":13,
                "attribute c":118
            },
            "San Francisco":{
                "attribute a":87,
                "attribute b":19,
                "attribute c":39
            }
        },
        "Texas":{
            "Dallas":{
                "attribute a":1,
                "attribute b":134,
                "attribute c":82
            },
            "Houston":{
                "attribute a":138,
                "attribute b":101,
                "attribute c":29
            }
        }
    }
}
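For comparison, the same nesting can probably be built more compactly by letting pandas produce a flat dictionary first and then folding it into levels with dict.setdefault. This is only a sketch under the same no-duplicates assumption: to_dict(orient='index') raises a ValueError if the (country, state, city) index is not unique. The names flat and nested are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'country': ['USA', 'USA', 'USA', 'USA'],
    'state': ['California', 'California', 'Texas', 'Texas'],
    'city': ['San Francisco', 'Los Angeles', 'Dallas', 'Houston'],
    'attribute a': [87, 57, 1, 138],
    'attribute b': [19, 13, 134, 101],
    'attribute c': [39, 118, 82, 29],
})

# Flat dict keyed by (country, state, city) tuples, one inner dict per row.
# Requires the three-column index to be unique.
flat = df.set_index(['country', 'state', 'city']).to_dict(orient='index')

nested = {}
for (country, state, city), attrs in flat.items():
    # setdefault creates each missing level on first use
    nested.setdefault(country, {}).setdefault(state, {})[city] = attrs
```

Unlike the groupby(...).sum() approach, this does not aggregate anything, so duplicated cities surface as an error instead of being summed.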