From a DataFrame to a nested dictionary

I want to create a nested dictionary from DataFrame values:

Input

import pandas as pd

dfdict = {'country': {0: 'USA', 1: 'USA', 2: 'USA', 3: 'USA'},
          'state': {0: 'California', 1: 'California', 2: 'Texas', 3: 'Texas'},
          'city': {0: 'San Francisco', 1: 'Los Angeles', 2: 'Dallas', 3: 'Houston'},
          'attribute a': {0: 87, 1: 57, 2: 1, 3: 138},
          'attribute b': {0: 19, 1: 13, 2: 134, 3: 101},
          'attribute c': {0: 39, 1: 118, 2: 82, 3: 29}}
df = pd.DataFrame(dfdict)

  country       state           city  attribute a  attribute b  attribute c
0     USA  California  San Francisco           87           19           39
1     USA  California    Los Angeles           57           13          118
2     USA       Texas         Dallas            1          134           82
3     USA       Texas        Houston          138          101           29

Expected output:

defdict = {
    "USA": {
        "California": {
            "San Francisco": {"attribute a": 87, "attribute b": 19, "attribute c": 39},
            "Los Angeles": {"attribute a": 57, "attribute b": 13, "attribute c": 118}
        },
        "Texas": {
            "Dallas": {"attribute a": 1, "attribute b": 134, "attribute c": 82},
            "Houston": {"attribute a": 138, "attribute b": 101, "attribute c": 29}
        }
    }
}

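Unfortunately, every attempt I have made returns an error, even when starting from the simplest case, such as:

dictp=df[["country","state"]].apply(lambda x: {a:b for a,b in x}, axis=1)

What is the correct way to do this?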
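On why that one-liner fails: with axis=1, each x is a row Series, and iterating over a Series yields its cell values rather than (key, value) pairs. The comprehension then tries to unpack the string 'USA' into two names, which raises a ValueError. Below is a minimal sketch of the failure and of one way to pair the two columns explicitly. The two-row frame and the names error and pairs are illustrative only, not from the question.

```python
import pandas as pd

# Small illustrative frame (not the question's full data).
df = pd.DataFrame({"country": ["USA", "USA"], "state": ["California", "Texas"]})

error = None
try:
    # Iterating over the row Series yields "USA", then "California", ...
    # so `a, b = "USA"` is attempted and fails.
    df[["country", "state"]].apply(lambda x: {a: b for a, b in x}, axis=1)
except ValueError as exc:
    error = str(exc)

# Pair the columns explicitly instead of unpacking a single cell value.
pairs = {}
for country, state in zip(df["country"], df["state"]):
    pairs.setdefault(country, []).append(state)
```

zip walks both columns in parallel, so each iteration gets exactly one country and one state.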

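Here is a triple-nested for loop that does what you want, or at least can serve as a starting point for further optimization. I put .tolist() in the innermost loop in case a city has multiple entries.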

outs = {}
for i, c in df.groupby('country'):
    if outs.get(i) is None:
        outs[i] = {}
    for j, s in c.groupby('state'):
        if outs[i].get(j) is None:
            outs[i][j] = {}
        for k, city in s.groupby('city'):
            # Note: city.columns also includes country, state and city;
            # restrict the columns here if only the attributes are wanted.
            outs[i][j][k] = {
                col: city[col].tolist() for col in city.columns
            }
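To show what .tolist() buys you, here is a small sketch of the same idea applied to a frame where one city appears twice. The duplicated Dallas row is made up for illustration, and the names keys and value_cols are mine. It groups on all three key columns at once and keeps only the non-key columns, which is a variation on the loop above rather than the same code.

```python
import pandas as pd

# Hypothetical data: Dallas appears twice to show the list behaviour.
df = pd.DataFrame({
    "country": ["USA", "USA", "USA"],
    "state": ["Texas", "Texas", "Texas"],
    "city": ["Dallas", "Dallas", "Houston"],
    "attribute a": [1, 2, 138],
})

keys = ["country", "state", "city"]
value_cols = [c for c in df.columns if c not in keys]

outs = {}
# Grouping by a list of columns yields (country, state, city) tuples as keys.
for (country, state, city), grp in df.groupby(keys):
    outs.setdefault(country, {}).setdefault(state, {})[city] = {
        col: grp[col].tolist() for col in value_cols
    }
```

Every leaf value is a list here, so a city with a single row still gets a one-element list.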

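Assuming the data contains no duplicates (since the expected output contains only dictionaries), you can get the desired output with groupby and iterrows as follows: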

import pandas as pd

# load the DataFrame
dfdict = {'country': {0: 'USA', 1: 'USA', 2: 'USA', 3: 'USA'},
          'state': {0: 'California', 1: 'California', 2: 'Texas', 3: 'Texas'},
          'city': {0: 'San Francisco', 1: 'Los Angeles', 2: 'Dallas', 3: 'Houston'},
          'attribute a': {0: 87, 1: 57, 2: 1, 3: 138},
          'attribute b': {0: 19, 1: 13, 2: 134, 3: 101},
          'attribute c': {0: 39, 1: 118, 2: 82, 3: 29}}
df = pd.DataFrame(dfdict)

# group by country > state > city, then use sum() to obtain one row per
# group, indexed by the (country, state, city) tuple
groups = df.groupby(['country', 'state', 'city']).sum()
d = {}
# simple iteration over the grouped rows
for r, grp in groups.iterrows():
    key = grp.name  # same tuple as r
    country = key[0]
    state = key[1]
    city = key[2]
    # int() turns the NumPy integers into plain Python ints
    attribute_a = int(grp['attribute a'])
    attribute_b = int(grp['attribute b'])
    attribute_c = int(grp['attribute c'])

    if country not in d:
        d[country] = {}
    if state not in d[country]:
        d[country][state] = {}
    if city not in d[country][state]:
        d[country][state][city] = {
            'attribute a': attribute_a,
            'attribute b': attribute_b,
            'attribute c': attribute_c
        }

Output:

d =
{
    "USA": {
        "California": {
            "Los Angeles": {
                "attribute a": 57,
                "attribute b": 13,
                "attribute c": 118
            },
            "San Francisco": {
                "attribute a": 87,
                "attribute b": 19,
                "attribute c": 39
            }
        },
        "Texas": {
            "Dallas": {
                "attribute a": 1,
                "attribute b": 134,
                "attribute c": 82
            },
            "Houston": {
                "attribute a": 138,
                "attribute b": 101,
                "attribute c": 29
            }
        }
    }
}
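Under the same no-duplicates assumption, a shorter variant is possible. This is only a sketch, and the names flat and nested are mine: set_index followed by to_dict("index") gives a flat dictionary keyed by (country, state, city) tuples, which a single loop can fold into the nested shape. Note that to_dict("index") requires the index to be unique, so this does not work if a city appears more than once.

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["USA", "USA", "USA", "USA"],
    "state": ["California", "California", "Texas", "Texas"],
    "city": ["San Francisco", "Los Angeles", "Dallas", "Houston"],
    "attribute a": [87, 57, 1, 138],
    "attribute b": [19, 13, 134, 101],
    "attribute c": [39, 118, 82, 29],
})

# Flat dict keyed by (country, state, city); each value maps column -> value.
flat = df.set_index(["country", "state", "city"]).to_dict("index")

# Fold the tuple keys into nested dictionaries.
nested = {}
for (country, state, city), attrs in flat.items():
    nested.setdefault(country, {}).setdefault(state, {})[city] = attrs
```

setdefault creates each intermediate dictionary on first use, which replaces the chain of "if key not in d" checks.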
