Reshape a CSV DataFrame into a dictionary-style format



I have a pandas DataFrame named tshirt_orders, obtained from an API call, that looks like this:

Alice, small, red  
Alice, small, green  
Bob, small, blue  
Bob, small, orange  
Cesar, medium, yellow  
David, large, purple  

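How can I turn it into a dictionary-style format, keyed first by size, with the names as sub-keys and the colors as a sub-list under each name, so that I can work with it when iterating over tshirt_orders?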

Like this:

size:
  small:
    Name:
      Alice:
        Color:
          red
          green
      Bob:
        Color:
          blue
          orange
  medium:
    Name:
      Cesar:
        Color:
          yellow
  large:
    Name:
      David:
        Color:
          purple

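What is the best way to make this transformation? The data is currently in a pandas DataFrame, but changing that is not a problem if there is a better solution.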

The closest match is to write the DataFrame to YAML.

First, create the nested dictionary with a dictionary comprehension:

# df is the question's data with columns A (name), B (size), C (color)
print(df)
       A       B       C
0  Alice   small     red
1  Alice   small   green
2    Bob   small    blue
3    Bob   small  orange
4  Cesar  medium  yellow
5  David   large  purple

# Outer groupby on size, inner groupby on name, colors collected into lists
d = {k: v.groupby('A', sort=False)['C'].apply(list).to_dict()
     for k, v in df.groupby('B', sort=False)}
print(d)
{'small': {'Alice': ['red', 'green'],
           'Bob': ['blue', 'orange']},
 'medium': {'Cesar': ['yellow']},
 'large': {'David': ['purple']}}
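
Since the question asks about iterating over the result, here is a minimal sketch of walking the nested dictionary. The dictionary literal is copied from the printed output above, and the helper name `describe_orders` is made up for illustration; it is not part of the answer.

```python
# Nested dict copied from the printed output above: size -> name -> list of colors
d = {
    'small': {'Alice': ['red', 'green'], 'Bob': ['blue', 'orange']},
    'medium': {'Cesar': ['yellow']},
    'large': {'David': ['purple']},
}


def describe_orders(orders):
    """Return one 'size / name / color' line per shirt (hypothetical helper)."""
    lines = []
    for size, names in orders.items():
        for name, colors in names.items():
            for color in colors:
                lines.append(f"{size} / {name} / {color}")
    return lines


for line in describe_orders(d):
    print(line)
```

Each level is an ordinary dict, so `.items()` gives the key and the next level down, and the innermost value is a plain list of colors.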

Then add size as the top-level key of the dict and write it to a YAML file:

import yaml

with open('result.yml', 'w') as yaml_file:
    yaml.dump({'size': d}, yaml_file, default_flow_style=False, sort_keys=False)

size:
  small:
    Alice:
    - red
    - green
    Bob:
    - blue
    - orange
  medium:
    Cesar:
    - yellow
  large:
    David:
    - purple

Or create JSON:

import json

with open("result.json", "w") as json_file:
    json.dump({'size': d}, json_file, indent=4)

{
    "size": {
        "small": {
            "Alice": [
                "red",
                "green"
            ],
            "Bob": [
                "blue",
                "orange"
            ]
        },
        "medium": {
            "Cesar": [
                "yellow"
            ]
        },
        "large": {
            "David": [
                "purple"
            ]
        }
    }
}
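
To check that the JSON form keeps the same structure, a quick round trip in memory can help. This sketch uses `json.dumps` and `json.loads` instead of a file, and the dictionary literal is copied from the output shown earlier.

```python
import json

# Nested dict copied from the earlier printed output: size -> name -> colors
d = {
    'small': {'Alice': ['red', 'green'], 'Bob': ['blue', 'orange']},
    'medium': {'Cesar': ['yellow']},
    'large': {'David': ['purple']},
}

# Serialize with the same top-level 'size' key, then parse it back
text = json.dumps({'size': d}, indent=4)
loaded = json.loads(text)

# The parsed structure compares equal to the original
print(loaded == {'size': d})
# Colors for one person are reached by walking the keys
print(loaded['size']['small']['Bob'])
```

The same key walk works on a dict loaded from the written file with `json.load`.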

EDIT (to also include the Name and Color levels from the desired output):

# Add constant label columns so 'size', 'Name' and 'Color' become dictionary levels
df = df.assign(A1='Name', B1='size', C1='Color')
df1 = df.groupby(['B1', 'B', 'A1', 'A', 'C1'], sort=False)['C'].apply(list).reset_index()

# https://stackoverflow.com/a/19900276
def recur_dictify(frame):
    # Base case: one column left, so return its value(s)
    if len(frame.columns) == 1:
        if frame.values.size == 1:
            return frame.values[0][0]
        return frame.values.squeeze()
    # Recursive case: group by the first column and recurse on the remaining ones
    grouped = frame.groupby(frame.columns[0], sort=False)
    d = {k: recur_dictify(g.iloc[:, 1:]) for k, g in grouped}
    return d

d = recur_dictify(df1)
print(d)
{'size': {'small': {'Name': {'Alice': {'Color': ['red', 'green']},
                             'Bob': {'Color': ['blue', 'orange']}}},
          'medium': {'Name': {'Cesar': {'Color': ['yellow']}}},
          'large': {'Name': {'David': {'Color': ['purple']}}}}}

import yaml

with open('result.yml', 'w') as yaml_file:
    yaml.dump(d, yaml_file, default_flow_style=False, sort_keys=False)
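
If pandas is not required, the same labelled structure could also be built with plain dictionaries. This is a different technique from the recursive groupby above, shown only as a sketch: the function name `build_nested` and the `rows` list are assumptions for illustration, with rows given as (name, size, color) tuples in the order of the question's data. With a DataFrame, such tuples can be obtained from `df.itertuples(index=False, name=None)`.

```python
def build_nested(rows):
    """Build size -> 'Name' -> name -> 'Color' -> [colors] from (name, size, color) rows."""
    result = {'size': {}}
    for name, size, color in rows:
        # Get or create the 'Name' level for this size
        by_name = result['size'].setdefault(size, {'Name': {}})['Name']
        # Get or create the 'Color' list for this name, then add the color
        by_name.setdefault(name, {'Color': []})['Color'].append(color)
    return result


# Sample rows taken from the question's data (hypothetical variable name)
rows = [
    ('Alice', 'small', 'red'),
    ('Alice', 'small', 'green'),
    ('Bob', 'small', 'blue'),
    ('Bob', 'small', 'orange'),
    ('Cesar', 'medium', 'yellow'),
    ('David', 'large', 'purple'),
]

nested = build_nested(rows)
print(nested['size']['small']['Name']['Alice']['Color'])
```

Because regular dicts keep insertion order, the sizes and names appear in the order they are first seen, which mirrors `sort=False` in the groupby version.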
