Pandas多索引/标头数据帧到嵌套字典



我想知道我是否可以将多索引和多头DataFrame转换为嵌套dict。

我有以下数据帧:

revenue                           taxes
sum       max      min         mean       sum
company              type
-                    -                                        15.00     15.00    15.00         0.00      0.00
Facebook             -                                      1259.79    218.50  -120.00         5.17    321.00
City TOT                                  0.00      0.00     0.00         4.00      4.00
Country TOT                               0.00      0.00     0.00         4.00      4.00
Sales                                     0.00      0.00     0.00         9.25     18.50
Google               %tax_1                                    0.00      0.00     0.00         0.89      3.58
-                                      3738.36   2643.08  -100.00        96.23  26369.57
City TOT                                  0.00      0.00     0.00         3.55     95.99
Country TOT                               0.00      0.00     0.00        23.25    628.00
...
df = pd.DataFrame.from_dict({('-', '-'): {('revenue', 'sum'): 15.0,
('revenue', 'max'): 15.0,
('revenue', 'min'): 15.0,
('taxes', 'mean'): 0.00,
('taxes', 'sum'): 0.0},
('Facebook', '-'): {('revenue', 'sum'): 1259.79,
('revenue', 'max'): 218.5,
('revenue', 'min'): -120.0,
('taxes', 'mean'): 5.17,
('taxes', 'sum'): 321.0},
('Facebook', 'City TOT'): {('revenue', 'sum'): 0.0,
('revenue', 'max'): 0.0,
('revenue', 'min'): 0.0,
('taxes', 'mean'): 4.00,
('taxes', 'sum'): 4.0},
('Facebook', 'Country TOT'): {('revenue', 'sum'): 0.0,
('revenue', 'max'): 0.0,
('revenue', 'min'): 0.0,
('taxes', 'mean'): 4.00,
('taxes', 'sum'): 4.0},
('Facebook', 'Sales'): {('revenue', 'sum'): 0.0,
('revenue', 'max'): 0.0,
('revenue', 'min'): 0.0,
('taxes', 'mean'): 9.25,
('taxes', 'sum'): 18.5},
('Google', '%tax_1'): {('revenue', 'sum'): 0.0,
('revenue', 'max'): 0.0,
('revenue', 'min'): 0.0,
('taxes', 'mean'): 0.89,
('taxes', 'sum'): 3.58},
('Google', '-'): {('revenue', 'sum'): 3738.36,
('revenue', 'max'): 2643.08,
('revenue', 'min'): -100.0,
('taxes', 'mean'): 96.23,
('taxes', 'sum'): 26369.57},
('Google', 'City TOT'): {('revenue', 'sum'): 0.0,
('revenue', 'max'): 0.0,
('revenue', 'min'): 0.0,
('taxes', 'mean'): 3.55,
('taxes', 'sum'): 95.99},
('Google', 'Country TOT'): {('revenue', 'sum'): 0.0,
('revenue', 'max'): 0.0,
('revenue', 'min'): 0.0,
('taxes', 'mean'): 23.25,
('taxes', 'sum'): 628.0},
('Google', 'PER GETS'): {('revenue', 'sum'): 0.0,
('revenue', 'max'): 0.0,
('revenue', 'min'): 0.0,
('taxes', 'mean'): 0.88,
('taxes', 'sum'): 2.64},
('Google', 'Sales'): {('revenue', 'sum'): 0.0,
('revenue', 'max'): 0.0,
('revenue', 'min'): 0.0,
('taxes', 'mean'): 12.61,
('taxes', 'sum'): 138.75},
('Google', 'VAT'): {('revenue', 'sum'): 0.0,
('revenue', 'max'): 0.0,
('revenue', 'min'): 0.0,
('taxes', 'mean'): 11.70,
('taxes', 'sum'): 1065.51},
('Google', 'per room_1'): {('revenue', 'sum'): 0.0,
('revenue', 'max'): 0.0,
('revenue', 'min'): 0.0,
('taxes', 'mean'): 5.00,
('taxes', 'sum'): 20.0},
('Google', 'tax on top'): {('revenue', 'sum'): 0.0,
('revenue', 'max'): 0.0,
('revenue', 'min'): 0.0,
('taxes', 'mean'): 8.68,
('taxes', 'sum'): 78.2},
('Google', 'tax per reserv'): {('revenue', 'sum'): 0.0,
('revenue', 'max'): 0.0,
('revenue', 'min'): 0.0,
('taxes', 'mean'): 20.00,
('taxes', 'sum'): 40.0}}, orient='index')
df.index.names = ['company', 'type']

列和索引为MultiIndex:

MultiIndex([('revenue',  'sum'),
('revenue',  'max'),
('revenue',  'min'),
( 'taxes', 'mean'),
( 'taxes',  'sum')],)
MultiIndex([(                  '-',                                '-'),
(           'Facebook',                                '-'),
(           'Facebook',                         'City TOT'),
(           'Facebook',                      'Country TOT'),
(           'Facebook',                            'Sales'),
(           'Google',                           '%tax_1'),
...

我尝试过使用这种方法(基于Pandas多索引数据帧到嵌套字典(:

{level: df.xs(level).to_dict('index') for level in df.index.levels[0]}

然而,这就是我得到的:

{'-': {('revenue', 'sum'): {'-': 15.0},
('revenue', 'max'): {'-': 15.0},
('revenue', 'min'): {'-': 15.0},
('taxes', 'mean'): {'-': 0.0)},
('taxes', 'sum'): {'-': 0.0}},
'Facebook': {('revenue', 'sum'): {'-': 1259.79,
'City TOT': 0.0,
'Country TOT': 0.0,
'Sales': 0.0},
('revenue', 'max'): {'-': 218.5,
'City TOT': 0.0,
'Country TOT': 0.0,
'Sales': 0.0},
('revenue', 'min'): {'-': -120.0,
'City TOT': 0.0,
'Country TOT': 0.0,
'Sales': 0.0},
...

相反,我希望实现的是在列上嵌套dict(与索引上的dict完全一样(,其中数据按照如下索引嵌套:

{
'-': {
'-':  {
'revenue': {
'sum': 15.0,
'max': 15.0,
'min': 15.0,
},
'taxes': {
'mean': 0.00,
'sum': 0.00,
}
},
},
'Facebook': {
'-': {
'revenue': {
'sum': 1259.79,
'max': 218.5,
'min': -120.0
}
'taxes': ...
},
'City TOT': {
'revenue': {
'sum': 0.0,
'max': 0.0,
'min': 0.0
}
'taxes': ...
},
'Country TOT': {
'revenue': {
'sum': 0.0,
'max': 0.0,
'min': 0.0
}
'taxes': ...
},
'Sales': {
'revenue': {
'sum': 0.0,
'max': 0.0,
'min': 0.0
}
'taxes': ...
},
} 
}

有什么办法让我绕过这个吗?谢谢

分两步:

  1. 透视数据帧并将其转换为平面dict
  2. 将元组键转换为嵌套dict

步骤1

data = df.stack(level=[0, 1]).to_dict()
print(data)
# Output
{('-', '-', 'revenue', 'max'): 15.0,
('-', '-', 'revenue', 'min'): 15.0,
('-', '-', 'revenue', 'sum'): 15.0,
...
('Google', 'Country TOT', 'revenue', 'sum'): 0.0,
('Google', 'Country TOT', 'taxes', 'mean'): 23.25,
('Google', 'Country TOT', 'taxes', 'sum'): 628.0}

步骤2

d = {}
for t, v in data.items():
e = d.setdefault(t[0], {})
for k in t[1:-1]:
e = e.setdefault(k, {})
e[t[-1]] = v
print(d)
# Output
{'-': {'-': {'revenue': {'max': 15.0, 'min': 15.0, 'sum': 15.0},
'taxes': {'mean': 0.0, 'sum': 0.0}}},
'Facebook': {'-': {'revenue': {'max': 218.5, 'min': -120.0, 'sum': 1259.79},
'taxes': {'mean': 5.17, 'sum': 321.0}},
'City TOT': {'revenue': {'max': 0.0, 'min': 0.0, 'sum': 0.0},
'taxes': {'mean': 4.0, 'sum': 4.0}},
'Country TOT': {'revenue': {'max': 0.0, 'min': 0.0, 'sum': 0.0},
'taxes': {'mean': 4.0, 'sum': 4.0}},
'Sales': {'revenue': {'max': 0.0, 'min': 0.0, 'sum': 0.0},
'taxes': {'mean': 9.25, 'sum': 18.5}}},
'Google': {'%tax_1': {'revenue': {'max': 0.0, 'min': 0.0, 'sum': 0.0},
'taxes': {'mean': 0.89, 'sum': 3.58}},
'-': {'revenue': {'max': 2643.08, 'min': -100.0, 'sum': 3738.36},
'taxes': {'mean': 96.23, 'sum': 26369.57}},
'City TOT': {'revenue': {'max': 0.0, 'min': 0.0, 'sum': 0.0},
'taxes': {'mean': 3.55, 'sum': 95.99}},
'Country TOT': {'revenue': {'max': 0.0, 'min': 0.0, 'sum': 0.0},
'taxes': {'mean': 23.25, 'sum': 628.0}}}}

漂亮的打印:

import json
print(json.dumps(d, indent=4))
# Output
{
"-": {
"-": {
"revenue": {
"max": 15.0,
"min": 15.0,
"sum": 15.0
},
"taxes": {
"mean": 0.0,
"sum": 0.0
}
}
},
"Facebook": {
"-": {
"revenue": {
"max": 218.5,
"min": -120.0,
"sum": 1259.79
},
"taxes": {
"mean": 5.17,
"sum": 321.0
}
},
"City TOT": {
"revenue": {
"max": 0.0,
"min": 0.0,
"sum": 0.0
},
"taxes": {
"mean": 4.0,
"sum": 4.0
}
},
"Country TOT": {
"revenue": {
"max": 0.0,
"min": 0.0,
"sum": 0.0
},
"taxes": {
"mean": 4.0,
"sum": 4.0
}
},
"Sales": {
"revenue": {
"max": 0.0,
"min": 0.0,
"sum": 0.0
},
"taxes": {
"mean": 9.25,
"sum": 18.5
}
}
},
"Google": {
"%tax_1": {
"revenue": {
"max": 0.0,
"min": 0.0,
"sum": 0.0
},
"taxes": {
"mean": 0.89,
"sum": 3.58
}
},
"-": {
"revenue": {
"max": 2643.08,
"min": -100.0,
"sum": 3738.36
},
"taxes": {
"mean": 96.23,
"sum": 26369.57
}
},
"City TOT": {
"revenue": {
"max": 0.0,
"min": 0.0,
"sum": 0.0
},
"taxes": {
"mean": 3.55,
"sum": 95.99
}
},
"Country TOT": {
"revenue": {
"max": 0.0,
"min": 0.0,
"sum": 0.0
},
"taxes": {
"mean": 23.25,
"sum": 628.0
}
}
}
}

最新更新