Convert a MultiIndex groupby table into a formatted descriptive table (Python/pandas)



I have a df that looks something like this:

import pandas as pd
df = pd.DataFrame({'id': [1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3],
'eth': ['H','H','H','H','H','H','H','H', 'W', 'W', 'W', 'W', 'W', 'W', 'W', 'W', 'B', 'B', 'B', 'B'],
'sex': ['F','F','F','F','F','F','F','F','M','M','M','M','M','M','M','M','F','F','F','F'],
'outcome': ["eating", "drinking", "eating", "drinking","eating", "drinking","eating", "drinking","eating", "drinking","eating", "drinking","eating", "drinking","eating", "drinking","eating", "drinking","eating", "drinking"],
'mean': [3.4,4.5,5.6,7.8,9.8,3.4,6.5,7.8,9.5,1.2,3.4,5.6,7.6,6.5,4.3,12.3,3.2,4.3,5.6,7.8]})
df.head()
   id eth sex   outcome  mean
0   1   H   F    eating   3.4
1   1   H   F  drinking   4.5
2   1   H   F    eating   5.6
3   1   H   F  drinking   7.8
4   1   H   F    eating   9.8

I then compute the mean and standard deviation of the mean column, grouped by outcome and ethnicity:

ethnicity = df.groupby(['outcome','eth']).agg({'mean':['mean','std']})
ethnicity
                mean          
                mean       std
outcome  eth                  
drinking B     6.050  2.474874
         H     5.875  2.267708
         W     6.400  4.564355
eating   B     4.400  1.697056
         H     6.325  2.657536
         W     6.200  2.846050

I would like to turn this MultiIndex groupby table into a table like this:

desired = pd.DataFrame({'eth': ['B', 'H', 'W'],
'drinking': ["6.05 (2.47)", "5.88 (2.27)", "6.40 (4.56)"],
'eating': ["4.40 (1.70)", "6.33 (2.66)", "6.20 (2.85)"]})
desired
  eth     drinking       eating
0   B  6.05 (2.47)  4.40 (1.70)
1   H  5.88 (2.27)  6.33 (2.66)
2   W  6.40 (4.56)  6.20 (2.85)

My question is: how do I get from the groupby table with a MultiIndex to the desired table?

Thanks in advance.

You can do the string formatting after the groupby and then unstack:

out = df.groupby(['outcome','eth'])['mean'].agg(['mean','std'])
out = out.astype(str).agg(lambda x : x['mean'] + '(' + x['std']+')',axis=1).unstack(level=0).reset_index()
out
outcome eth                  drinking                     eating
0         B  6.05(2.4748737341529163)    4.4(1.6970562748477134)
1         H  5.875(2.267708094089713)  6.325(2.6575364531836625)
2         W    6.4(4.564354645876385)    6.2(2.8460498941515415)
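
The output above keeps the full float precision and has no space before the parenthesis. If the goal is the exact "6.05 (2.47)" style from the question, one option is to format each number with a fixed number of decimals instead of casting to str. A minimal sketch, assuming the same df as in the question (the names stats, formatted, and table are only illustrative):

```python
import pandas as pd

# Same data as in the question, written compactly
df = pd.DataFrame({
    'id': [1] * 8 + [2] * 8 + [3] * 4,
    'eth': ['H'] * 8 + ['W'] * 8 + ['B'] * 4,
    'sex': ['F'] * 8 + ['M'] * 8 + ['F'] * 4,
    'outcome': ['eating', 'drinking'] * 10,
    'mean': [3.4, 4.5, 5.6, 7.8, 9.8, 3.4, 6.5, 7.8, 9.5, 1.2,
             3.4, 5.6, 7.6, 6.5, 4.3, 12.3, 3.2, 4.3, 5.6, 7.8],
})

# Selecting the column first gives flat 'mean' / 'std' columns
stats = df.groupby(['outcome', 'eth'])['mean'].agg(['mean', 'std'])

# Build "mean (std)" strings with exactly two decimals per number
formatted = stats.apply(
    lambda row: f"{row['mean']:.2f} ({row['std']:.2f})", axis=1
)

# Move 'outcome' into the columns, drop the leftover axis name,
# and turn 'eth' back into an ordinary column
table = formatted.unstack('outcome').rename_axis(columns=None).reset_index()
print(table)
```

Fixed-width formatting keeps trailing zeros, so the B row reads "6.05 (2.47)" and "4.40 (1.70)". Values that sit right on a rounding boundary, such as 5.875 or 6.325, may round either way depending on their floating-point representation.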

One possible solution:

(ethnicity['mean']
.round(2).astype(str)
.apply(' ('.join, axis=1).add(')')
.unstack('outcome')
)

Output:

outcome     drinking       eating
eth                              
B        6.05 (2.47)    4.4 (1.7)
H        5.88 (2.27)  6.32 (2.66)
W         6.4 (4.56)   6.2 (2.85)
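
The result above still has eth as the index and carries outcome as the name of the column axis, whereas the desired table has eth as an ordinary column. A possible way to flatten it, sketched here under the assumption that df and ethnicity are built as in the question (wide and flat are illustrative names):

```python
import pandas as pd

# Same data as in the question, written compactly
df = pd.DataFrame({
    'id': [1] * 8 + [2] * 8 + [3] * 4,
    'eth': ['H'] * 8 + ['W'] * 8 + ['B'] * 4,
    'sex': ['F'] * 8 + ['M'] * 8 + ['F'] * 4,
    'outcome': ['eating', 'drinking'] * 10,
    'mean': [3.4, 4.5, 5.6, 7.8, 9.8, 3.4, 6.5, 7.8, 9.5, 1.2,
             3.4, 5.6, 7.6, 6.5, 4.3, 12.3, 3.2, 4.3, 5.6, 7.8],
})
ethnicity = df.groupby(['outcome', 'eth']).agg({'mean': ['mean', 'std']})

# The chain from the answer: round, stringify, join as "mean (std)", unstack
wide = (ethnicity['mean']
        .round(2).astype(str)
        .apply(' ('.join, axis=1).add(')')
        .unstack('outcome'))

# Drop the 'outcome' label on the column axis and make 'eth' a regular column
flat = wide.rename_axis(columns=None).reset_index()
print(flat)
```

After this step the frame has a default integer index and the three columns eth, drinking, and eating, matching the layout of desired.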
