I have a df that looks something like this:
import pandas as pd
df = pd.DataFrame({'id': [1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3],
'eth': ['H','H','H','H','H','H','H','H', 'W', 'W', 'W', 'W', 'W', 'W', 'W', 'W', 'B', 'B', 'B', 'B'],
'sex': ['F','F','F','F','F','F','F','F','M','M','M','M','M','M','M','M','F','F','F','F'],
'outcome': ["eating", "drinking", "eating", "drinking","eating", "drinking","eating", "drinking","eating", "drinking","eating", "drinking","eating", "drinking","eating", "drinking","eating", "drinking","eating", "drinking"],
'mean': [3.4,4.5,5.6,7.8,9.8,3.4,6.5,7.8,9.5,1.2,3.4,5.6,7.6,6.5,4.3,12.3,3.2,4.3,5.6,7.8]})
df.head()
   id eth sex   outcome  mean
0   1   H   F    eating   3.4
1   1   H   F  drinking   4.5
2   1   H   F    eating   5.6
3   1   H   F  drinking   7.8
4   1   H   F    eating   9.8
Then I compute the mean and standard deviation of the mean column, grouped by outcome and ethnicity:
ethnicity = df.groupby(['outcome','eth']).agg({'mean':['mean','std']})
ethnicity
               mean
               mean       std
outcome  eth
drinking B    6.050  2.474874
         H    5.875  2.267708
         W    6.400  4.564355
eating   B    4.400  1.697056
         H    6.325  2.657536
         W    6.200  2.846050
I would like to turn this multi-index grouped table into a table like this:
desired = pd.DataFrame({'eth': ['B', 'H', 'W'],
'drinking': ["6.05 (2.47)", "5.88 (2.27)", "6.40 (4.56)"],
'eating': ["4.40 (1.70)", "6.33 (2.66)", "6.20 (2.85)"]})
desired
  eth     drinking       eating
0   B  6.05 (2.47)  4.40 (1.70)
1   H  5.88 (2.27)  6.33 (2.66)
2   W  6.40 (4.56)  6.20 (2.85)
My question is: how do I get from the grouped table with the MultiIndex to the desired table?
Thanks in advance.
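You can build the formatted strings after the groupby and then unstack. Note that this keeps the full float precision rather than rounding to two decimals: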
out = df.groupby(['outcome','eth'])['mean'].agg(['mean','std'])
out = out.astype(str).agg(lambda x : x['mean'] + '(' + x['std']+')',axis=1).unstack(level=0).reset_index()
Out[597]:
outcome eth                  drinking                    eating
0         B  6.05(2.4748737341529163)   4.4(1.6970562748477134)
1         H  5.875(2.267708094089713)  6.325(2.6575364531836625)
2         W   6.4(4.564354645876385)   6.2(2.8460498941515415)
One possible solution:
(ethnicity['mean']
.round(2).astype(str)
.apply(' ('.join, axis=1).add(')')
.unstack('outcome')
)
Output:
outcome     drinking       eating
eth
B        6.05 (2.47)    4.4 (1.7)
H        5.88 (2.27)  6.32 (2.66)
W         6.4 (4.56)   6.2 (2.85)
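If the fixed two-decimal format from the question matters (4.40 rather than 4.4), one option is to format the numbers with an f-string instead of converting with astype(str). This is only a minimal sketch, assuming the same df as in the question; the names stats, formatted, and result are illustrative and not part of either answer above:

```python
import pandas as pd

# Same data as in the question, written compactly
df = pd.DataFrame({
    'id': [1] * 8 + [2] * 8 + [3] * 4,
    'eth': ['H'] * 8 + ['W'] * 8 + ['B'] * 4,
    'sex': ['F'] * 8 + ['M'] * 8 + ['F'] * 4,
    'outcome': ['eating', 'drinking'] * 10,
    'mean': [3.4, 4.5, 5.6, 7.8, 9.8, 3.4, 6.5, 7.8, 9.5, 1.2,
             3.4, 5.6, 7.6, 6.5, 4.3, 12.3, 3.2, 4.3, 5.6, 7.8],
})

# Selecting the column first gives flat 'mean' and 'std' columns
stats = df.groupby(['outcome', 'eth'])['mean'].agg(['mean', 'std'])

# Build "mean (std)" per row, always with two decimals
formatted = stats.apply(
    lambda row: f"{row['mean']:.2f} ({row['std']:.2f})", axis=1
)

# Move 'outcome' into the columns and turn 'eth' back into a regular column
result = formatted.unstack('outcome').reset_index()
result.columns.name = None  # drop the leftover 'outcome' label on the columns

print(result)
```

Values that sit on a rounding boundary, such as 6.325, may still print as 6.32 rather than 6.33 because of their floating-point representation.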