I have a df =
statistics s_values
year
1999 cigarette use 100
1999 cellphone use 310
1999 internet use 101
1999 alcohol use 100
1999 soda use 215
2000 cigarette use 315
2000 cellphone use 317
2000 internet use 325
2000 alcohol use 108
2000 soda use 200
2001 cigarette use 122
2001 cellphone use 311
2001 internet use 112
2001 alcohol use 144
2001 soda use 689
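I calculated the max, min, and mean based on the year index and the statistics column.
I want to insert the mean, max, and min into the dataframe as columns, so that the output looks like this.
Desired output: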
statistics s_values mean min max
year
1999 alcohol use 100.0 104.0 100.0 108.0
1999 cellphone use 310.0 313.5 310.0 317.0
1999 cigarette use 100.0 207.5 100.0 315.0
1999 internet use 101.0 213.0 101.0 325.0
1999 soda use 215.0 207.5 200.0 215.0
2000 alcohol use 108.0 104.0 100.0 108.0
2000 cellphone use 317.0 313.5 310.0 317.0
2000 cigarette use 315.0 207.5 100.0 315.0
2000 internet use 325.0 213.0 101.0 325.0
2000 soda use 200.0 207.5 200.0 215.0
2001 alcohol use 144.0 104.0 100.0 108.0
2001 cellphone use 311.0 313.5 310.0 317.0
2001 cigarette use 122.0 207.5 100.0 315.0
2001 internet use 112.0 213.0 101.0 325.0
2001 soda use 689.0 207.5 200.0 215.0
I tried the following, but the values in the new columns are all NaN:
gen_mean = df.groupby('statistics').mean()
gen_min = df.groupby('statistics').min()
gen_max = df.groupby('statistics').max()
df.insert(2, 'Gen Avg', gen_mean)
df.insert(3, 'Gen Max', gen_max)
df.insert(4, 'Gen Min', gen_min)
Thanks

Try groupby aggregate + join:
df = df.join(
    df.groupby('statistics')['s_values'].aggregate(['mean', 'min', 'max']),
    on='statistics'
)
df:
statistics s_values mean min max
year
1999 cigarette use 100 179.000000 100 315
1999 cellphone use 310 312.666667 310 317
1999 internet use 101 179.333333 101 325
1999 alcohol use 100 117.333333 100 144
1999 soda use 215 368.000000 200 689
2000 cigarette use 315 179.000000 100 315
2000 cellphone use 317 312.666667 310 317
2000 internet use 325 179.333333 101 325
2000 alcohol use 108 117.333333 100 144
2000 soda use 200 368.000000 200 689
2001 cigarette use 122 179.000000 100 315
2001 cellphone use 311 312.666667 310 317
2001 internet use 112 179.333333 101 325
2001 alcohol use 144 117.333333 100 144
2001 soda use 689 368.000000 200 689
Frame used:
import pandas as pd

df = pd.DataFrame({
    'year': [1999, 1999, 1999, 1999, 1999, 2000, 2000, 2000, 2000, 2000, 2001,
             2001, 2001, 2001, 2001],
    'statistics': ['cigarette use', 'cellphone use', 'internet use',
                   'alcohol use', 'soda use',
                   'cigarette use', 'cellphone use', 'internet use',
                   'alcohol use', 'soda use',
                   'cigarette use', 'cellphone use', 'internet use',
                   'alcohol use', 'soda use'],
    's_values': [100, 310, 101, 100, 215, 315, 317, 325, 108, 200, 122, 311,
                 112, 144, 689]
}).set_index('year')
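For anyone wondering why the original attempt produced NaN while the join works, here is a small sketch. It uses only a four-row subset of the question's data (an assumption made to keep it short), and the names stats, misaligned, and joined are illustrative. The point is that the aggregated result is indexed by the statistics labels, so anything that aligns on df's year index finds no matches, whereas join(on=...) looks the labels up explicitly.

```python
import pandas as pd

# Four-row subset of the question's frame, for illustration only.
df = pd.DataFrame({
    "year": [1999, 1999, 2000, 2000],
    "statistics": ["alcohol use", "soda use", "alcohol use", "soda use"],
    "s_values": [100, 215, 108, 200],
}).set_index("year")

# One row per group, indexed by the group label (not by year).
stats = df.groupby("statistics")["s_values"].agg(["mean", "min", "max"])

# Direct assignment aligns on the index: the year labels in df never match
# the statistics labels in stats, so every value comes out as NaN.
misaligned = df.assign(mean=stats["mean"])

# join(on=...) looks up each row's 'statistics' value in the index of stats.
joined = df.join(stats, on="statistics")

print(stats)
print(misaligned)
print(joined)
```

The same alignment rule is presumably what turned the df.insert(...) calls in the question into all-NaN columns.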
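groupby(...).mean() returns a dataframe whose rows correspond to the groups, not to the rows of the original frame. You need transform, which returns a result with the same index as the input: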
df['mean'] = df.groupby('statistics')['s_values'].transform('mean')
# For min and max, pass 'min' or 'max' to transform in the same way
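As a rough sketch of the transform approach applied to all three statistics at once, the loop below adds one column per aggregation. It runs on a four-row subset of the question's data (an assumption for brevity), and the variable name grouped is illustrative.

```python
import pandas as pd

# Four-row subset of the question's frame, for illustration only.
df = pd.DataFrame({
    "year": [1999, 1999, 2000, 2000],
    "statistics": ["alcohol use", "soda use", "alcohol use", "soda use"],
    "s_values": [100, 215, 108, 200],
}).set_index("year")

# Group once, then broadcast each per-group statistic back onto every row.
grouped = df.groupby("statistics")["s_values"]
for name in ["mean", "min", "max"]:
    # transform returns a Series with the same index as df, so plain
    # column assignment lines up row by row.
    df[name] = grouped.transform(name)

print(df)
```

Compared with the join approach, this avoids building an intermediate per-group frame, at the cost of one pass per statistic.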