Transform a table to show the mean together with standard deviations in Python



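I have a DataFrame df that I want to group by id, showing the mean for the given periods along with the standard deviation and the 1 sd, 2 sd and 3 sd columns: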

id  start       end         diff  percent   date
a   4/1/2019    5/1/2019    160   11        04-01-2019 to 05-01-2019
a   5/1/2019    6/1/2019    136   8         05-01-2019 to 06-01-2019
a   6/1/2019    7/1/2019    174   9         06-01-2019 to 07-01-2019
b   4/1/2019    5/1/2019    112   5         04-01-2019 to 05-01-2019
b   5/1/2019    6/1/2019    141   6         05-01-2019 to 06-01-2019

Desired output

id  mean       st.deviation   sd1          sd2      sd3
a   156.7      19.2           137.4        118.2    99
b   126.5      20.5           106          85.5     65


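where each sdN column is the mean minus N standard deviations (shown here with group a's standard deviation of 19.2):
sd1 = mean - 1 × 19.2
sd2 = mean - 2 × 19.2
sd3 = mean - 3 × 19.2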
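The formulas above can be checked with the standard library alone. This is a minimal sketch, assuming the diff values from the sample table and the sample standard deviation (dividing by n - 1), which is the variant that gives 19.2 for group a:

```python
from statistics import mean, stdev

# "diff" values per id, copied from the sample table
diffs = {"a": [160, 136, 174], "b": [112, 141]}

rows = {}
for key, values in diffs.items():
    m = mean(values)
    s = stdev(values)  # sample standard deviation (divides by n - 1)
    # mean, st.deviation, then mean minus 1, 2 and 3 standard deviations
    rows[key] = [round(x, 1) for x in (m, s, m - s, m - 2 * s, m - 3 * s)]

for key, row in rows.items():
    print(key, row)
```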

This is what I am doing:

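import math

def avg_calc(values):
    # avg_calc was not shown in the question; assumed here to be the arithmetic mean
    return sum(float(v) for v in values) / len(values)

def sd_calc(values):
    # values: a sequence of numbers, e.g. df['diff'] for one id
    # (iterating over a whole DataFrame yields column labels, not numbers)
    n = len(values)
    if n <= 1:
        return 0.0
    mean, sd = avg_calc(values), 0.0
    # calculate the sample standard deviation (divide by n - 1)
    for el in values:
        sd += (float(el) - mean) ** 2
    sd = math.sqrt(sd / float(n - 1))
    return sd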

I am able to calculate the standard deviation, but I'm not sure how to incorporate it into a pandas DataFrame.
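One way to use a hand-written function such as sd_calc is to pass it to groupby(...).agg next to a built-in aggregation name; pandas calls it once per group with that group's diff values. A sketch, assuming the sample data rebuilt by hand with only the id and diff columns, and a condensed version of sd_calc:

```python
import math

import pandas as pd

def sd_calc(values):
    # sample standard deviation (n - 1 denominator), condensed from the question
    n = len(values)
    if n <= 1:
        return 0.0
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))

df = pd.DataFrame({"id": ["a", "a", "a", "b", "b"],
                   "diff": [160, 136, 174, 112, 141]})

# agg accepts built-in names ("mean") and plain callables side by side;
# the callable's result column is named after the function ("sd_calc")
out = df.groupby("id")["diff"].agg(["mean", sd_calc])
print(out)
```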

Let's try groupby with std:

out = df.groupby('id')['diff'].agg(['mean','std'])
Out[44]: 
          mean        std
id                       
a   156.666667  19.218047
b   126.500000  20.506097
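The std column matches sd_calc because pandas' std defaults to ddof=1, i.e. it divides by n - 1. Passing ddof=0 gives the population standard deviation instead, which would change sd1 to sd3. A small sketch on the same hand-built sample data:

```python
import pandas as pd

df = pd.DataFrame({"id": ["a", "a", "a", "b", "b"],
                   "diff": [160, 136, 174, 112, 141]})

g = df.groupby("id")["diff"]
sample = g.std()            # default ddof=1: divides by n - 1, like sd_calc
population = g.std(ddof=0)  # divides by n instead

print(sample.round(2).to_dict())      # {'a': 19.22, 'b': 20.51}
print(population.round(2).to_dict())  # {'a': 15.69, 'b': 14.5}
```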

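For std1 through std3, you can create each column by subtracting the corresponding multiple of std from mean, for example: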

out['std1'] = out['mean'] - out['std']
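Extending that line with a loop builds all three columns at once. This sketch assumes the hand-built sample data and uses the question's sd1 to sd3 names; rounding to one decimal is only for display:

```python
import pandas as pd

df = pd.DataFrame({"id": ["a", "a", "a", "b", "b"],
                   "diff": [160, 136, 174, 112, 141]})

out = df.groupby("id")["diff"].agg(["mean", "std"])

# sd1, sd2, sd3 = mean minus 1, 2 and 3 (sample) standard deviations
for k in (1, 2, 3):
    out[f"sd{k}"] = out["mean"] - k * out["std"]

print(out.round(1))
```

To match the desired header exactly, out.rename(columns={"std": "st.deviation"}) can be applied afterwards.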
