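I have a DataFrame, df, that I want to group by id, showing the mean of diff over the given time period along with the standard deviation and a list of the 1sd, 2sd and 3sd values.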
id  start     end       diff  percent  date
a   4/1/2019  5/1/2019  160   11       04-01-2019 to 05-01-2019
a   5/1/2019  6/1/2019  136   8        05-01-2019 to 06-01-2019
a   6/1/2019  7/1/2019  174   9        06-01-2019 to 07-01-2019
b   4/1/2019  5/1/2019  112   5        04-01-2019 to 05-01-2019
b   5/1/2019  6/1/2019  141   6        05-01-2019 to 06-01-2019
Desired output:
id  mean   st.deviation  sd1    sd2    sd3
a   156.6  19.2          137.4  118.2  99
b   126.5  20.5          106    85.5   65
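The sd columns are the mean minus a multiple of the standard deviation. For id a, where the standard deviation is 19.2:
sd1 = mean - 1(19.2)
sd2 = mean - 2(19.2)
sd3 = mean - 3(19.2)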
This is what I am doing:
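import math

def sd_calc(values):
    # Pass a single column (e.g. df['diff']), not the whole DataFrame:
    # iterating over a DataFrame yields its column names, not its values.
    values = [float(el) for el in values]
    n = len(values)
    if n <= 1:
        return 0.0
    # avg_calc is not shown in the question, so the mean is computed inline
    mean = sum(values) / n
    # sample standard deviation: sum of squared deviations over n - 1
    sd = 0.0
    for el in values:
        sd += (el - mean) ** 2
    return math.sqrt(sd / (n - 1))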
I am able to calculate the standard deviation, but I am not sure how to incorporate it into a pandas DataFrame.
Let's try std together with groupby:
out = df.groupby('id')['diff'].agg(['mean','std'])
Out[44]:
          mean        std
id
a   156.666667  19.218047
b   126.500000  20.506097
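The grouped std uses the sample standard deviation (ddof=1, i.e. an n - 1 denominator) by default, which is the same formula as the hand-written function in the question. A minimal sketch to check that, assuming the sample data from the question and a hypothetical helper named sample_sd that mirrors sd_calc:

```python
import math

import pandas as pd

# Sample data from the question (only the columns needed here)
df = pd.DataFrame({
    "id": ["a", "a", "a", "b", "b"],
    "diff": [160, 136, 174, 112, 141],
})


def sample_sd(values):
    """Hypothetical helper: sample standard deviation with an n - 1 denominator."""
    vals = [float(v) for v in values]
    n = len(vals)
    if n <= 1:
        return 0.0
    mean = sum(vals) / n
    return math.sqrt(sum((v - mean) ** 2 for v in vals) / (n - 1))


# Built-in grouped standard deviation (ddof=1 by default)
pandas_std = df.groupby("id")["diff"].std()

# The same statistic computed per group with the hand-written helper
manual_std = df.groupby("id")["diff"].apply(sample_sd)

print(pandas_std)
print(manual_std)
```

If the two agree, the custom function is not needed and the built-in std can be used directly inside agg.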
For std1 through std3, you can create each column from the mean and std columns, for example:
out['std1'] = out['mean'] - out['std']
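The same pattern extends to all three columns. A minimal sketch, assuming the sample data from the question; the loop, the rounding to one decimal and the reset_index call are illustrative choices, not part of the original answer:

```python
import pandas as pd

# Sample data from the question (only the columns needed here)
df = pd.DataFrame({
    "id": ["a", "a", "a", "b", "b"],
    "diff": [160, 136, 174, 112, 141],
})

# Mean and sample standard deviation of diff per id
out = df.groupby("id")["diff"].agg(["mean", "std"])

# std1, std2, std3 = mean minus 1, 2 and 3 standard deviations
for k in (1, 2, 3):
    out[f"std{k}"] = out["mean"] - k * out["std"]

# Round for display and turn id back into an ordinary column
out = out.round(1).reset_index()
print(out)
```

To get the band above the mean instead, replace the minus sign with a plus.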