Pandas-查找过去3天的滚动平均值



我试图找到一组特定元素过去3天的平均数据,如下所示。下面是我的数据帧的样子:

day, category, sub-category, count
2021-01-01, electronic, phone, 10
2021-01-02, electronic, phone, 12
2021-01-03, electronic, phone, 3
2021-01-04, electronic, phone, 20
2021-01-01, electronic, computer, 15
2021-01-02, electronic, computer, 12
2021-01-03, electronic, computer, 20

预期输出

day, category, sub-category, count, rolling_average_past_3_days
2021-01-01, electronic, phone, 10,
2021-01-02, electronic, phone, 12,
2021-01-03, electronic, phone, 3, 8.3
2021-01-04, electronic, phone, 20, 11.6
2021-01-01, electronic, computer, 15,
2021-01-02, electronic, computer, 12,
2021-01-03, electronic, computer, 20, 15.7

为了通用,您可以在"电子"旁边有多个主类别,您还需要按类别分组。您还需要使用.transform(),以便将结果广播回原始结构,以便可以获取所有条目。否则,您可能会出现以下错误:TypeError: incompatible index of inserted column with frame index

你可以试试这个:

df['rolling_average_past_3_days'] = df.groupby(['category', 'sub-category'])['count'].transform(lambda x: x.rolling(3).mean())
print(df)
day    category sub-category  count  rolling_average_past_3_days
0  2021-01-01  electronic        phone     10                          NaN
1  2021-01-02  electronic        phone     12                          NaN
2  2021-01-03  electronic        phone      3                     8.333333
3  2021-01-04  electronic        phone     20                    11.666667
4  2021-01-01  electronic     computer     15                          NaN
5  2021-01-02  electronic     computer     12                          NaN
6  2021-01-03  electronic     computer     20                    15.666667

每个子类别的前2个条目上显示的NaN值是正常的,因为您在3天内滚动平均,而前2天没有值。如果你想让它们空白而不是显示NaN,你可以使用这个:

df['rolling_average_past_3_days'] = df.groupby(['category', 'sub-category'])['count'].transform(lambda x: x.rolling(3).mean()).replace(np.nan, '')

print(df)
day    category sub-category  count rolling_average_past_3_days
0  2021-01-01  electronic        phone     10                            
1  2021-01-02  electronic        phone     12                            
2  2021-01-03  electronic        phone      3                    8.333333
3  2021-01-04  electronic        phone     20                   11.666667
4  2021-01-01  electronic     computer     15                            
5  2021-01-02  electronic     computer     12                            
6  2021-01-03  electronic     computer     20                   15.666667

试试这个:

df['rolling_average_past_3_days'] = df.groupby('sub-category')['count'].rolling(3).mean().droplevel(0)

df.groupby('sub-category',sort=False).rolling('3D',on='day')['count'].mean().to_numpy()

最新更新