我试图找到一组特定元素过去3天的平均数据,如下所示。下面是我的数据帧的样子:
day, category, sub-category, count
2021-01-01, electronic, phone, 10
2021-01-02, electronic, phone, 12
2021-01-03, electronic, phone, 3
2021-01-04, electronic, phone, 20
2021-01-01, electronic, computer, 15
2021-01-02, electronic, computer, 12
2021-01-03, electronic, computer, 20
预期输出
day, category, sub-category, count, rolling_average_past_3_days
2021-01-01, electronic, phone, 10,
2021-01-02, electronic, phone, 12,
2021-01-03, electronic, phone, 3, 8.3
2021-01-04, electronic, phone, 20, 11.6
2021-01-01, electronic, computer, 15,
2021-01-02, electronic, computer, 12,
2021-01-03, electronic, computer, 20, 15.7
为了通用,您可以在"电子"旁边有多个主类别,您还需要按类别分组。您还需要使用.transform()
,以便将结果广播回原始结构,以便可以获取所有条目。否则,您可能会出现以下错误:TypeError: incompatible index of inserted column with frame index
你可以试试这个:
df['rolling_average_past_3_days'] = df.groupby(['category', 'sub-category'])['count'].transform(lambda x: x.rolling(3).mean())
print(df)
day category sub-category count rolling_average_past_3_days
0 2021-01-01 electronic phone 10 NaN
1 2021-01-02 electronic phone 12 NaN
2 2021-01-03 electronic phone 3 8.333333
3 2021-01-04 electronic phone 20 11.666667
4 2021-01-01 electronic computer 15 NaN
5 2021-01-02 electronic computer 12 NaN
6 2021-01-03 electronic computer 20 15.666667
每个子类别的前2个条目上显示的NaN
值是正常的,因为您在3天内滚动平均,而前2天没有值。如果你想让它们空白而不是显示NaN
,你可以使用这个:
df['rolling_average_past_3_days'] = df.groupby(['category', 'sub-category'])['count'].transform(lambda x: x.rolling(3).mean()).replace(np.nan, '')
print(df)
day category sub-category count rolling_average_past_3_days
0 2021-01-01 electronic phone 10
1 2021-01-02 electronic phone 12
2 2021-01-03 electronic phone 3 8.333333
3 2021-01-04 electronic phone 20 11.666667
4 2021-01-01 electronic computer 15
5 2021-01-02 electronic computer 12
6 2021-01-03 electronic computer 20 15.666667
试试这个:
df['rolling_average_past_3_days'] = df.groupby('sub-category')['count'].rolling(3).mean().droplevel(0)
或
df.groupby('sub-category',sort=False).rolling('3D',on='day')['count'].mean().to_numpy()