按列/年移动平均数-蟒蛇、熊猫



我需要在"total_medals"列上按国家[noc]建立前几年的移动平均值-我的数据看起来像:

medal     Bronze  Gold  Medal  Silver  **total_medals**
noc year                                           
ALG 1984     2.0   NaN    NaN     NaN           2.0
1992     4.0   2.0    NaN     NaN           6.0
1996     2.0   1.0            4.0           7.0
ANZ 1984     2.0  15.0    NaN     2.0          19.0
1992     3.0   5.0    NaN     2.0          10.0
1996     1.0   2.0            2.0           5.0
ARG 1984     2.0   6.0    NaN     3.0          11.0
1992     5.0   3.0    NaN    24.0          32.0
1992     3.0   7.0    NaN     5.0          15.0

我想有一个每个国家和年份的移动平均数(即ALG:1984 Avg(total_medals(=2.0;1992年平均值(total_medals(=(2.0+6.0(/2=4.0;1996 Acg(total_medals(=(2.0+6.0+7.0(/3=5.0(-移动平均线应出现在新列中(total_medals旁边(。

此外,对于每个国家&年组合称为"性能"的新列应该是"total_medals"除以"移动平均值"的分数

示例数据帧

print(df)
medal  Bronze  Gold  Medal  Silver 
noc year                                     
ALG 1984    2.0     NaN   NaN    NaN     2.0 
1992    4.0     2.0   NaN    NaN     6.0 
1996    2.0     1.0   NaN    4.0     7.0 
ANZ 1984    2.0    15.0   NaN    2.0    19.0 
1992    3.0     5.0   NaN    2.0    10.0 
1996    1.0     2.0   NaN    2.0     5.0 
ARG 1984    2.0     6.0   NaN    3.0    11.0 
1992    5.0     3.0   NaN   24.0    32.0 
1992    3.0     7.0   NaN    5.0    15.0 

使用DataFrame.groupby+expanding:

df['total_mean']=df.groupby(level=0,sort=False).Silver.apply(lambda x: x.expanding(1).mean())
print(df)
medal  Bronze  Gold  Medal  Silver  total_medals 
noc year                                                 
ALG 1984    2.0     NaN   NaN    NaN     2.0    2.000000 
1992    4.0     2.0   NaN    NaN     6.0    4.000000 
1996    2.0     1.0   NaN    4.0     7.0    5.000000 
ANZ 1984    2.0    15.0   NaN    2.0    19.0   19.000000 
1992    3.0     5.0   NaN    2.0    10.0   14.500000 
1996    1.0     2.0   NaN    2.0     5.0   11.333333 
ARG 1984    2.0     6.0   NaN    3.0    11.0   11.000000 
1992    5.0     3.0   NaN   24.0    32.0   21.500000 
1992    3.0     7.0   NaN    5.0    15.0   19.333333 

bonze滞后

s=df.groupby('noc').apply(lambda x: x['Bronze']/x['total_medals'].shift())
s.index=s.index.droplevel()
df['bronze_lagged']=s

您可以为此创建一个函数

def lagged_medals(type_of_medal):
s=df.groupby('noc').apply(lambda x: x[type_of_medal]/x['total_medals'].shift())
s.index=s.index.droplevel()
df[f'{type_of_medal}_lagged']=s
lagged_medals('Silver')
#print(df)

最新更新