Weighted average aggregation on multiple columns of a df



I am trying to calculate the weighted average of multiple columns in a DataFrame. Here is a sample of my data:

Group  Year  Month  Weight(kg)  Nitrogen  Calcium
A      2020  01     10000       10        70
A      2020  01     15000       4         78
A      2021  05     12000       5         66
A      2021  05     10000       8         54
B      2021  08     14000       10        90
C      2021  08     50000       20        92
C      2021  08     40000       10        95

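Change the function so that it works on multiple columns, and convert the grouping columns to a MultiIndex so that they are not dropped: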

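def wavg(x, value, weight):
    # x is one group; value can be one column name or several
    d = x[value]
    w = x[weight]
    try:
        # scale each row by its weight, then divide by the group's total weight
        return (d.mul(w, axis=0)).div(w.sum())
    except ZeroDivisionError:
        return d.mean()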
# columns used for the groupby
groups = ["Group", "Year", "Month"]
# process all the other columns
cols = df.columns.difference(groups + ["Weight(kg)"], sort=False)
# move the group columns into the index, then process every column in cols
# (group_keys=False keeps the group labels from being added to the index a second time)
df1 = (df.set_index(groups)
         .groupby(level=groups, group_keys=False)
         .apply(wavg, cols, "Weight(kg)")
         .reset_index())
print(df1)
  Group  Year  Month    Calcium   Nitrogen
0     A  2020      1  28.000000   4.000000
1     A  2020      1  46.800000   2.400000
2     A  2021      5  36.000000   2.727273
3     A  2021      5  24.545455   3.636364
4     B  2021      8  90.000000  10.000000
5     C  2021      8  51.111111  11.111111
6     C  2021      8  42.222222   4.444444
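
Each row above holds one input row's share of its group's weighted mean, so adding the rows up within a group gives the weighted average itself. A minimal sketch of that step, assuming the sample data from the question (the names `weight`, `values`, `grouped`, and `result` are illustrative and not part of the original answer):

```python
import pandas as pd

# Sample data as given in the question.
df = pd.DataFrame({
    "Group": ["A", "A", "A", "A", "B", "C", "C"],
    "Year": [2020, 2020, 2021, 2021, 2021, 2021, 2021],
    "Month": [1, 1, 5, 5, 8, 8, 8],
    "Weight(kg)": [10000, 15000, 12000, 10000, 14000, 50000, 40000],
    "Nitrogen": [10, 4, 5, 8, 10, 20, 10],
    "Calcium": [70, 78, 66, 54, 90, 92, 95],
})

groups = ["Group", "Year", "Month"]
weight = "Weight(kg)"
values = ["Nitrogen", "Calcium"]

# Multiply every value column by the row weight.
weighted = df[values].mul(df[weight], axis=0)

# Sum the weights and the weighted values within each group.
grouped = pd.concat([df[groups + [weight]], weighted], axis=1).groupby(groups).sum()

# Weighted mean = sum(value * weight) / sum(weight), one row per group.
result = grouped[values].div(grouped[weight], axis=0).reset_index()
print(result)
```

This avoids `apply` entirely, which tends to matter on larger frames; whether one row per group or one row per input row is wanted depends on the use case.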

Try it with concat() and reset_index():

df=pd.concat(column_list,axis=1).reset_index()

You can make the change here:

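column_list = []
# df.columns also contains the grouping and weight columns;
# limit the loop to the value columns if those should be skipped
for column in df.columns:
    column_list.append(df.groupby(["Group", "Year", "Month"]).apply(wavg, column, "Weight(kg)").reset_index())
# Finally:
df = pd.concat(column_list, axis=1)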
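
One possible way to arrange the concat() route is to build one Series per value column, each indexed by the group keys, and call reset_index() only once after concatenating, so the key columns are not repeated. This is a sketch under assumptions: `wavg_scalar` is a hypothetical helper that returns a single number per group via `numpy.average`, the loop covers only the two value columns, and the data is the sample from the question.

```python
import numpy as np
import pandas as pd

# Sample data as given in the question.
df = pd.DataFrame({
    "Group": ["A", "A", "A", "A", "B", "C", "C"],
    "Year": [2020, 2020, 2021, 2021, 2021, 2021, 2021],
    "Month": [1, 1, 5, 5, 8, 8, 8],
    "Weight(kg)": [10000, 15000, 12000, 10000, 14000, 50000, 40000],
    "Nitrogen": [10, 4, 5, 8, 10, 20, 10],
    "Calcium": [70, 78, 66, 54, 90, 92, 95],
})

groups = ["Group", "Year", "Month"]
weight = "Weight(kg)"


def wavg_scalar(g, value, weight):
    """Hypothetical helper: weighted mean of one column within one group."""
    return np.average(g[value], weights=g[weight])


column_list = []
for column in ["Nitrogen", "Calcium"]:
    # One Series per value column, indexed by the group keys.
    s = (df.groupby(groups)[[column, weight]]
           .apply(wavg_scalar, column, weight)
           .rename(column))
    column_list.append(s)

# The Series share the same index, so they align side by side;
# a single reset_index() then turns the keys back into columns.
out = pd.concat(column_list, axis=1).reset_index()
print(out)
```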
