I am trying to compute the weighted average of multiple columns in a DataFrame. Here is a sample of my data:
Group | Year | Month | Weight(kg) | Nitrogen | Calcium
---|---|---|---|---|---
A | 2020 | 01 | 10000 | 10 | 70
A | 2020 | 01 | 15000 | 4 | 78
A | 2021 | 05 | 12000 | 5 | 66
A | 2021 | 05 | 10000 | 8 | 54
B | 2021 | 08 | 14000 | 10 | 90
C | 2021 | 08 | 50000 | 20 | 92
C | 2021 | 08 | 40000 | 10 | 95
Change the function so that it works with multiple columns, and convert the grouping columns to a MultiIndex so they are not dropped:
def wavg(x, value, weight):
    d = x[value]
    w = x[weight]
    try:
        return (d.mul(w, axis=0)).div(w.sum())
    except ZeroDivisionError:
        return d.mean()
#columns used for groupby
groups = ["Group", "Year", "Month"]
#process all other columns
cols = df.columns.difference(groups + ["Weight(kg)"], sort=False)
#set the group columns as the index and process every column listed in cols
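#group_keys=False keeps the original index; newer pandas versions
#would otherwise prepend the group keys a second time
df1 = (df.set_index(groups)
         .groupby(level=groups, group_keys=False)
         .apply(wavg, cols, "Weight(kg)")
         .reset_index())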
print (df1)
  Group  Year  Month    Calcium   Nitrogen
0     A  2020      1  28.000000   4.000000
1     A  2020      1  46.800000   2.400000
2     A  2021      5  36.000000   2.727273
3     A  2021      5  24.545455   3.636364
4     B  2021      8  90.000000  10.000000
5     C  2021      8  51.111111  11.111111
6     C  2021      8  42.222222   4.444444
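Note that `wavg` as written returns each row's weighted contribution, so the result keeps one row per input row. Summing those contributions within each group gives a single weighted average per group. Below is a minimal sketch of that, assuming the sample data and column names from the question; the helper name `group_wavg` is hypothetical. It also uses an explicit zero check instead of `try/except`, because pandas division by zero produces `inf`/`NaN` rather than raising `ZeroDivisionError`.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "Group": ["A", "A", "A", "A", "B", "C", "C"],
    "Year": [2020, 2020, 2021, 2021, 2021, 2021, 2021],
    "Month": [1, 1, 5, 5, 8, 8, 8],
    "Weight(kg)": [10000, 15000, 12000, 10000, 14000, 50000, 40000],
    "Nitrogen": [10, 4, 5, 8, 10, 20, 10],
    "Calcium": [70, 78, 66, 54, 90, 92, 95],
})

groups = ["Group", "Year", "Month"]
weight = "Weight(kg)"
cols = [c for c in df.columns if c not in groups + [weight]]


def group_wavg(df, groups, cols, weight):
    """Hypothetical helper: one weighted average per group and value column."""
    # Multiply every value column by the weight column, row by row.
    weighted = df[cols].mul(df[weight], axis=0)
    # Sum the weights and the weighted values within each group.
    sums = (pd.concat([df[groups], df[[weight]], weighted], axis=1)
              .groupby(groups)
              .sum())
    # Mask zero totals so those groups become NaN instead of inf.
    total = sums[weight].where(sums[weight] != 0)
    return sums[cols].div(total, axis=0).reset_index()


out = group_wavg(df, groups, cols, weight)
print(out)
```

With the sample data this yields four rows, one per (Group, Year, Month); for group A in 2020-01 the values work out to 6.4 for Nitrogen and 74.8 for Calcium.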
Try this, using concat() and reset_index():
df=pd.concat(column_list,axis=1).reset_index()
Or you can make the change here:
column_list=[]
for column in df.columns:
    column_list.append(df.groupby(["Group", "Year", "Month"]).apply(wavg, column, "Weight(kg)").reset_index())
#Finally:
df=pd.concat(column_list,axis=1)
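One thing to watch with the loop approach: if every per-column result is reset before concatenating, the key columns are repeated once per value column. A possible way around that, sketched here under the assumption that the column names match the sample data, is to keep the group keys in the index, concatenate, and reset the index once at the end. The helper `wavg_scalar` is hypothetical and returns a single number per group.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "Group": ["A", "A", "A", "A", "B", "C", "C"],
    "Year": [2020, 2020, 2021, 2021, 2021, 2021, 2021],
    "Month": [1, 1, 5, 5, 8, 8, 8],
    "Weight(kg)": [10000, 15000, 12000, 10000, 14000, 50000, 40000],
    "Nitrogen": [10, 4, 5, 8, 10, 20, 10],
    "Calcium": [70, 78, 66, 54, 90, 92, 95],
})


def wavg_scalar(g, value, weight):
    """Hypothetical helper: weighted average of one column within one group."""
    w = g[weight]
    total = w.sum()
    if total == 0:
        # Fall back to the plain mean when all weights are zero.
        return g[value].mean()
    return (g[value] * w).sum() / total


groups = ["Group", "Year", "Month"]
weight = "Weight(kg)"

column_list = []
for column in ["Nitrogen", "Calcium"]:  # value columns only, not the keys
    # Select just the columns the helper needs, then apply it per group.
    s = df.groupby(groups)[[column, weight]].apply(wavg_scalar, column, weight)
    column_list.append(s.rename(column))

# The Series share the same MultiIndex, so a single reset_index is enough.
result = pd.concat(column_list, axis=1).reset_index()
print(result)
```

Looping only over the value columns also avoids passing the key columns or the weight column itself through the averaging function.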