Group by columns under a condition and compute the mean



I have a raw dataset that looks like this:

ColA  ColB  duration  interval  Counter
A     SD    2         4         1
A     SD    3         3         2
A     UD    2         1         10
B     UD    1         2         2
B     UD    2         2         2
B     SD    3         3         13
B     SD    1         4         19
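To try the answers below, the dataset can be rebuilt as a DataFrame. This is only a sketch: the ColA value of the first three rows is not legible in the table and is assumed to be A, which is consistent with the outputs shown in the answers.

```python
import pandas as pd

# Rebuild the question's dataset.
# Assumption: the first three rows belong to ColA == "A".
df = pd.DataFrame({
    "ColA": ["A", "A", "A", "B", "B", "B", "B"],
    "ColB": ["SD", "SD", "UD", "UD", "UD", "SD", "SD"],
    "duration": [2, 3, 2, 1, 2, 3, 1],
    "interval": [4, 3, 1, 2, 2, 3, 4],
    "Counter": [1, 2, 10, 2, 2, 13, 19],
})
print(df)
```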

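Use DataFrame.pivot_table with a helper column new, created by copying ColB. Then flatten the MultiIndex columns and join the result to a new DataFrame built from the sum aggregation of Counter: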

# Mean of duration/interval, spread into one column per ColB value
df1 = (df.assign(new=df['ColB'])
         .pivot_table(index=['ColA', 'ColB'],
                      columns='new',
                      values=['interval', 'duration'],
                      fill_value=0,
                      aggfunc='mean'))
# Flatten ('duration', 'SD') -> 'durationSD'
df1.columns = df1.columns.map(lambda x: f'{x[0]}{x[1]}')
# Sum of Counter per (ColA, ColB), joined with the means
df = (df.groupby(['ColA', 'ColB'])['Counter']
        .sum()
        .to_frame(name='SumCounter')
        .join(df1).reset_index())
print(df)
  ColA ColB  SumCounter  durationSD  durationUD  intervalSD  intervalUD
0    A   SD           3         2.5         0.0         3.5           0
1    A   UD          10         0.0         2.0         0.0           1
2    B   SD          32         2.0         0.0         3.5           0
3    B   UD           4         0.0         1.5         0.0           2
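The flattening step can be looked at in isolation. The sketch below builds a small MultiIndex by hand (the names `cols` and `flat` are illustrative, not part of the answer) and joins each tuple into a single string, the same way the `map` call above does.

```python
import pandas as pd

# A two-level column index shaped like the one pivot_table produces here
cols = pd.MultiIndex.from_product([["duration", "interval"], ["SD", "UD"]])

# Join each (value, category) tuple into one flat label
flat = cols.map(lambda x: f"{x[0]}{x[1]}")
print(list(flat))
```

A list comprehension such as `["".join(t) for t in cols]` should give the same labels.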

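You can try grouping by ColA and then, within each group, by ColB with named aggregation. Note that the column suffix is taken from the first ColB value in each ColA group, so it does not follow each row's own ColB (compare the A/UD row below with the pivot_table result above):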

out = df.groupby('ColA').apply(
    lambda g: g.groupby('ColB').agg({
        'duration': [(f'{g["ColB"].iloc[0]}', 'mean')],
        'interval': [(f'{g["ColB"].iloc[0]}', 'mean')],
        'Counter': 'sum'})).fillna(0)
print(out)
          duration interval Counter duration interval
                SD       SD     sum       UD       UD
ColA ColB
A    SD        2.5      3.5       3      0.0      0.0
     UD        2.0      1.0      10      0.0      0.0
B    SD        0.0      0.0      32      2.0      3.5
     UD        0.0      0.0       4      1.5      2.0

Then rename the MultiIndex columns:

out.columns = ['SumCounter' if 'Counter' in col[0] else f'Avg{col[0]}{col[1]}' for col in out.columns.values]
print(out)
           AvgdurationSD  AvgintervalSD  SumCounter  AvgdurationUD  AvgintervalUD
ColA ColB
A    SD              2.5            3.5           3            0.0            0.0
     UD              2.0            1.0          10            0.0            0.0
B    SD              0.0            0.0          32            2.0            3.5
     UD              0.0            0.0           4            1.5            2.0

Another option is groupby with a helper column:


temp = (df
        .assign(dummy=df.ColB)
        .groupby(['ColA', 'ColB', 'dummy'])
        .agg({'duration': 'mean', 'interval': 'mean', 'Counter': 'sum'})
        .rename(columns={'Counter': 'SumCounter'})
        .set_index('SumCounter', append=True)
        .unstack('dummy', fill_value=0)
        )
temp.columns = temp.columns.map(lambda x: f"Avg{''.join(x)}")
temp.reset_index()
  ColA ColB  SumCounter  AvgdurationSD  AvgdurationUD  AvgintervalSD  AvgintervalUD
0    A   SD           3            2.5            0.0            3.5            0.0
1    A   UD          10            0.0            2.0            0.0            1.0
2    B   SD          32            2.0            0.0            3.5            0.0
3    B   UD           4            0.0            1.5            0.0            2.0
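The same table can probably also be built without a helper column, by aggregating once per (ColA, ColB) pair and then masking each mean with Series.where. This is a sketch rather than one of the posted answers: the names `agg` and `result` are illustrative, and the data assumes the first three rows have ColA equal to A.

```python
import pandas as pd

# Question's dataset (assumption: first three rows have ColA == "A")
df = pd.DataFrame({
    "ColA": ["A", "A", "A", "B", "B", "B", "B"],
    "ColB": ["SD", "SD", "UD", "UD", "UD", "SD", "SD"],
    "duration": [2, 3, 2, 1, 2, 3, 1],
    "interval": [4, 3, 1, 2, 2, 3, 4],
    "Counter": [1, 2, 10, 2, 2, 13, 19],
})

# Aggregate once per (ColA, ColB) pair using named aggregation
agg = (df.groupby(["ColA", "ColB"], as_index=False)
         .agg(SumCounter=("Counter", "sum"),
              duration=("duration", "mean"),
              interval=("interval", "mean")))

# One column per ColB value: keep the mean where the row's ColB matches, else 0
for col in ["duration", "interval"]:
    for value in sorted(df["ColB"].unique()):
        agg[f"Avg{col}{value}"] = agg[col].where(agg["ColB"] == value, 0.0)

result = agg.drop(columns=["duration", "interval"])
print(result)
```

The explicit loop trades brevity for readability: the condition on ColB is visible in one line, at the cost of one pass per output column.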
