Suppose I have a dataset like this:
You can do this by building two groupby DataFrames, one for each count, and merging them together.
Department counts:
# Count rows per department. The count stays in the 'ID' column,
# because the output below and the later steps refer to it by that name.
dept = df.groupby('Department', as_index=False).count()[['Department', 'ID']]
  Department  ID
0     Design   2
1         HR   2
2         IT   4
Level counts:
level = df.groupby(['Department', 'Level'], as_index=False).count()
level = level.rename(columns = {'ID':'Level_Count'})
  Department   Level  Level_Count
0     Design  middle            2
1         HR  middle            1
2         HR  senior            1
3         IT  junior            1
4         IT  middle            2
5         IT  senior            1
Then merge the two on Department:
df_out = dept.merge(level, on='Department')
  Department  ID   Level  Level_Count
0     Design   2  middle            2
1         HR   2  middle            1
2         HR   2  senior            1
3         IT   4  junior            1
4         IT   4  middle            2
5         IT   4  senior            1
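If you want to try the steps up to this point end to end, here is a minimal sketch. The sample frame is an assumption: the original data is not shown here, so I rebuilt eight hypothetical rows that reproduce the printed counts. The department total is left in the ID column so the result matches the table above.

```python
import pandas as pd

# Hypothetical sample data, reconstructed from the counts printed above.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7, 8],
    "Department": ["Design", "Design", "HR", "HR", "IT", "IT", "IT", "IT"],
    "Level": ["middle", "middle", "middle", "senior",
              "junior", "middle", "middle", "senior"],
})

# Rows per department; the count is left in the 'ID' column.
dept = df.groupby("Department", as_index=False).count()[["Department", "ID"]]

# Rows per (department, level) pair, with the count renamed.
level = df.groupby(["Department", "Level"], as_index=False).count()
level = level.rename(columns={"ID": "Level_Count"})

# Repeat each department total next to its per-level counts.
df_out = dept.merge(level, on="Department")
print(df_out)
```

Note that count() only counts non-null values, so this relies on ID having no missing entries.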
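To get NaN in the Department and ID columns as requested, you can use .loc to find the rows that repeat an earlier Department/ID pair and replace those values with NaN (this requires import numpy as np):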
df_out.loc[df_out[['Department', 'ID']].duplicated(), ['Department', 'ID']] = np.nan
  Department   ID   Level  Level_Count
0     Design  2.0  middle            2
1         HR  2.0  middle            1
2        NaN  NaN  senior            1
3         IT  4.0  junior            1
4        NaN  NaN  middle            2
5        NaN  NaN  senior            1
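As an alternative sketch of the same blanking step (a swapped-in technique, not the .loc assignment used above), Series.mask can be applied column by column. It returns a new column with NaN wherever the condition is True, so it does not depend on assigning NaN in place into the integer ID column. The merged table is rebuilt by hand here, copied from the merge output shown earlier, so that the snippet runs on its own.

```python
import pandas as pd

# The merged table, rebuilt by hand so this sketch is self-contained.
df_out = pd.DataFrame({
    "Department": ["Design", "HR", "HR", "IT", "IT", "IT"],
    "ID": [2, 2, 2, 4, 4, 4],
    "Level": ["middle", "middle", "senior", "junior", "middle", "senior"],
    "Level_Count": [2, 1, 1, 1, 2, 1],
})

cols = ["Department", "ID"]

# True for every row whose (Department, ID) pair already appeared earlier.
repeated = df_out[cols].duplicated()

for col in cols:
    # mask() puts NaN where `repeated` is True; the integer 'ID'
    # column becomes float as a result.
    df_out[col] = df_out[col].mask(repeated)

print(df_out)
```

Either way the ID column ends up as float, which is why the totals print as 2.0 and 4.0.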