value_counts() for each column in a DataFrame



Suppose I have a dataset like this:

You can do this by building two groupby DataFrames, one per count, and merging them together.
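To try the steps, a small DataFrame is needed. The one below is only a stand-in: the column names ID, Department and Level come from the code in this answer, and the rows are chosen so that the counts match the printed outputs. The ID values themselves are made up.

```python
import pandas as pd

# Hypothetical sample data; only the per-department and per-level
# counts are taken from the outputs shown in this answer.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7, 8],  # assumed identifiers
    "Department": ["Design", "Design", "HR", "HR", "IT", "IT", "IT", "IT"],
    "Level": ["middle", "middle", "middle", "senior",
              "junior", "middle", "middle", "senior"],
})

print(df)
```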

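Department counts:

# Count rows per department, keeping only the key and the ID count
dept = df.groupby('Department', as_index=False).count()[['Department', 'ID']]
# Give the count column a descriptive name
dept = dept.rename(columns = {'ID':'Department_Count'})
  Department  Department_Count
0     Design                 2
1         HR                 2
2         IT                 4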

Level counts:

# Count rows per (Department, Level) pair
level = df.groupby(['Department', 'Level'], as_index=False).count()
level = level.rename(columns = {'ID':'Level_Count'})
  Department   Level  Level_Count
0     Design  middle            2
1         HR  middle            1
2         HR  senior            1
3         IT  junior            1
4         IT  middle            2
5         IT  senior            1
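As a possible alternative, the same two tables can be produced with size() instead of count(). count() counts the non-null values in each remaining column, so it relies on ID having no missing values, whereas size() counts rows directly. The sketch below is self-contained and uses assumed sample data (the ID values are invented; only the counts match the outputs above).

```python
import pandas as pd

# Hypothetical sample data, consistent with the counts shown above.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7, 8],
    "Department": ["Design", "Design", "HR", "HR", "IT", "IT", "IT", "IT"],
    "Level": ["middle", "middle", "middle", "senior",
              "junior", "middle", "middle", "senior"],
})

# Rows per department; reset_index(name=...) names the count column.
dept = df.groupby("Department").size().reset_index(name="Department_Count")

# Rows per (Department, Level) pair.
level = df.groupby(["Department", "Level"]).size().reset_index(name="Level_Count")

# value_counts() on a single column gives the same per-department numbers,
# but as a Series ordered by frequency rather than by department name.
dept_vc = df["Department"].value_counts()

print(dept)
print(level)
```

size() needs no column selection afterwards, which is why the [['Department', 'ID']] step disappears here.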

Then merge the two on Department:

# Each department row is repeated once per level in that department
df_out = dept.merge(level, on='Department')
  Department  Department_Count   Level  Level_Count
0     Design                 2  middle            2
1         HR                 2  middle            1
2         HR                 2  senior            1
3         IT                 4  junior            1
4         IT                 4  middle            2
5         IT                 4  senior            1

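To get NaN in the Department and Department_Count columns as requested, you can use .loc to find the rows that repeat an earlier Department / Department_Count pair and replace those values with NaN (this requires import numpy as np). Because NaN is a float, the count column is displayed as floats afterwards:

# Blank out every repeated (Department, Department_Count) pair except the first
df_out.loc[df_out[['Department', 'Department_Count']].duplicated(), ['Department', 'Department_Count']] = np.nan
  Department  Department_Count   Level  Level_Count
0     Design               2.0  middle            2
1         HR               2.0  middle            1
2        NaN               NaN  senior            1
3         IT               4.0  junior            1
4        NaN               NaN  middle            2
5        NaN               NaN  senior            1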
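The blanking step can also be written with duplicated() and Series.mask(), which returns a new column instead of assigning NaN into an integer column in place (something recent pandas versions may flag as an incompatible-dtype assignment). This is a sketch only; the merged frame is rebuilt by hand here, and the column name Department_Count is assumed to match the rename used earlier.

```python
import pandas as pd

# Merged result rebuilt by hand so the example runs on its own.
df_out = pd.DataFrame({
    "Department": ["Design", "HR", "HR", "IT", "IT", "IT"],
    "Department_Count": [2, 2, 2, 4, 4, 4],
    "Level": ["middle", "middle", "senior", "junior", "middle", "senior"],
    "Level_Count": [2, 1, 1, 1, 2, 1],
})

cols = ["Department", "Department_Count"]

# True for every row that repeats an earlier (Department, Department_Count) pair.
dup = df_out.duplicated(subset=cols)

# mask() replaces values with NaN wherever the condition is True;
# the integer count column becomes float as a result.
for col in cols:
    df_out[col] = df_out[col].mask(dup)

print(df_out)
```

Like the .loc version, this relies on rows of the same department being adjacent, which holds here because the merge keeps the sorted groupby order.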
