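I have a dataset that I'm working with in Python and NumPy. It tracks employees and their managers. What I want is to count how many reports each manager has.
Here is a sample of the dataset: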
ID  Date       Job            Dept.  Manager ID
1   Oct 2022   Sales Rep      Sales  5
1   Dec 2022   Sales Rep      Sales  5
1   Feb 2023   Sales Rep      Sales  5
2   Feb 2022   Tech Support   Tech   4
2   Jun 2022   Sales Advisor  Sales  5
2   Nov 2022   Sales Advisor  Sales  5
3   Dec 2021   Tech Consult   Tech   4
3   Sept 2022  Tech Advisor   Tech   4
The output I want is:
Manager ID  Reports
4           1
5           2
The code I'm currently using is:
counts = df['ID'].groupby(df['Manager ID'].astype(float)).nunique()
df['Reports'] = df['ID'].astype(str).map(counts).fillna(0, downcast ='infer')
This code only outputs zeros. Any suggestions?
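One plausible reason for the zeros, offered as a sketch rather than a definitive diagnosis: `counts` is indexed by manager ID (cast to float), while the second line looks up employee IDs cast to strings, so no key matches and every value is NaN before `fillna(0)` turns it into 0. The example below rebuilds the sample inline (the `df` construction and the `mismatched` name are assumptions for illustration) and maps on `Manager ID` instead.

```python
import pandas as pd

# Sample data from the question; building it inline is an assumption.
df = pd.DataFrame({
    'ID': [1, 1, 1, 2, 2, 2, 3, 3],
    'Manager ID': [5, 5, 5, 4, 5, 5, 4, 4],
})

# Distinct employees per manager; the index holds manager IDs (4 and 5).
counts = df.groupby('Manager ID')['ID'].nunique()

# Stringified employee IDs match no key in that index, so every lookup is NaN.
mismatched = df['ID'].astype(str).map(counts)
print(mismatched.isna().all())

# Looking up by manager ID (same dtype as the index) finds every key.
df['Reports'] = df['Manager ID'].map(counts)
print(df[['Manager ID', 'Reports']].drop_duplicates())
```

This attaches the per-manager count to every row; the answers that follow return one row per manager instead.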
You can try:
>>> (df.drop_duplicates(['ID', 'Manager ID'])
...    .groupby('Manager ID', as_index=False)
...    .agg(Report=('ID', 'size')))
   Manager ID  Report
0           4       2
1           5       2
All you need is:
df.groupby('Manager ID')['ID'].nunique()
Output:
Manager ID
4 2
5 2
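Both results above give manager 4 two reports, because employee 2 appears under manager 4 in Feb 2022 before moving to manager 5. If the goal is the output requested in the question (manager 4 with 1 report), one option is to keep only each employee's most recent row before counting. This is a minimal sketch that assumes the rows are already in chronological order within each employee, as they are in the sample; the inline `df` and the names `latest` and `current_reports` are illustrative.

```python
import pandas as pd

# Sample data from the question; building it inline is an assumption.
df = pd.DataFrame({
    'ID': [1, 1, 1, 2, 2, 2, 3, 3],
    'Date': ['Oct 2022', 'Dec 2022', 'Feb 2023', 'Feb 2022',
             'Jun 2022', 'Nov 2022', 'Dec 2021', 'Sept 2022'],
    'Manager ID': [5, 5, 5, 4, 5, 5, 4, 4],
})

# Assumption: rows are sorted oldest to newest within each employee,
# so the last row per ID is that employee's current manager.
latest = df.drop_duplicates('ID', keep='last')

# Count distinct employees per current manager.
current_reports = (latest.groupby('Manager ID')['ID']
                         .nunique()
                         .reset_index(name='Reports'))
print(current_reports)
```

If the real data is not sorted, the dates would need to be parsed and sorted first; abbreviations such as "Sept" may need custom handling.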
Another possible solution:
managers = df['Manager ID'].unique()
pd.DataFrame({
    'Manager ID': managers,
    'Reports': [df.loc[df['Manager ID'].eq(x), 'ID'].nunique() for x in managers]
})
Output:
   Manager ID  Reports
0           5        2
1           4        2
It looks like you could also do this:
import collections
print(collections.Counter(df['Manager ID']))
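Note that `Counter` on the raw column counts rows per manager, not distinct employees. A hedged variant, assuming the same sample data built inline (the names `row_counts`, `pairs`, and `report_counts` are illustrative), drops duplicate employee/manager pairs first:

```python
import collections

import pandas as pd

# Sample data from the question; building it inline is an assumption.
df = pd.DataFrame({
    'ID': [1, 1, 1, 2, 2, 2, 3, 3],
    'Manager ID': [5, 5, 5, 4, 5, 5, 4, 4],
})

# Counting the raw column tallies rows, so repeated records inflate the totals.
row_counts = collections.Counter(df['Manager ID'].tolist())

# Keep one row per (employee, manager) pair, then count managers.
pairs = df.drop_duplicates(['ID', 'Manager ID'])
report_counts = collections.Counter(pairs['Manager ID'].tolist())

print(row_counts)
print(report_counts)
```

Like the other answers, this counts every employee who was ever recorded under a manager, so manager 4 still gets 2.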