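I have a DataFrame with the following columns: employee type, name, a column that flags the primary contract, and the ID number. It looks like this: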
Name           Primary row?  Employee Type  ID
Paulo Cortez   Yes           Employee       100000
Paulo Cortez   No            Employee       100000
Joan San       Yes           Non-employee   100001
Felipe Castro  Yes           Contractor     100002
Felipe Castro  No            Employee       100002
Felipe Castro  No            Contractor     100002
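I need to create a sub ID column that takes the ID value and prepends the first letter of the employee type (which can be Employee, Non-employee, or Contractor). If the ID appears more than once, look at the "Primary row?" column: if it says "Yes", leave the value in that same format; if it says "No", append "-2", "-3", and so on, as shown below: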
Name           Primary row?  Employee Type  ID      sub ID
Paulo Cortez   Yes           Employee       100000  E100000
Paulo Cortez   No            Employee       100000  E100000-2
Joan San       Yes           Non-employee   100001  N100001
Felipe Castro  Yes           Contractor     100002  C100002
Felipe Castro  No            Employee       100002  E100002-2
Felipe Castro  No            Contractor     100002  C100002-3
What is the best way to achieve this result?
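Here is one way to do it. First, use groupby with cumcount to build the numeric suffix, in case it is needed. Then apply a function to each row that concatenates all the parts.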
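# Position of each row within its ID group, starting at 1
df['sub_ID'] = df.groupby('ID').cumcount().add(1)
# First letter of the type + ID, plus "-<position>" on non-primary rows
df['sub_ID'] = df.apply(lambda row:
    row['Employee Type'][0]
    + str(row['ID'])
    + ("" if row['Primary row?'] == "Yes" else "-" + str(row['sub_ID']))
    , axis=1)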
Output df:
            Name  Primary row?  Employee Type      ID     sub_ID
0   Paulo Cortez           Yes       Employee  100000    E100000
1   Paulo Cortez            No       Employee  100000  E100000-2
2       Joan San           Yes   Non-employee  100001    N100001
3  Felipe Castro           Yes     Contractor  100002    C100002
4  Felipe Castro            No       Employee  100002  E100002-2
5  Felipe Castro            No     Contractor  100002  C100002-3
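If the row-wise apply turns out to be slow on a larger frame, the same idea can probably be written with column-wise string operations. The following is only a sketch: it rebuilds the sample data from the question so it runs on its own, and the names base, position and suffix are mine, not part of the original code. Like the apply version, it assumes the primary row comes first within each ID group, so that the non-primary rows are numbered from 2.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Name": ["Paulo Cortez", "Paulo Cortez", "Joan San",
             "Felipe Castro", "Felipe Castro", "Felipe Castro"],
    "Primary row?": ["Yes", "No", "Yes", "Yes", "No", "No"],
    "Employee Type": ["Employee", "Employee", "Non-employee",
                      "Contractor", "Employee", "Contractor"],
    "ID": [100000, 100000, 100001, 100002, 100002, 100002],
})

# First letter of the employee type followed by the ID, e.g. "E100000".
base = df["Employee Type"].str[0] + df["ID"].astype(str)

# 1-based position of each row within its ID group.
position = df.groupby("ID").cumcount().add(1)

# Keep "-<position>" on non-primary rows; primary rows get an empty suffix.
suffix = ("-" + position.astype(str)).where(df["Primary row?"].ne("Yes"), "")

df["sub_ID"] = base + suffix
print(df)
```

If the primary row is not guaranteed to come first in each group, sorting by ID and "Primary row?" beforehand (so that "Yes" precedes "No") would be one way to restore that assumption.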