将第一个"开始日期"的列设置为"是"



我有以下数据帧:

ID  Job     Start Date
1   Driver  1951-01-01
1   Cleaner 2000-05-01
1   Staff   1951-01-01
2   Staff   2000-05-02
2   Staff2  2000-05-01
2   Cleaner 2000-04-01
5   Driver  1951-01-01

我需要创建一个名为";初级"其具有值"0";是";或";否";取决于的开始日期

  • 对于每个ID,将列设置为";是";对于最早的";开始日期";如果有2行与相同的"ID"绑定;开始日期";选择一个设置为"0";是";(可能是第一个出现的(

  • 所有其他行设置为"0";否";对于ID

在这种情况下,每个ID都有一行,其中有一个";是";(仅1个单行(和所有其它行为"1";否";产生这个数据帧:

ID  Job     Start Date  Primary?
1   Driver  1951-01-01  Yes
1   Cleaner 2000-05-01  No
1   Staff   1951-01-01  No
2   Staff   2000-05-02  No
2   Staff2  2000-05-01  Yes
2   Cleaner 2000-04-01  Yes
5   Driver  1951-01-01  Yes

最好的方法是什么?

# set the primary start-date row as True/False, when its a min for the ID
df['Primary']= df['Start_Date'].eq(df.groupby(['ID'])['Start_Date'].transform(min))
# identify the duplicates start-dates
df.loc[df.duplicated(subset=['ID','Primary'], keep='first'),'Primary' ] = False
df
# Map True/False to Yes/No
df['Primary']=df['Primary'].map({True: 'Yes', False: 'No'})
df
+----+-----+----------+-------------+---------+
|    | ID  |   Job    | Start_Date  | Primary |
+----+-----+----------+-------------+---------+
| 0  |  1  | Driver   | 1951-01-01  | Yes     |
| 1  |  1  | Cleaner  | 2000-05-01  | No      |
| 2  |  1  | Staff    | 1951-01-01  | No      |
| 3  |  2  | Staff    | 2000-05-02  | No      |
| 4  |  2  | Staff2   | 2000-05-01  | No      |
| 5  |  2  | Cleaner  | 2000-04-01  | Yes     |
| 6  |  5  | Driver   | 1951-01-01  | Yes     |
+----+-----+----------+-------------+---------+

您可以使用transform获取每个id出现的第一个日期,然后使用np.where:

df['Primary'] =  np.where(df['Start Date'] == df.groupby('ID')['Start Date'].
transform('first'), 'Yes', 'No')

或者,如果要将最短日期转换为Yes,则可以设置transform('min')

此解决方案可处理以下工作:

(df
.assign(primary=lambda x: x.groupby("ID")["Start"].transform("min"))
.assign(keep=lambda x: x.groupby("ID")["Start"].transform(lambda x: x.duplicated("first")))
.assign(primary=lambda x: np.select([x.Start == x.primary],
["Yes"],
default="No")
)
.assign(primary=lambda x: np.select([(x.primary == "Yes") & ~(x.keep)],
["Yes"],
default="No")
)
)

最新更新