I have two DataFrames. The first contains the officerID and name of each officer in the company:
officer_df = pd.DataFrame({'officerID': ['01', '02', '03'], 'Name': ['Tom', 'Dick', 'Harry']})
The second contains the officerID and leave dates of officers who have applied for leave:
df_officer_leave = pd.DataFrame({'officerID': ['01', '01'], 'leave start date': ['2020-12-15', '2020-12-31'], 'leave end date': ['2020-12-16', '2021-01-02']})
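Now I want to use a function, leave_col_set, that looks up each officerID from officer_df in df_officer_leave, returns a list of [leave start date, leave end date] pairs, and adds the returned list to officer_df as a new column keyed on officerID. However, I keep getting an error.
I'm at a loss, so I've come to Stack Overflow for guidance. Thanks in advance, kind souls.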
import pandas as pd
officer_df = pd.DataFrame({'officerID': ['01', '02', '03'], 'Name': ['Tom', 'Dick', 'Harry']})
df_officer_leave = pd.DataFrame({'officerID': ['01', '01'], 'leave start date': ['2020-12-15', '2020-12-31'], 'leave end date': ['2020-12-16', '2021-01-02']})
df_officer_leave['leave start date']= pd.to_datetime(df_officer_leave['leave start date'])
df_officer_leave['leave end date']= pd.to_datetime(df_officer_leave['leave end date'])
def leave_col_set(x, df_officer_leave):
    return [*df_officer_leave[df_officer_leave['officerID']==x][['leave start date', 'leave end date']].values.tolist()]
#leave logic
officer_df["leaveDays"] = officer_df.officerID.apply(leave_col_set, args=(df_officer_leave))
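The args parameter must be a tuple, and (df_officer_leave) without a trailing comma is just the DataFrame wrapped in parentheses, not a one-element tuple. The correct syntax for the last line is: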
officer_df["leaveDays"] = officer_df.officerID.apply(leave_col_set, args=(df_officer_leave, ))
For more information, see: Passing a DataFrame as an argument to the pandas apply function
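To make the trailing-comma point concrete, here is a minimal sketch. The function add and the value 10 are hypothetical stand-ins for leave_col_set and df_officer_leave; they are not part of the original code.

```python
import pandas as pd

# Hypothetical stand-in for leave_col_set: takes the element plus one extra argument.
def add(x, offset):
    return x + offset

s = pd.Series([1, 2, 3])

not_a_tuple = (10)   # parentheses alone do not make a tuple; this is just the int 10
one_tuple = (10,)    # the trailing comma is what creates a one-element tuple

print(type(not_a_tuple).__name__)
print(type(one_tuple).__name__)

# Series.apply forwards the contents of args as extra positional arguments to add.
result = s.apply(add, args=one_tuple)
print(result.tolist())
```

The same rule applies when the extra argument is a DataFrame: args=(df_officer_leave, ) is a tuple holding one DataFrame, while args=(df_officer_leave) is the DataFrame itself.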
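That said, I would strongly advise against passing the entire DataFrame as an argument, especially when you are only extracting information from it.
In your case, the following is sufficient, because df_officer_leave is accessible from inside the function: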
import pandas as pd
officer_df = pd.DataFrame({'officerID': ['01', '02', '03'],
                           'Name': ['Tom', 'Dick', 'Harry']})
df_officer_leave = pd.DataFrame({'officerID': ['01', '01'],
                                 'leave start date': ['2020-12-15', '2020-12-31'],
                                 'leave end date': ['2020-12-16', '2021-01-02']})
df_officer_leave['leave start date']= pd.to_datetime(df_officer_leave['leave start date'])
df_officer_leave['leave end date']= pd.to_datetime(df_officer_leave['leave end date'])
def leave_col_set(x):
    return [*df_officer_leave[df_officer_leave['officerID']==x][['leave start date', 'leave end date']].values.tolist()]
officer_df['leaveDays'] = officer_df.officerID.apply(leave_col_set)
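As a possible alternative (not part of the answer above), the leave periods could be grouped once into a plain dictionary and then looked up per officer, which avoids filtering df_officer_leave again for every row. This is only a sketch under that assumption; the name leave_by_officer is hypothetical.

```python
from collections import defaultdict

import pandas as pd

officer_df = pd.DataFrame({'officerID': ['01', '02', '03'],
                           'Name': ['Tom', 'Dick', 'Harry']})
df_officer_leave = pd.DataFrame({'officerID': ['01', '01'],
                                 'leave start date': ['2020-12-15', '2020-12-31'],
                                 'leave end date': ['2020-12-16', '2021-01-02']})
df_officer_leave['leave start date'] = pd.to_datetime(df_officer_leave['leave start date'])
df_officer_leave['leave end date'] = pd.to_datetime(df_officer_leave['leave end date'])

# Hypothetical helper mapping: officerID -> list of [start, end] pairs.
leave_by_officer = defaultdict(list)
for officer_id, start, end in zip(df_officer_leave['officerID'],
                                  df_officer_leave['leave start date'],
                                  df_officer_leave['leave end date']):
    leave_by_officer[officer_id].append([start, end])

# Officers with no leave records get an empty list.
officer_df['leaveDays'] = officer_df['officerID'].map(
    lambda officer_id: leave_by_officer.get(officer_id, []))
```

With this approach the pairs hold pandas Timestamp objects, and officers '02' and '03' end up with an empty list.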