Python DataFrame: change the status of a row in one df based on a condition from another df



I have two DataFrames: one with student details records and another with student attendance records.

details_df

      name  roll   start_day    last_day
0  anthony     9  2020-09-08  2020-09-28
1     paul     6  2020-09-01  2020-09-15
2   marcus    10  2020-08-08  2020-09-08

attendance_df

      name  roll   status         day
0  anthony     9   absent  2020-07-25
1  anthony     9  present  2020-09-15
2  anthony     9   absent  2020-09-25
3     paul     6  present  2020-09-02
4   marcus    10  present  2020-07-01
5   marcus    10  present  2020-08-17
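To reproduce the problem, the two frames above can be built as follows. This is only a setup sketch: it assumes the date columns should be real datetimes (the question does not state their dtype), so it converts them with pd.to_datetime.

```python
import pandas as pd

# Student details, copied from the details_df table above
details_df = pd.DataFrame({
    "name": ["anthony", "paul", "marcus"],
    "roll": [9, 6, 10],
    "start_day": ["2020-09-08", "2020-09-01", "2020-08-08"],
    "last_day": ["2020-09-28", "2020-09-15", "2020-09-08"],
})

# Attendance records, copied from the attendance_df table above
attendance_df = pd.DataFrame({
    "name": ["anthony", "anthony", "anthony", "paul", "marcus", "marcus"],
    "roll": [9, 9, 9, 6, 10, 10],
    "status": ["absent", "present", "absent", "present", "present", "present"],
    "day": ["2020-07-25", "2020-09-15", "2020-09-25",
            "2020-09-02", "2020-07-01", "2020-08-17"],
})

# Assumption: parse the date strings so comparisons are done on datetimes
for col in ["start_day", "last_day"]:
    details_df[col] = pd.to_datetime(details_df[col])
attendance_df["day"] = pd.to_datetime(attendance_df["day"])
```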

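I am trying to get a True/False flag for each user indicating whether they have any status=absent record between start_day and last_day.

For example: of anthony's 3 total records in attendance_df, two fall between his start_day and last_day. If any of those two records has status=absent, mark that user as True.

Expected output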

      name  roll  absent
0  anthony     9    True
1     paul     6   False
2   marcus    10   False

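I tried converting details_df to a list and looping over attendance_df. Is there a more efficient way?

You need to do a merge (i.e., a join operation) and keep only the rows where day falls between start_day and last_day. Then do a groupby + apply (i.e., a grouped aggregation):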

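# Join each attendance record to its student's start_day/last_day
merged_df = attendance_df.merge(details_df, on=['name', 'roll'])
# Keep only records inside the window, then flag students with any absence
df = (merged_df[merged_df.day.between(merged_df.start_day, merged_df.last_day)]
      .groupby(['name', 'roll'])
      .apply(lambda x: (x.status == 'absent').any())
      .reset_index())
df.columns = ['name', 'roll', 'absent']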

Result:

df
      name  roll  absent
0  anthony     9    True
1   marcus    10   False
2     paul     6   False
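One caveat with filtering before grouping is that a student with no attendance records inside the window disappears from the result. Below is a minimal sketch that avoids apply and keeps every student from details_df in its original order. It assumes the date columns are parsed with pd.to_datetime; the names in_window, flags, keys, and result are illustrative and do not come from the answer above.

```python
import pandas as pd

details_df = pd.DataFrame({
    "name": ["anthony", "paul", "marcus"],
    "roll": [9, 6, 10],
    "start_day": pd.to_datetime(["2020-09-08", "2020-09-01", "2020-08-08"]),
    "last_day": pd.to_datetime(["2020-09-28", "2020-09-15", "2020-09-08"]),
})
attendance_df = pd.DataFrame({
    "name": ["anthony", "anthony", "anthony", "paul", "marcus", "marcus"],
    "roll": [9, 9, 9, 6, 10, 10],
    "status": ["absent", "present", "absent", "present", "present", "present"],
    "day": pd.to_datetime(["2020-07-25", "2020-09-15", "2020-09-25",
                           "2020-09-02", "2020-07-01", "2020-08-17"]),
})

# Attach each student's window to every attendance record
merged = attendance_df.merge(details_df, on=["name", "roll"])

# Row-level flag: the record is inside the window AND is an absence
in_window = merged["day"].between(merged["start_day"], merged["last_day"])
merged["absent"] = in_window & merged["status"].eq("absent")

# Per-student flag: True if any row-level flag is True
flags = merged.groupby(["name", "roll"])["absent"].any()

# Re-align to details_df so students without matching records get False
keys = pd.MultiIndex.from_frame(details_df[["name", "roll"]])
result = flags.reindex(keys, fill_value=False).reset_index()
print(result)
```

Because the flag is computed as a plain boolean column before grouping, the aggregation uses the built-in any() rather than a Python lambda, which tends to matter on larger frames.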

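Merge, then use groupby() with a lambda function that checks whether any absent record has a day between start_day and last_day. The status test is required: checking only the date window returns True for every student who has a record inside it.

df2 = pd.merge(attendance_df, details_df, how='left', on=['name', 'roll'])
(df2.groupby(['name', 'roll'])
    .apply(lambda x: (x['day'].between(x['start_day'], x['last_day'])
                      & x['status'].eq('absent')).any())
    .to_frame('absent'))

              absent
name    roll        
anthony 9       True
marcus  10     False
paul    6      False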
