I have the following pd.DataFrame:
df = pd.DataFrame({'admission_timestamp': ['2021-01-17 17:45:00', '2020-03-31 23:32:00', '2020-03-27 18:20:00', '2020-04-17 18:12:00', '2020-03-19 19:12:00'], 'end_period': ['2021-01-18 17:45:00', '2020-04-01 23:32:00', '2020-03-28 18:20:00', '2020-04-18 18:12:00', '2020-03-20 19:12:00'], 'start_med': ['NaT', '2020-04-01 00:00:00', '2020-03-27 19:00:00', '2020-04-17 18:39:24', 'NaT'], 'end_med': ['NaT', '2020-04-14 21:00:00', '2020-04-05 00:00:00', '2020-05-06 22:07:29', 'NaT']})
which looks like this:
admission_timestamp end_period start_med end_med
1 2021-01-17 17:45:00 2021-01-18 17:45:00 NaT NaT
2 2020-03-31 23:32:00 2020-04-01 23:32:00 2020-04-01 00:00:00 2020-04-14 21:00:00
3 2020-03-27 18:20:00 2020-03-28 18:20:00 2020-03-27 19:00:00 2020-04-05 00:00:00
4 2020-04-17 18:12:00 2020-04-18 18:12:00 2020-04-17 18:39:24 2020-05-06 22:07:29
5 2020-03-19 19:12:00 2020-03-20 19:12:00 NaT NaT
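I want to create a new column received_medidation that states, as a boolean, whether the patient received medication between admission_timestamp and end_period (even if only for one second). In other words, the boolean should indicate whether any time between admission_timestamp and end_period overlaps with the time between start_med and end_med. All columns have dtype datetime64[ns].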
I know we can create boolean masks such as
condition = (df['date'] > start_date) & (df['date'] <= end_date)
However, I don't see how this can solve the task above. Thanks for your help.
If start_med is guaranteed to be later than admission_timestamp, it is enough to check that start_med falls between admission_timestamp and end_period:
# Convert the string columns to datetime64[ns] ('NaT' strings become NaT)
for col in df.columns:
    df[col] = pd.to_datetime(df[col])
df['received_medidation'] = (df['admission_timestamp'] < df['start_med']) & (df['start_med'] < df['end_period'])
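To see why that guarantee matters, here is a minimal sketch with one hypothetical row (not from the question's data) in which medication starts before admission and is still running during the window. The start-only mask above reports no overlap for it; the variable name start_only is illustrative.

```python
import pandas as pd

# Hypothetical row: medication began before admission and continues past it
df = pd.DataFrame({
    'admission_timestamp': [pd.Timestamp('2020-03-27 18:20:00')],
    'end_period': [pd.Timestamp('2020-03-28 18:20:00')],
    'start_med': [pd.Timestamp('2020-03-26 08:00:00')],
    'end_med': [pd.Timestamp('2020-03-30 08:00:00')],
})

# Start-only check: is start_med strictly inside the admission window?
start_only = (df['admission_timestamp'] < df['start_med']) & (df['start_med'] < df['end_period'])
print(start_only.tolist())  # [False], although the two periods clearly overlap
```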
However, if start_med can be earlier than admission_timestamp, then start_med < admission_timestamp < end_med also means the two periods intersect. We therefore combine this case with the previous one using the OR operator (|):
df['received_medidation'] = (df['start_med'].between(df['admission_timestamp'], df['end_period']) |
df['admission_timestamp'].between(df['start_med'], df['end_med']))
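Note: the general assumption here is that admission_timestamp < end_period and start_med < end_med always hold; in that case, the logical expression above captures every pair of intersecting periods.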
Output:
admission_timestamp end_period start_med end_med received_medidation
0 2021-01-17 17:45:00 2021-01-18 17:45:00 NaT NaT False
1 2020-03-31 23:32:00 2020-04-01 23:32:00 2020-04-01 00:00:00 2020-04-14 21:00:00 True
2 2020-03-27 18:20:00 2020-03-28 18:20:00 2020-03-27 19:00:00 2020-04-05 00:00:00 True
3 2020-04-17 18:12:00 2020-04-18 18:12:00 2020-04-17 18:39:24 2020-05-06 22:07:29 True
4 2020-03-19 19:12:00 2020-03-20 19:12:00 NaT NaT False
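Under the same assumption, the two cases collapse into the standard closed-interval overlap test: [a, b] and [c, d] intersect exactly when a <= d and c <= b. A minimal sketch on the question's data compares it with the OR-of-between mask (the variable names overlap and between_mask are illustrative). Every comparison involving NaT evaluates to False, so rows without medication come out False in both.

```python
import pandas as pd

df = pd.DataFrame({
    'admission_timestamp': ['2021-01-17 17:45:00', '2020-03-31 23:32:00', '2020-03-27 18:20:00',
                            '2020-04-17 18:12:00', '2020-03-19 19:12:00'],
    'end_period': ['2021-01-18 17:45:00', '2020-04-01 23:32:00', '2020-03-28 18:20:00',
                   '2020-04-18 18:12:00', '2020-03-20 19:12:00'],
    'start_med': ['NaT', '2020-04-01 00:00:00', '2020-03-27 19:00:00', '2020-04-17 18:39:24', 'NaT'],
    'end_med': ['NaT', '2020-04-14 21:00:00', '2020-04-05 00:00:00', '2020-05-06 22:07:29', 'NaT'],
})
for col in df.columns:
    df[col] = pd.to_datetime(df[col])

# Closed-interval overlap: [a, b] and [c, d] intersect iff a <= d and c <= b
overlap = (df['admission_timestamp'] <= df['end_med']) & (df['start_med'] <= df['end_period'])

# OR-of-between mask: start_med inside the window, or admission inside the medication period
between_mask = (df['start_med'].between(df['admission_timestamp'], df['end_period'])
                | df['admission_timestamp'].between(df['start_med'], df['end_med']))

print(overlap.tolist())       # [False, True, True, True, False]
print(between_mask.tolist())  # [False, True, True, True, False]
```

The single pair of comparisons avoids reasoning about which interval starts first, which is why it is the usual way to write an interval-overlap check.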
Using between:
# True if either medication endpoint lies within the admission window
df['overlaps'] = (df['start_med'].between(df['admission_timestamp'], df['end_period'])
                  | df['end_med'].between(df['admission_timestamp'], df['end_period']))
print(df)
# Output
admission_timestamp end_period start_med end_med overlaps
1 2021-01-17 17:45:00 2021-01-18 17:45:00 NaT NaT False
2 2020-03-31 23:32:00 2020-04-01 23:32:00 2020-04-01 00:00:00 2020-04-14 21:00:00 True
3 2020-03-27 18:20:00 2020-03-28 18:20:00 2020-03-27 19:00:00 2020-04-05 00:00:00 True
4 2020-04-17 18:12:00 2020-04-18 18:12:00 2020-04-17 18:39:24 2020-05-06 22:07:29 True
5 2020-03-19 19:12:00 2020-03-20 19:12:00 NaT NaT False
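One case this endpoint check does not cover is a medication period that starts before admission and ends after end_period, so neither endpoint falls inside the window. The sketch below uses one hypothetical row (not from the question) to compare it with the closed-interval overlap test a <= d and c <= b; the variable names endpoints and overlap are illustrative.

```python
import pandas as pd

# Hypothetical row: the medication period fully spans the admission window
df = pd.DataFrame({
    'admission_timestamp': [pd.Timestamp('2020-04-10 12:00:00')],
    'end_period': [pd.Timestamp('2020-04-11 12:00:00')],
    'start_med': [pd.Timestamp('2020-04-01 00:00:00')],
    'end_med': [pd.Timestamp('2020-04-14 21:00:00')],
})

# Endpoint-only check: does either medication endpoint fall inside the window?
endpoints = (df['start_med'].between(df['admission_timestamp'], df['end_period'])
             | df['end_med'].between(df['admission_timestamp'], df['end_period']))

# General closed-interval overlap test
overlap = (df['admission_timestamp'] <= df['end_med']) & (df['start_med'] <= df['end_period'])

print(endpoints.tolist(), overlap.tolist())  # [False] [True]
```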