My problem is as follows: I have car accidents (id_accident) and passenger victims (id_victim).
id_accident | id_victim | date_accident | ROL
---|---|---|---
123 | 23A | 2021/20/01 | PASSENGER
456 | 12B | 2020/19/08 | PASSENGER
111 | 41A | 2021/20/01 | PASSENGER
222 | 54B | 2020/19/08 | PASSENGER
This looks like a classic SQL problem. What output format do you need?
I had to change the first date in df2 to 2020/20/09 so that pandas would recognize it. Here is a complete example using pd.merge:
import pandas as pd
from io import StringIO
# First table; columns are separated by whitespace, hence the regex separator
df1 = pd.read_csv(StringIO("""id_accident id_victim date_accident ROL
123 23A 2021/20/01 PASSENGER
456 12B 2020/19/08 PASSENGER
111 41A 2021/20/01 PASSENGER
222 54B 2020/19/08 PASSENGER"""), sep=r"\s+")
# Second table, same layout
df2 = pd.read_csv(StringIO("""id_accident id_victim date_accident ROL
001 23A 2020/20/09 PASSENGER
002 12B 2019/31/12 DRIVER
003 41A 2020/20/12 PASSENGER
004 54B 2020/20/07 DRIVER"""), sep=r"\s+")
# The dates are written year/day/month, so parse them with an explicit format
for df in (df1, df2):
    df["date_accident"] = pd.to_datetime(df["date_accident"], format="%Y/%d/%m")
# Join on the victim; overlapping column names get the _x / _y suffixes
df3 = df1.merge(df2, on="id_victim")
Now df3 is:
  id_accident_x id_victim date_accident_x ROL_x id_accident_y date_accident_y ROL_y
0 123 23A 2021-01-20 PASSENGER 1 2020-09-20 PASSENGER
1 456 12B 2020-08-19 PASSENGER 2 2019-12-31 DRIVER
2 111 41A 2021-01-20 PASSENGER 3 2020-12-20 PASSENGER
3 222 54B 2020-08-19 PASSENGER 4 2020-07-20 DRIVER
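The _x and _y suffixes in the merged columns are the defaults that merge applies to column names present in both frames. If they are hard to read, the suffixes parameter lets you choose your own. A minimal sketch, where the one-row frames and the suffix names _df1 and _df2 are only illustrative:

```python
import pandas as pd

# Tiny stand-ins for the two tables (one row each, for illustration only)
left = pd.DataFrame({"id_accident": [456], "id_victim": ["12B"], "ROL": ["PASSENGER"]})
right = pd.DataFrame({"id_accident": [2], "id_victim": ["12B"], "ROL": ["DRIVER"]})

# Columns that exist in both frames (other than the join key) receive the
# chosen suffixes instead of the default _x / _y
merged = left.merge(right, on="id_victim", suffixes=("_df1", "_df2"))
print(list(merged.columns))
```

The join key id_victim appears once and keeps its name, since it is shared by both sides.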
Then filter according to your condition:
>>> df3[(df3.ROL_x == "PASSENGER") & (df3.ROL_y == "DRIVER") & ((df3.date_accident_x - df3.date_accident_y).dt.days < 90)]
  id_accident_x id_victim date_accident_x ROL_x id_accident_y date_accident_y ROL_y
3 222 54B 2020-08-19 PASSENGER 4 2020-07-20 DRIVER
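The merge and filter steps can also be wrapped into one function. The sketch below rests on a few assumptions: the helper name filter_pairs and the max_days parameter are hypothetical, the dates are taken to be year/day/month strings, and the condition is read as "passenger in the first table, driver in the second, with the first accident fewer than 90 days after the second". With errors="coerce", a date that cannot be parsed (such as 2020/20/19) becomes NaT and its row simply fails the filter, so the input does not have to be edited by hand.

```python
import pandas as pd

def filter_pairs(first, second, max_days=90):
    """Hypothetical helper: join two accident tables on id_victim and keep
    PASSENGER/DRIVER pairs whose date gap is below max_days."""
    first, second = first.copy(), second.copy()
    for df in (first, second):
        # Assumed year/day/month strings; unparseable dates become NaT
        df["date_accident"] = pd.to_datetime(
            df["date_accident"], format="%Y/%d/%m", errors="coerce"
        )
    merged = first.merge(second, on="id_victim")
    # Days from the second table's accident to the first table's accident
    gap = (merged["date_accident_x"] - merged["date_accident_y"]).dt.days
    mask = (
        (merged["ROL_x"] == "PASSENGER")
        & (merged["ROL_y"] == "DRIVER")
        & (gap < max_days)  # comparisons with a missing gap are False
    )
    return merged[mask]

# Sample rows taken from the tables above, including one invalid date
df1 = pd.DataFrame({
    "id_accident": [123, 456, 222],
    "id_victim": ["23A", "12B", "54B"],
    "date_accident": ["2021/20/01", "2020/19/08", "2020/19/08"],
    "ROL": ["PASSENGER", "PASSENGER", "PASSENGER"],
})
df2 = pd.DataFrame({
    "id_accident": [1, 2, 4],
    "id_victim": ["23A", "12B", "54B"],
    "date_accident": ["2020/20/19", "2019/31/12", "2020/20/07"],
    "ROL": ["PASSENGER", "DRIVER", "DRIVER"],
})
result = filter_pairs(df1, df2)
print(result[["id_victim", "id_accident_x", "id_accident_y"]])
```

If the gap should count in either direction, taking gap.abs() before the comparison would be one way to express that.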