在Dataframe中查找与Python中的Start和Enddate的交集



我有一个包含开始和结束日期时间的元素的数据帧。找到日期交叉点的最佳选择是什么?我现在天真的方法是由两个嵌套的循环交叉比较元素组成,这显然非常慢。实现这一目标的更好方法是什么?

dict = {}
start = "start_time"
end = "end_time"
for index1, rowLoop1 in df[{start, end}].head(500).iterrows():
matches = []
dict[(index1, rowLoop1[start])] = 0
for index2, rowLoop2 in df[{start,end}].head(500).iterrows():
if index1 != index2:
if date_intersection(rowLoop1[start], rowLoop1[end], rowLoop2[start], rowLoop2[end]):
dict[(index1, rowLoop1[start])] += 1

date_intersection的代码:

def date_intersection(t1start, t1end, t2start, t2end):  
if (t1start <= t2start <= t2end <= t1end): return True
elif (t1start <= t2start <= t1end):return True
elif (t1start <= t2end <= t1end):return True
elif (t2start <= t1start <= t1end <= t2end):return True
else: return False

样本数据:

id,start_date,end_date
41234132,2021-01-10 10:00:05,2021-01-10 10:30:27
64564512,2021-01-10 10:10:00,2021-01-11 10:28:00
21135765,2021-01-12 12:30:00,2021-01-12 12:38:00
87643252,2021-01-12 12:17:00,2021-01-12 12:42:00
87641234,2021-01-12 12:58:00,2021-01-12 13:17:00

您可以执行一些操作,比如将数据帧与自身合并以获得笛卡尔乘积并比较列。

df = df.merge(df, how='cross', suffixes=('','_2'))
df['date_intersection'] =  (((df['start_date'].le(df['start_date_2']) & df['start_date_2'].le(df['end_date'])) |    # start 2 within start/end
(df['start_date'].le(df['end_date_2']) & df['end_date_2'].le(df['end_date'])) |        # end 2 within start/end
(df['start_date_2'].le(df['start_date']) & df['start_date'].le(df['end_date_2'])) |    # start within start 2/end 2                      
(df['start_date_2'].le(df['end_date']) & df['end_date'].le(df['end_date_2']))) &       # end within start 2/end 2  
df['id'].ne(df['id_2'])) # id not compared to itself

然后返回id,如果它们有日期交集。。。

df.groupby('id')['date_intersection'].any()
id
21135765     True
41234132     True
64564512     True
87641234    False
87643252     True

或者如果您需要与相交的ID

df.loc[df['date_intersection'], :].groupby(['id'])['id_2'].agg(list).to_frame('intersected_ids')
intersected_ids
id                      
21135765      [87643252]
41234132      [64564512]
64564512      [41234132]
87643252      [21135765]

相关内容

最新更新