Concurrent conflicts


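I am trying to find consecutive datetimes in Python. I have got as far as looping over the rows to find out whether each row conflicts, but I am stuck on how to determine whether the events are concurrent. Any suggestions on how I could do this? A similar approach would also be fine!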

Events are concurrent when the name and event date are the same and the count of consecutive conflicts is >= 3.
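To make this definition concrete, here is a minimal sketch in plain Python. It assumes that two rows conflict when an event starts before the previous row's event ends, and that the events passed in already share one name and one date. The helper names `longest_conflict_run` and `at` are hypothetical and only serve the illustration.

```python
from datetime import datetime


def longest_conflict_run(events):
    """Return the length of the longest run of consecutive conflicts.

    `events` is a list of (start, end) pairs for one name and one date,
    in row order. A row extends the run when it starts before the
    previous row ends (an assumption of this sketch).
    """
    longest = run = 0
    prev_end = None
    for start, end in events:
        if prev_end is not None and start < prev_end:
            run += 1
        else:
            run = 1
        longest = max(longest, run)
        prev_end = end
    return longest


def at(hhmm):
    # Hypothetical helper: build a datetime on 8/4/20 from "HH:MM"
    return datetime.strptime("2020-08-04 " + hhmm, "%Y-%m-%d %H:%M")


hoper = [
    (at("08:30"), at("10:30")),
    (at("08:50"), at("09:20")),
    (at("08:30"), at("10:00")),
]
print(longest_conflict_run(hoper) >= 3)  # True: three conflicts in a row
```

Under this reading, a name and date count as concurrent when the function returns 3 or more.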

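Sample data 1:

Event ID  Name            Date     Event Start       Event End
123       Hoper, Charles  8/4/20   8/4/20 8:30 AM    8/4/20 10:30 AM
456       Hoper, Charles  8/4/20   8/4/20 8:50 AM    8/4/20 9:20 AM
789       Hoper, Charles  8/4/20   8/4/20 8:30 AM    8/4/20 10 AM
1011      Perez, Daniel   8/10/20  8/10/20 9 AM      8/10/20 11 AM
1213      Shah, Kim       8/5/20   8/5/20 12 PM      8/5/20 1 PM
1415      Shah, Kim       8/5/20   8/5/20 12:30 PM   8/5/20 1 PM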

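OK, final approach! If I understand correctly, you don't care about the "Consecutive" column; you only want to know whether there are 3 overlapping windows in a row. Here is an attempt that answers that question directly. It passes both test datasets (thanks for the edit!).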

import io

import pandas as pd
# Create the first test DataFrame you provided
df = pd.read_csv(io.StringIO("""
Event ID;Name;Date;Event Start;Event End
123;Hoper, Charles;8/4/20;8/4/20 8:30 AM;8/4/20 10:30 AM
456;Hoper, Charles;8/4/20;8/4/20 8:50 AM;8/4/20 9:20 AM
789;Hoper, Charles;8/4/20;8/4/20 8:30 AM;8/4/20 10 AM
1011;Perez, Daniel;8/10/20;8/10/20 9 AM;8/10/20 11 AM
1213;Shah, Kim;8/5/20;8/5/20 12 PM;8/5/20 1 PM
1415;Shah, Kim;8/5/20;8/5/20 12:30 PM;8/5/20 1 PM
"""),sep=';')
# Override it with the second test DataFrame
df = pd.read_csv(io.StringIO("""
Event ID;Name;Date;Event Start;Event End
88;Cooper, Herbert;10/20/20;10/20/20 8:10 AM;10/20/20 9:48 AM
99;Cooper, Herbert;10/20/20;10/20/20 9:19 AM;10/20/20 11:30 AM
10;Cooper, Herbert;10/20/20;10/20/20 11:52 AM;10/20/20 1:26 PM
11;Cooper, Herbert;10/20/20;10/20/20 1:22 AM;10/20/20 2:15 PM
12;Cooper, Herbert;10/20/20;10/20/20 3:23 PM;10/20/20 4:10 PM
"""),sep=';')
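df['Event Start'] = pd.to_datetime(df['Event Start'])
df['Event End'] = pd.to_datetime(df['Event End'])
df['overlap'] = False

# Iterate row by row, tracking how many conflicts have occurred in a row
last_name = None
last_date = None
last_end = pd.Timestamp.max
num_consecutive = 0
for i, r in df.iterrows():

    # The streak continues if this row has the same name and date as the
    # previous row and starts before the previous event ends
    streak_continues = all([
        last_name == r['Name'],
        last_date == r['Date'],
        r['Event Start'] < last_end,
    ])

    if not streak_continues:
        # The streak just ended: flag every row for that name and date
        # if it reached 3 or more conflicts
        if num_consecutive >= 3:
            df.loc[
                df['Name'].eq(last_name) & df['Date'].eq(last_date),
                'overlap'
            ] = True

        num_consecutive = 0

    last_name = r['Name']
    last_date = r['Date']
    last_end = r['Event End']
    num_consecutive += 1

# A streak that runs through the last row is never closed inside the loop,
# so check it once more here
if num_consecutive >= 3:
    df.loc[
        df['Name'].eq(last_name) & df['Date'].eq(last_date),
        'overlap'
    ] = True

df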
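
The same streak logic can also be written without an explicit loop. The following is a minimal sketch of that alternative, not the approach used above: it compares each row with the previous row of the same name and date via `groupby(...).shift()`, numbers the runs with a cumulative sum, and flags runs of three or more. The column name `run` and the variables `prev_end`, `starts_new_run` and `run_len` are illustrative. The sample times are written out as H:MM here (an assumption of this sketch) so that a single explicit format string parses every row.

```python
import io

import pandas as pd

# First sample dataset, with times spelled out as H:MM (assumption of this sketch)
df = pd.read_csv(io.StringIO("""
Event ID;Name;Date;Event Start;Event End
123;Hoper, Charles;8/4/20;8/4/20 8:30 AM;8/4/20 10:30 AM
456;Hoper, Charles;8/4/20;8/4/20 8:50 AM;8/4/20 9:20 AM
789;Hoper, Charles;8/4/20;8/4/20 8:30 AM;8/4/20 10:00 AM
1011;Perez, Daniel;8/10/20;8/10/20 9:00 AM;8/10/20 11:00 AM
1213;Shah, Kim;8/5/20;8/5/20 12:00 PM;8/5/20 1:00 PM
1415;Shah, Kim;8/5/20;8/5/20 12:30 PM;8/5/20 1:00 PM
"""), sep=';')

fmt = '%m/%d/%y %I:%M %p'
df['Event Start'] = pd.to_datetime(df['Event Start'], format=fmt)
df['Event End'] = pd.to_datetime(df['Event End'], format=fmt)

keys = ['Name', 'Date']
# End time of the previous row within the same name and date (NaT for the first row)
prev_end = df.groupby(keys)['Event End'].shift()
# A row starts a new run unless it begins before the previous event ends;
# comparisons against NaT are False, so the first row always starts a run
starts_new_run = ~(df['Event Start'] < prev_end)
# Number the runs within each name and date
df['run'] = starts_new_run.astype(int).groupby([df[k] for k in keys]).cumsum()
# Flag rows that belong to a run of three or more consecutive conflicts
run_len = df.groupby(keys + ['run'])['Event ID'].transform('size')
df['overlap'] = run_len >= 3

print(df[['Event ID', 'Name', 'overlap']])
```

One difference in behavior: this sketch flags only the rows that belong to the qualifying run, whereas the loop flags every row with that name and date. On the first sample dataset both give the same result, since all three Hoper, Charles rows are part of the run.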
