识别事件对,然后计算事件之间经过的时间



我有一个包含发送和接收消息的数据帧。我想计算一下某人回复消息所花的时间。

import numpy as np
import pandas as pd
from datetime import datetime
df = pd.DataFrame({'sent':[78,18,94,55,68,57,78,8],
'received':[18,78,35,14,57,68,57,17],
'time':['2017-01-01T12','2017-01-01T13',
'2017-01-02T12','2017-02-01T13',
'2017-01-01T14','2017-01-01T15',
'2017-01-01T16','2017-01-01T17']})
df['time'] = pd.to_datetime(pd.Series(df['time']))

我想使用的方法是识别对,所以如果send=A和receive=B,那么应该有另一个send=B和received=A的条目。

df["pairs"] = df.apply(lambda x: not df[(df["sent"] == x["received"]) & (df["received"] == x["sent"]) & (df.index != x.name)].empty, axis=1)

然后,一旦我确定了配对,我就可以计算出响应所需的时间

sent_time = datetime.strptime('2017-01-01 12:00:00', fmt)
recieved_time = datetime.strptime('2017-01-01 13:00:00', fmt)
if sent_time > recieved_time:
td = sent_time - recieved_time
else:
td = recieved_time - sent_time
time = int(round(td.total_seconds() / 60))

我觉得我可以单独做这些,但我似乎无法把它们放在一起。


编辑

至于输出,我想我需要一个单独的数据帧,列出发件人和某人回复电子邮件所需的时间。

以为例

消息是由78发送的,花了60分钟才做出响应。然后68发送了一条消息,花了60分钟才回复

发件人time_to_responsed
7860
6860
#Sort row values to create unique group
df[['s','t']] = np.sort(df[['sent','received']], axis=1)
#Subset duplicated groups
s = df[df.duplicated(subset=['s','t'], keep=False)]

#Compute time difference between duplicated groups, drop duplicated rows and unwanted columns
s=s.assign(time_to_respond=s.groupby(['s','t'])['time'].transform(lambda x:x.diff().bfill().dt.total_seconds()/60)).drop_duplicates(subset=['s','t'])[['sent','time_to_respond']]

sent  time_to_respond
0    78             60.0
4    68             60.0

具有pandas.merge的命题:

df =  (
df.merge(df, left_on='sent',right_on='received',how='left')
.assign(time_to_respond= lambda x: (x['time_y'] - x['time_x']).dt.total_seconds()/60)
)

out = (
df.loc[(df['time_to_respond'].gt(0)), ['sent_x', 'time_to_respond']]
.rename(columns={'sent_x': 'sender'})
.reset_index(drop=True)
)

#输出:

print(out)
sender  time_to_respond
0      78             60.0
1      68             60.0
2      57             60.0

最新更新