I have the following data:
import pandas as pd

data = {'timestamp': ['Friday, October 15, 2021 3:40 PM', 'Oct 15, 2021 03:06:29 PM', 'Friday, October 15, 2021 2:28 PM', 'Oct 15, 2021 06:23:51 AM', 'Oct 15, 2021 04:19:07 AM', 'Oct 15, 2021 08:19:07 AM'],
        'emailuser': ['michael@google.com', 'caron@yt.com', 'luke@yt.com', 'sav@google.com', 'sav@google.com', 'paul@yt.com']
        }
data = pd.DataFrame(data)
print(data)
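I want to calculate the average response time of Google employees. In this example, I want to get:
- the time difference between michael@google.com and luke@yt.com (the timestamp for caron@yt.com can be skipped, because caron and luke are at the same company)
- nothing for sav@google.com and paul@yt.com; that pair should be ignored, because it would give a negative time difference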
It's not pretty, but this should meet the spec. If you have any questions, or if you see any deviation from the spec, please comment.
import datetime

import pandas as pd

data = {
    "timestamp": [
        "Friday, October 15, 2021 3:40 PM",
        "Oct 15, 2021 03:06:29 PM",
        "Friday, October 15, 2021 2:28 PM",
        "Oct 15, 2021 06:23:51 AM",
        "Oct 15, 2021 04:19:07 AM",
        "Oct 15, 2021 08:19:07 AM",
    ],
    "emailuser": [
        "michael@google.com",
        "caron@yt.com",
        "luke@yt.com",
        "sav@google.com",
        "sav@google.com",
        "paul@yt.com",
    ],
}
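
def extract_datetime(timestamp: str) -> datetime.datetime:
    # Try each known timestamp layout in turn and return the first one that parses.
    for fmt in ["%A, %B %d, %Y %I:%M %p", "%b %d, %Y %I:%M:%S %p"]:
        try:
            return datetime.datetime.strptime(timestamp, fmt)
        except ValueError:
            # This layout did not match; fall through to the next one.
            pass
    raise ValueError(f"Timestamp {timestamp} is invalid")
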
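data = pd.DataFrame(data)
data["timestamp"] = data["timestamp"].apply(extract_datetime)
data["delta"] = datetime.timedelta(seconds=0)

gem = "@google.com"
for i in range(len(data)):
    # Only rows sent from a Google address start a measurement.
    if data["emailuser"].iat[i].endswith(gem):
        # Scan the following rows, up to and including the last one.
        for j in range(i + 1, len(data)):
            if not data["emailuser"].iat[j].endswith(gem):
                delta = data["timestamp"].iat[i] - data["timestamp"].iat[j]
                # Skip negative differences; a later positive one overwrites an earlier one.
                if delta > datetime.timedelta(0):
                    data.at[data.index[i], "delta"] = delta
            else:
                # The next Google address ends the run of non-Google rows.
                break
print(data)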
             timestamp           emailuser            delta
0  2021-10-15 15:40:00  michael@google.com  0 days 01:12:00
1  2021-10-15 15:06:29        caron@yt.com  0 days 00:00:00
2  2021-10-15 14:28:00         luke@yt.com  0 days 00:00:00
3  2021-10-15 06:23:51      sav@google.com  0 days 00:00:00
4  2021-10-15 04:19:07      sav@google.com  0 days 00:00:00
5  2021-10-15 08:19:07         paul@yt.com  0 days 00:00:00
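
The question asks for an average, and the table above stops at per-row deltas. Here is a minimal sketch of that last step, assuming a zero delta means "no response measured" and should be left out of the mean. The frame name `result` and the hard-coded deltas are only illustrative, copied from the output above so the snippet runs on its own:

```python
import pandas as pd

# Illustrative copy of the "emailuser" and "delta" columns printed above.
result = pd.DataFrame(
    {
        "emailuser": [
            "michael@google.com",
            "caron@yt.com",
            "luke@yt.com",
            "sav@google.com",
            "sav@google.com",
            "paul@yt.com",
        ],
        "delta": pd.to_timedelta(
            ["01:12:00", "00:00:00", "00:00:00", "00:00:00", "00:00:00", "00:00:00"]
        ),
    }
)

# Assumption: a zero delta marks a row with no measured response, so drop it.
answered = result.loc[result["delta"] > pd.Timedelta(0), "delta"]

# Guard against the case where no row has a positive delta.
average_response = answered.mean() if not answered.empty else pd.NaT
print(average_response)
```

With only one measured response in the sample data, the average is simply that one delta. If a genuine zero-second response is possible in your data, a separate boolean column would probably be a safer marker than the zero value.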