Use the later of two dates, then subtract another date, within a groupby



I have a dataframe that looks like this:

import pandas as pd

# The rows must be wrapped in an outer list so each inner list becomes one row
df = pd.DataFrame([[1,'A','X','1/3/22 12:00:00AM','1/1/22 12:00:00 AM','1/2/22 12:00:00 AM'],
                   [1,'A','X','1/4/22 12:00:00AM','1/3/22 12:00:00 AM','1/3/22 12:00:00 AM'],
                   [1,'A','Y','1/3/22 12:00:00AM','1/2/22 12:00:00 AM','1/1/22 12:00:00 AM'],
                   [1,'B','X','1/3/22 12:00:00AM','1/2/22 12:00:00 AM','1/3/22 12:00:00 AM'],
                   [2,'A','X','1/5/22 12:00:00AM','1/3/22 12:00:00 AM','1/4/22 12:00:00 AM'],
                   [2,'A','X','1/6/22 12:00:00AM','1/4/22 12:00:00 AM','1/5/22 12:00:00 AM']],
                  columns = ['ID','Category','Site','Task Completed','Access Completed', 'Upload Completed'])
ID  Category  Site  Task Completed      Access Completed    Upload Completed
1   A         X     1/3/22 12:00:00 AM  1/1/22 12:00:00 AM  1/2/22 12:00:00 AM
1   A         X     1/4/22 12:00:00 AM  1/3/22 12:00:00 AM  1/3/22 12:00:00 AM
1   A         Y     1/3/22 12:00:00 AM  1/2/22 12:00:00 AM  1/1/22 12:00:00 AM
1   B         X     1/3/22 12:00:00 AM  1/2/22 12:00:00 AM  1/3/22 12:00:00 AM
2   A         X     1/5/22 12:00:00 AM  1/3/22 12:00:00 AM  1/4/22 12:00:00 AM
2   A         X     1/6/22 12:00:00 AM  1/4/22 12:00:00 AM  1/5/22 12:00:00 AM

Try:

import numpy as np
import pandas as pd

# Parse the string columns. %I (12-hour clock) is used instead of %H so that
# the AM/PM marker (%p) is actually taken into account.
df["Task Completed"] = pd.to_datetime(
    df["Task Completed"], format="%m/%d/%y %I:%M:%S%p"
)
df["Access Completed"] = pd.to_datetime(
    df["Access Completed"], format="%m/%d/%y %I:%M:%S %p"
)
df["Upload Completed"] = pd.to_datetime(
    df["Upload Completed"], format="%m/%d/%y %I:%M:%S %p"
)

# One row per (ID, Category, Site) group
out = df.groupby(["ID", "Category", "Site"], as_index=False).agg(
    {
        "Task Completed": "first",
        "Access Completed": "max",
        "Upload Completed": "min",
    }
)

# Use whichever of the two dates is later, then take its distance
# from "Task Completed" in hours
out["Time Difference"] = np.where(
    (out["Access Completed"] - out["Upload Completed"]) > pd.Timedelta(0),
    (out["Access Completed"] - out["Task Completed"]).abs().dt.total_seconds()
    / 3600,
    (out["Upload Completed"] - out["Task Completed"]).abs().dt.total_seconds()
    / 3600,
)
print(out)

Prints:

   ID Category Site  ...  Time Difference
0   1        A    X  ...              0.0
1   1        A    Y  ...             24.0
2   1        B    X  ...              0.0
3   2        A    X  ...             24.0
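
A side note on the format strings passed to pd.to_datetime above: in Python's strptime convention, %p only changes the parsed hour when it is paired with the 12-hour directive %I. With the 24-hour directive %H, the AM/PM marker is matched but ignored, so "12:00:00AM" comes out as noon rather than midnight. For this data every timestamp is 12:00:00 AM, so the differences are the same either way, but it matters for mixed times. The small sketch below uses only the standard library (the variable names are just for illustration); it is reasonable to assume pandas follows the same convention for explicit formats, though that is worth checking against your pandas version.

```python
from datetime import datetime

stamp = "1/3/22 12:00:00AM"

# %H is the 24-hour directive: the AM/PM marker is matched but does not change the hour.
hour_24 = datetime.strptime(stamp, "%m/%d/%y %H:%M:%S%p").hour

# %I is the 12-hour directive: combined with %p, 12 AM becomes hour 0 (midnight).
hour_12 = datetime.strptime(stamp, "%m/%d/%y %I:%M:%S%p").hour

print(hour_24, hour_12)
```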

@Andrej Kesely's answer already covers most of this. However, if you want to do it without numpy, you can replace the numpy part with:

max_time = out[["Access Completed", "Upload Completed"]].max(axis=1)
out["Time Difference"] = (out["Task Completed"] - max_time).dt.total_seconds() / 3600
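
To check that the two variants agree, here is a minimal self-contained sketch. It assumes an already-aggregated frame whose values mirror the sample data after the groupby step (typed in by hand here, not produced by the code above), and the names via_where, later and via_max are only illustrative.

```python
import numpy as np
import pandas as pd

# One row per (ID, Category, Site) group; values mirror the sample data
# after aggregation (first / max / min), entered by hand for this sketch.
out = pd.DataFrame(
    {
        "Task Completed": pd.to_datetime(
            ["2022-01-03", "2022-01-03", "2022-01-03", "2022-01-05"]
        ),
        "Access Completed": pd.to_datetime(
            ["2022-01-03", "2022-01-02", "2022-01-02", "2022-01-04"]
        ),
        "Upload Completed": pd.to_datetime(
            ["2022-01-02", "2022-01-01", "2022-01-03", "2022-01-04"]
        ),
    }
)

# numpy variant: pick whichever of the two dates is later,
# then take the absolute distance to the task date in hours.
via_where = np.where(
    out["Access Completed"] > out["Upload Completed"],
    (out["Access Completed"] - out["Task Completed"]).abs().dt.total_seconds() / 3600,
    (out["Upload Completed"] - out["Task Completed"]).abs().dt.total_seconds() / 3600,
)

# pandas-only variant: the row-wise max is the later date directly.
later = out[["Access Completed", "Upload Completed"]].max(axis=1)
via_max = (out["Task Completed"] - later).dt.total_seconds() / 3600

hours_where = [float(h) for h in via_where]
hours_max = via_max.tolist()
print(hours_where)
print(hours_max)
```

Both lists come out identical for this data. Note that the pandas-only version is signed (there is no .abs()), so the two only match when "Task Completed" is not earlier than the later of the other two dates; add .abs() if you need the magnitude in every case.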
