I have a dataframe that looks like this:
```python
import pandas as pd

df = pd.DataFrame([[1,'A','X','1/3/22 12:00:00AM','1/1/22 12:00:00 AM','1/2/22 12:00:00 AM'],
                   [1,'A','X','1/4/22 12:00:00AM','1/3/22 12:00:00 AM','1/3/22 12:00:00 AM'],
                   [1,'A','Y','1/3/22 12:00:00AM','1/2/22 12:00:00 AM','1/1/22 12:00:00 AM'],
                   [1,'B','X','1/3/22 12:00:00AM','1/2/22 12:00:00 AM','1/3/22 12:00:00 AM'],
                   [2,'A','X','1/5/22 12:00:00AM','1/3/22 12:00:00 AM','1/4/22 12:00:00 AM'],
                   [2,'A','X','1/6/22 12:00:00AM','1/4/22 12:00:00 AM','1/5/22 12:00:00 AM']],
                  columns=['ID','Category','Site','Task Completed','Access Completed','Upload Completed'])
```
ID | Category | Site | Task Completed | Access Completed | Upload Completed |
---|---|---|---|---|---|
1 | A | X | 1/3/22 12:00:00 AM | 1/1/22 12:00:00 AM | 1/2/22 12:00:00 AM |
1 | A | X | 1/4/22 12:00:00 AM | 1/3/22 12:00:00 AM | 1/3/22 12:00:00 AM |
1 | A | Y | 1/3/22 12:00:00 AM | 1/2/22 12:00:00 AM | 1/1/22 12:00:00 AM |
1 | B | X | 1/3/22 12:00:00 AM | 1/2/22 12:00:00 AM | 1/3/22 12:00:00 AM |
2 | A | X | 1/5/22 12:00:00 AM | 1/3/22 12:00:00 AM | 1/4/22 12:00:00 AM |
2 | A | X | 1/6/22 12:00:00 AM | 1/4/22 12:00:00 AM | 1/5/22 12:00:00 AM |
Try:
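```python
import numpy as np
import pandas as pd

# "Task Completed" has no space before AM/PM; the other two columns do.
# %I (12-hour clock) is needed for %p to take effect; with %H, "12 AM" parses as noon.
df["Task Completed"] = pd.to_datetime(
    df["Task Completed"], format="%m/%d/%y %I:%M:%S%p"
)
df["Access Completed"] = pd.to_datetime(
    df["Access Completed"], format="%m/%d/%y %I:%M:%S %p"
)
df["Upload Completed"] = pd.to_datetime(
    df["Upload Completed"], format="%m/%d/%y %I:%M:%S %p"
)

# One row per (ID, Category, Site): first task time, latest access, earliest upload.
out = df.groupby(["ID", "Category", "Site"], as_index=False).agg(
    {
        "Task Completed": "first",
        "Access Completed": "max",
        "Upload Completed": "min",
    }
)

# Hours between the task and whichever of access/upload completed later.
out["Time Difference"] = np.where(
    (out["Access Completed"] - out["Upload Completed"]) > pd.Timedelta(0),
    (out["Access Completed"] - out["Task Completed"]).abs().dt.total_seconds()
    / 3600,
    (out["Upload Completed"] - out["Task Completed"]).abs().dt.total_seconds()
    / 3600,
)
print(out)
```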
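Prints (the three datetime columns are omitted here):

| | ID | Category | Site | Time Difference |
|---|---|---|---|---|
| 0 | 1 | A | X | 0.0 |
| 1 | 1 | A | Y | 24.0 |
| 2 | 1 | B | X | 0.0 |
| 3 | 2 | A | X | 24.0 |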
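A side note on the format strings: `%p` only changes the hour when it is paired with `%I` (12-hour clock). With `%H`, a value like `12:00:00 AM` is read as noon. Python's `datetime.strptime` documents this rule, and as far as I know `pd.to_datetime` with an explicit `format` follows the same strptime semantics. Since every timestamp in this data is 12 AM, the hour differences come out the same either way, but `%I` rather than `%H` is the directive that yields the intended midnight timestamps. A minimal standard-library check (the variable names are just illustrative):

```python
from datetime import datetime

raw = "1/3/22 12:00:00 AM"

# %H reads the hour as-is; %p is matched but ignored, so "12 AM" becomes noon.
as_h = datetime.strptime(raw, "%m/%d/%y %H:%M:%S %p")

# %I applies the AM/PM marker, so "12 AM" becomes midnight.
as_i = datetime.strptime(raw, "%m/%d/%y %I:%M:%S %p")

print(as_h)  # 2022-01-03 12:00:00
print(as_i)  # 2022-01-03 00:00:00
```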
@Andrej Kesely's answer already covers most of this. However, if you want to do it without numpy, you can replace the `np.where` part with:
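```python
# Later of the two completion times in each row
max_time = out[["Access Completed", "Upload Completed"]].max(axis=1)
# .abs() matches the np.where version, where both branches take absolute values
out["Time Difference"] = (out["Task Completed"] - max_time).abs().dt.total_seconds() / 3600
```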
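Why the two versions agree: the `np.where` condition picks `Access Completed` when it is later than `Upload Completed` and `Upload Completed` otherwise, which is just the row-wise maximum of the two (on a tie both give the same value). The sketch below rebuilds the sample data from the question and checks that both approaches produce the same hours. It assumes the `%I`-based format strings and takes the absolute value of the difference so the sign matches `np.where`; `no_numpy` and `with_numpy` are illustrative names.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame(
    [
        [1, "A", "X", "1/3/22 12:00:00AM", "1/1/22 12:00:00 AM", "1/2/22 12:00:00 AM"],
        [1, "A", "X", "1/4/22 12:00:00AM", "1/3/22 12:00:00 AM", "1/3/22 12:00:00 AM"],
        [1, "A", "Y", "1/3/22 12:00:00AM", "1/2/22 12:00:00 AM", "1/1/22 12:00:00 AM"],
        [1, "B", "X", "1/3/22 12:00:00AM", "1/2/22 12:00:00 AM", "1/3/22 12:00:00 AM"],
        [2, "A", "X", "1/5/22 12:00:00AM", "1/3/22 12:00:00 AM", "1/4/22 12:00:00 AM"],
        [2, "A", "X", "1/6/22 12:00:00AM", "1/4/22 12:00:00 AM", "1/5/22 12:00:00 AM"],
    ],
    columns=["ID", "Category", "Site", "Task Completed", "Access Completed", "Upload Completed"],
)

# "Task Completed" has no space before AM/PM; the other two columns do.
df["Task Completed"] = pd.to_datetime(df["Task Completed"], format="%m/%d/%y %I:%M:%S%p")
for col in ["Access Completed", "Upload Completed"]:
    df[col] = pd.to_datetime(df[col], format="%m/%d/%y %I:%M:%S %p")

out = df.groupby(["ID", "Category", "Site"], as_index=False).agg(
    {"Task Completed": "first", "Access Completed": "max", "Upload Completed": "min"}
)

# numpy-free: hours between the task and the later of access/upload.
max_time = out[["Access Completed", "Upload Completed"]].max(axis=1)
no_numpy = (out["Task Completed"] - max_time).abs().dt.total_seconds() / 3600

# np.where version, for comparison.
with_numpy = np.where(
    (out["Access Completed"] - out["Upload Completed"]) > pd.Timedelta(0),
    (out["Access Completed"] - out["Task Completed"]).abs().dt.total_seconds() / 3600,
    (out["Upload Completed"] - out["Task Completed"]).abs().dt.total_seconds() / 3600,
)

out["Time Difference"] = no_numpy
print(out[["ID", "Category", "Site", "Time Difference"]])
print((no_numpy.to_numpy() == with_numpy).all())  # True
```

Taking the row-wise `max` also makes the intent ("the later of the two completion times") explicit, instead of encoding it in a sign test on a difference.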