Get the duplicate rows in a DataFrame and overwrite them



I have the following DataFrame:

   errorId                start                  end   timestamp         uniqueId
0     1404  2022-04-25 02:10:41  2022-04-25 02:10:46  2022-04-25  1404_2022-04-25
1     1302  2022-04-25 02:10:41  2022-04-25 02:10:46  2022-04-25  1302_2022-04-25
2     1404  2022-04-27 12:54:46  2022-04-27 12:54:51  2022-04-25  1404_2022-04-25
3     1302  2022-04-27 13:34:43  2022-04-27 13:34:50  2022-04-25  1302_2022-04-25
4     1404  2022-04-29 04:30:22  2022-04-29 04:30:29  2022-04-25  1404_2022-04-25
5     1302  2022-04-29 08:26:25  2022-04-29 08:26:32  2022-04-25  1302_2022-04-25

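I think you need to group by the 3 key columns and aggregate start with min and end with max using named aggregation. To restore the original column order, add DataFrame.reindex at the end: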

    df1 = (df.groupby(['errorId','timestamp','uniqueId'], as_index=False, sort=False)
             .agg(start=('start','min'), end=('end','max'))
             .reindex(df.columns, axis=1))
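
A self-contained sketch of the min/max approach, run on data shaped like the table in the question. The DataFrame construction here is an assumption reconstructed from that table. In particular, start and end are converted with pd.to_datetime so that min and max compare timestamps rather than strings.

```python
import pandas as pd

# Sample data reconstructed from the question's table (an assumption).
df = pd.DataFrame({
    "errorId": [1404, 1302, 1404, 1302, 1404, 1302],
    "start": pd.to_datetime([
        "2022-04-25 02:10:41", "2022-04-25 02:10:41",
        "2022-04-27 12:54:46", "2022-04-27 13:34:43",
        "2022-04-29 04:30:22", "2022-04-29 08:26:25",
    ]),
    "end": pd.to_datetime([
        "2022-04-25 02:10:46", "2022-04-25 02:10:46",
        "2022-04-27 12:54:51", "2022-04-27 13:34:50",
        "2022-04-29 04:30:29", "2022-04-29 08:26:32",
    ]),
    "timestamp": ["2022-04-25"] * 6,
    "uniqueId": ["1404_2022-04-25", "1302_2022-04-25"] * 3,
})

# One row per (errorId, timestamp, uniqueId): earliest start, latest end.
# sort=False keeps groups in order of first appearance.
df1 = (
    df.groupby(["errorId", "timestamp", "uniqueId"], as_index=False, sort=False)
      .agg(start=("start", "min"), end=("end", "max"))
      .reindex(df.columns, axis=1)  # restore the original column order
)

print(df1)
```

With this input the six rows collapse to two, one per uniqueId, each spanning from the earliest start to the latest end of its group.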

Or aggregate with first and last, which gives the same output if the datetimes are sorted within each group:

    df2 = (df.groupby(['errorId','timestamp','uniqueId'], as_index=False, sort=False)
             .agg(start=('start','first'), end=('end','last'))
             .reindex(df.columns, axis=1))
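
The sorting condition matters because first and last depend on row order, while min and max do not. The sketch below illustrates this with a hypothetical two-row group stored out of chronological order. The helper collapse and the sample values are assumptions for illustration, and sorting by start is only enough when end follows the same order as start.

```python
import pandas as pd

# Hypothetical group whose rows are NOT in chronological order.
df = pd.DataFrame({
    "errorId": [1404, 1404],
    "start": pd.to_datetime(["2022-04-27 12:54:46", "2022-04-25 02:10:41"]),
    "end": pd.to_datetime(["2022-04-27 12:54:51", "2022-04-25 02:10:46"]),
    "timestamp": ["2022-04-25", "2022-04-25"],
    "uniqueId": ["1404_2022-04-25", "1404_2022-04-25"],
})

keys = ["errorId", "timestamp", "uniqueId"]


def collapse(frame, how_start, how_end):
    """Collapse each key group to one row (illustrative helper)."""
    return (
        frame.groupby(keys, as_index=False, sort=False)
             .agg(start=("start", how_start), end=("end", how_end))
             .reindex(frame.columns, axis=1)
    )


by_minmax = collapse(df, "min", "max")
unsorted_firstlast = collapse(df, "first", "last")
# Sorting by start first makes first/last pick the chronological extremes.
sorted_firstlast = collapse(df.sort_values("start"), "first", "last")

print(by_minmax.equals(unsorted_firstlast))  # differs on unsorted input
print(by_minmax.equals(sorted_firstlast))    # matches once sorted
```

If the row order is not guaranteed, the min/max version is the safer choice. The first/last version avoids comparisons and fits data that is already ordered.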
