Earliest and latest record per id, conditional on a column value



I have JSON data that has to be converted into columns. I managed to do that, and the result looks like this:

timestamp: 2021-09-01, 2021-09-02, 2021-09-01, 2021-09-02, 2021-09-03, 2021-09-01, 2021-09-02, 2021-09-03, 2021-09-04, 2021-09-05

You can filter by status, group by id, and then take the first/last row of each group, like this:

import pandas as pd

# copy and clean df (the result relies on df being sorted by timestamp)
# df = pd.read_clipboard(sep=r"\s\s+")
# df["status"] = df.status.str.lower()
# df["timestamp"] = pd.to_datetime(df.timestamp)
# df = df.sort_values("timestamp")

# store the original index as a column named "index" and create a new default index
df = df.reset_index()
# earliest "open" row per id
df_open = df[df.status == "open"].groupby(["id"], as_index=False).first()
# latest "closed" row per id
df_closed = df[df.status == "closed"].groupby(["id"], as_index=False).last()
# restore the original index from the temporary "index" column and sort by it
sol = pd.concat([df_open, df_closed]).set_index("index", drop=True).sort_index()

Output:

        id  phone country  country_code  status  timestamp
index
0      abc    123   India            91    open 2021-09-01
1      abc    123   India            91  closed 2021-09-02
2      xyz    456   India            91    open 2021-09-01
3      xyz    456   India            91  closed 2021-09-02
5      ijk    789   India            91    open 2021-09-01
8      ijk    789   India            91  closed 2021-09-04
9      suv      0      US             1    open 2021-09-05
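
For anyone who wants to try this without the clipboard step, here is a minimal self-contained sketch. The sample frame is hypothetical: it only carries the id, status and timestamp columns, and the status values of rows 4, 6 and 7 are assumed, since they do not appear in the output above. It also swaps in drop_duplicates for groupby().first()/last(), a variant that keeps the original index so no reset_index round trip is needed.

```python
import pandas as pd

# Hypothetical sample data; the status values of rows 4, 6 and 7 are assumptions.
df = pd.DataFrame(
    {
        "id": ["abc", "abc", "xyz", "xyz", "xyz",
               "ijk", "ijk", "ijk", "ijk", "suv"],
        "status": ["open", "closed", "open", "closed", "open",
                   "open", "open", "closed", "closed", "open"],
        "timestamp": pd.to_datetime([
            "2021-09-01", "2021-09-02", "2021-09-01", "2021-09-02", "2021-09-03",
            "2021-09-01", "2021-09-02", "2021-09-03", "2021-09-04", "2021-09-05",
        ]),
    }
)

# Sort chronologically; a stable sort keeps ties in their original order.
df = df.sort_values("timestamp", kind="stable")

# Earliest "open" row per id: keep the first occurrence of each id.
earliest_open = df[df["status"] == "open"].drop_duplicates("id", keep="first")

# Latest "closed" row per id: keep the last occurrence of each id.
latest_closed = df[df["status"] == "closed"].drop_duplicates("id", keep="last")

# drop_duplicates preserves the original index, so sorting by it
# restores the original row order.
result = pd.concat([earliest_open, latest_closed]).sort_index()
print(result)
```

An id with no closed row (such as suv here) simply contributes only its earliest open row.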
