Pandas,datetime和逻辑:汇总一列中特定行的总和



我得到了一个DataFrame:

date        start       end         inter
0   01-09-2020  10:01:24    10:02:13    0 days 00:00:49
1   01-09-2020  10:04:21    10:22:01    0 days 00:17:40
2   01-09-2020  10:16:14    10:25:06    0 days 00:08:52
3   01-09-2020  10:28:38    10:28:40    0 days 00:00:02
4   01-09-2020  10:37:38    10:37:41    0 days 00:00:03
... ... ... ... ...
995 17-09-2020  12:19:03    12:21:06    0 days 00:02:03
996 17-09-2020  12:22:53    12:22:58    0 days 00:00:05
997 17-09-2020  12:25:11    12:25:14    0 days 00:00:03
998 17-09-2020  12:27:07    12:27:08    0 days 00:00:01
999 17-09-2020  12:29:03    12:29:05    0 days 00:00:02
1000 rows × 4 columns

我想创建一个新的df,但在特定的日期时间范围内使用"inter"的总和。例如:

new_date    start_range end_range   inter_sum
0   01-09-2020  10:00:00    10:59:59    0 days 01:15:36 
1   01-09-2020  11:00:00    11:59:59    0 days 00:58:30
...
997 17-09-2020  10:00:00    10:59:59    0 days 03:00:15
998 17-09-2020  11:00:00    11:59:59    0 days 00:47:20

其中"inter_sum"是基于上一个df的"start"one_answers"end"的"start_range"one_answers"end_range(之间的"inter"值的总和。

试用resample:

#convert date column to datetime if needed
df["date"] = pd.to_datetime(df["date"])
#convert time column to datetime for resample
df["start"] = pd.to_datetime(df["date"].astype(str)+"T"+df["start"].astype(str))
#resample on start datetime every hour and sum
output = df.resample("H", on="start")["inter"].sum()
#formatting to match expected output
output["date"] = output["start"].dt.date
output["end"] = (output["start"] + pd.Timedelta(minutes=59, seconds=59)).dt.time
output["start"] = output["start"].dt.time
output = output[["date", "start", "end", "inter_sum"]]
>>> output
date     start       end       inter_sum
0     2020-01-09  10:00:00  10:59:59 0 days 00:27:26
1     2020-01-09  11:00:00  11:59:59 0 days 00:00:00
2     2020-01-09  12:00:00  12:59:59 0 days 00:00:00
3     2020-01-09  13:00:00  13:59:59 0 days 00:00:00
4     2020-01-09  14:00:00  14:59:59 0 days 00:00:00
...       ...       ...             ...
6046  2020-09-17  08:00:00  08:59:59 0 days 00:00:00
6047  2020-09-17  09:00:00  09:59:59 0 days 00:00:00
6048  2020-09-17  10:00:00  10:59:59 0 days 00:00:00
6049  2020-09-17  11:00:00  11:59:59 0 days 00:00:00
6050  2020-09-17  12:00:00  12:59:59 0 days 00:02:14

最新更新