How to calculate the duration of one column, grouped by another column (Python or R)



I have a DataFrame, df, with the following data:

ID            DateTime        
A             12/13/2019 6:35:48PM
A             12/13/2019 6:35:49PM
A             12/13/2019 6:35:50PM
B             12/13/2019 7:00:00PM
B             12/13/2019 7:00:05PM
C             12/13/2019 8:00:05PM
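To make the question reproducible, the sample data above can be built as a pandas DataFrame. This is a sketch that assumes the timestamps arrive as strings in exactly the shown format; `%I` is the 12-hour clock and `%p` reads the AM/PM suffix, so "6:35:48PM" becomes 18:35:48.

```python
import pandas as pd

# Sample data from the question, with timestamps kept as strings
df = pd.DataFrame({
    "ID": ["A", "A", "A", "B", "B", "C"],
    "DateTime": [
        "12/13/2019 6:35:48PM", "12/13/2019 6:35:49PM", "12/13/2019 6:35:50PM",
        "12/13/2019 7:00:00PM", "12/13/2019 7:00:05PM", "12/13/2019 8:00:05PM",
    ],
})

# Parse the strings into datetime64 values (12-hour clock plus AM/PM marker)
df["DateTime"] = pd.to_datetime(df["DateTime"], format="%m/%d/%Y %I:%M:%S%p")
```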

Desired result:

ID              Duration
A                  3 sec
B                  5 sec
C                  1 sec

The Python code I am running:

df.sum(group_by['ID'])

How can I calculate the duration of a column, grouped by another column?

Any suggestions would help.

You can do this in R with the dplyr and magrittr packages:

library(dplyr)
library(magrittr)  # dplyr also re-exports %>%
x <- data.frame(ID = c("A", "A", "A", "B", "B", "C"),
                DateTime = c("12/13/2019 6:35:48PM", "12/13/2019 6:35:49PM",
                             "12/13/2019 6:35:50PM", "12/13/2019 7:00:00PM",
                             "12/13/2019 7:00:05PM", "12/13/2019 8:00:05PM"))
# %I is the 12-hour clock and %p reads the AM/PM suffix (%H would ignore PM)
x$DateTime <- as.POSIXct(x$DateTime, format = "%m/%d/%Y %I:%M:%S%p")
# For each ID: latest minus earliest timestamp, then keep one row per ID
x %>%
  group_by(ID) %>%
  mutate(dif = max(DateTime) - min(DateTime)) %>%
  select(ID, dif) %>%
  distinct()
# A tibble: 3 x 2
# Groups:   ID [3]
ID    dif   
<fct> <drtn>
1 A     2 secs
2 B     5 secs
3 C     0 secs

Are the timestamps sorted? You may want to find the earliest and latest timestamp for each ID and subtract them.
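A small pandas sketch of why sorting matters: `min`/`max` find the earliest and latest timestamp regardless of row order, while `first`/`last` only give the same answer once each group is sorted by time. The shuffle via `sample(frac=1, random_state=0)` is purely illustrative, and the variable names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["A", "A", "A", "B", "B", "C"],
    "DateTime": [
        "12/13/2019 6:35:48PM", "12/13/2019 6:35:49PM", "12/13/2019 6:35:50PM",
        "12/13/2019 7:00:00PM", "12/13/2019 7:00:05PM", "12/13/2019 8:00:05PM",
    ],
})
df["DateTime"] = pd.to_datetime(df["DateTime"], format="%m/%d/%Y %I:%M:%S%p")

# Shuffle the rows so timestamps are no longer in order within each ID
shuffled = df.sample(frac=1, random_state=0)

# min/max: earliest and latest timestamp per ID, independent of row order
g = shuffled.groupby("ID")["DateTime"]
by_min_max = g.max() - g.min()

# first/last: only equivalent after sorting each group by time
g_sorted = shuffled.sort_values("DateTime").groupby("ID")["DateTime"]
by_first_last = g_sorted.last() - g_sorted.first()

print(by_min_max.equals(by_first_last))  # True
```

If the data is already sorted, `first`/`last` would work directly; `min`/`max` is the safer default because it does not rely on that assumption.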

You can write a custom function and apply it to each series grouped by 'ID':

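import pandas as pd

def duration(series):
    # Seconds between the earliest and latest timestamp in the group
    return (series.max() - series.min()).total_seconds()

# Parse strings such as "12/13/2019 6:35:48PM" into datetimes first
df['DateTime'] = pd.to_datetime(df['DateTime'], format='%m/%d/%Y %I:%M:%S%p')
df.groupby('ID').agg({'DateTime': duration})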
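The same per-ID "latest minus earliest" can also be computed without a custom Python function by calling the built-in `max` and `min` on the grouped column, which pandas runs in optimized code rather than once per group in Python. A sketch that rebuilds the question's sample data; `grouped`, `seconds` and `result` are illustrative names. Note that this definition gives 2, 5 and 0 seconds (matching the R output), not the 3, 5 and 1 seconds listed under the desired result.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["A", "A", "A", "B", "B", "C"],
    "DateTime": [
        "12/13/2019 6:35:48PM", "12/13/2019 6:35:49PM", "12/13/2019 6:35:50PM",
        "12/13/2019 7:00:00PM", "12/13/2019 7:00:05PM", "12/13/2019 8:00:05PM",
    ],
})
df["DateTime"] = pd.to_datetime(df["DateTime"], format="%m/%d/%Y %I:%M:%S%p")

# Vectorized per-group max and min; their difference is a Timedelta per ID
grouped = df.groupby("ID")["DateTime"]
seconds = (grouped.max() - grouped.min()).dt.total_seconds()

# Turn the ID index back into a column, named like the desired output
result = seconds.rename("Duration").reset_index()
print(result)
```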
