I have a dataframe, df, with the following data:
ID DateTime
A 12/13/2019 6:35:48PM
A 12/13/2019 6:35:49PM
A 12/13/2019 6:35:50PM
B 12/13/2019 7:00:00PM
B 12/13/2019 7:00:05PM
C 12/13/2019 8:00:05PM
Desired result:
ID Duration
A 3 sec
B 5 sec
C 1 sec
The Python code I am running:
df.sum(group_by['ID'])
How can I calculate the duration of a column, grouped by another column?
Any suggestions would help.
You can do this in R with the dplyr and magrittr packages:
library(dplyr)
library(magrittr)

x <- data.frame(ID = c("A", "A", "A", "B", "B", "C"),
                DateTime = c("12/13/2019 6:35:48PM", "12/13/2019 6:35:49PM",
                             "12/13/2019 6:35:50PM", "12/13/2019 7:00:00PM",
                             "12/13/2019 7:00:05PM", "12/13/2019 8:00:05PM"))
# %I = 12-hour clock, %p = AM/PM marker (requires an English locale)
x$DateTime <- as.POSIXct(x$DateTime, format = "%m/%d/%Y %I:%M:%S%p")

# Per ID: latest minus earliest timestamp, then keep one row per ID
x %>%
  group_by(ID) %>%
  mutate(dif = max(DateTime) - min(DateTime)) %>%
  select(ID, dif) %>%
  distinct()
# A tibble: 3 x 2
# Groups: ID [3]
ID dif
<fct> <drtn>
1 A 2 secs
2 B 5 secs
3 C 0 secs
Are the timestamps sorted? You could find the earliest and latest timestamp for each ID and subtract one from the other.
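A minimal pandas sketch of that suggestion, assuming the sample data from the question: parse the strings into datetimes, then subtract each ID's earliest timestamp from its latest. Because it uses min and max rather than first and last, the rows do not need to be sorted.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "ID": ["A", "A", "A", "B", "B", "C"],
    "DateTime": ["12/13/2019 6:35:48PM", "12/13/2019 6:35:49PM",
                 "12/13/2019 6:35:50PM", "12/13/2019 7:00:00PM",
                 "12/13/2019 7:00:05PM", "12/13/2019 8:00:05PM"],
})

# Parse "12/13/2019 6:35:48PM": %I is the 12-hour clock, %p the AM/PM marker
df["DateTime"] = pd.to_datetime(df["DateTime"], format="%m/%d/%Y %I:%M:%S%p")

# Earliest and latest timestamp per ID; order of rows does not matter
grouped = df.groupby("ID")["DateTime"]
span = grouped.max() - grouped.min()   # Series of Timedelta, indexed by ID
seconds = span.dt.total_seconds()      # float seconds per ID

print(seconds)
```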
You can write a custom function and use it to aggregate each series grouped by 'ID':
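import pandas as pd

def duration(series):
    # Seconds between the latest and earliest timestamp in the group
    return (series.max() - series.min()).total_seconds()

# The column must hold datetimes, not strings, before subtracting
df['DateTime'] = pd.to_datetime(df['DateTime'], format='%m/%d/%Y %I:%M:%S%p')
df.groupby('ID').agg({'DateTime': duration})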
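A self-contained version of the custom-function approach, run on the question's sample data. It swaps the dict passed to `agg` for pandas named aggregation (available since pandas 0.25) so the output column is called `Duration`, as in the desired result; that column name is an illustrative choice, not part of the original answer.

```python
import pandas as pd

def duration(series):
    # Seconds between the latest and earliest timestamp in one group
    return (series.max() - series.min()).total_seconds()

# Sample data copied from the question
df = pd.DataFrame({
    "ID": ["A", "A", "A", "B", "B", "C"],
    "DateTime": ["12/13/2019 6:35:48PM", "12/13/2019 6:35:49PM",
                 "12/13/2019 6:35:50PM", "12/13/2019 7:00:00PM",
                 "12/13/2019 7:00:05PM", "12/13/2019 8:00:05PM"],
})
df["DateTime"] = pd.to_datetime(df["DateTime"], format="%m/%d/%Y %I:%M:%S%p")

# Named aggregation: new column "Duration" computed from "DateTime" via duration()
result = df.groupby("ID").agg(Duration=("DateTime", duration)).reset_index()
print(result)
```

The values are max-minus-min spans (A: 2.0, B: 5.0, C: 0.0 seconds), matching the R output above rather than the 3 sec and 1 sec listed for A and C in the desired result.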