Duration between two timestamps



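I have a DataFrame with several timestamps for each user, and I want to calculate the duration between consecutive timestamps. I import the CSV files with the following code: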

import pandas as pd
import glob

path = r'C:Users...Desktop'
all_files = glob.glob(path + "/*.csv")

# Read every CSV file in the folder and collect the frames
li = []
for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0, encoding='ISO-8859-1')
    li.append(df)

# Stack all frames into a single DataFrame with a fresh index
df = pd.concat(li, axis=0, ignore_index=True)

df.head()

ID       timestamp
1828765  31-05-2021 22:27:03
1828765  31-05-2021 22:27:12
1828765  31-05-2021 22:27:13
1828765  31-05-2021 22:27:34
2056557  21-07-2021 10:27:12
2056557  21-07-2021 10:27:20
2056557  21-07-2021 10:27:22

I would like to get something like this:

ID       timestamp            duration(s)
1828765  31-05-2021 22:27:03  NaN
1828765  31-05-2021 22:27:12  9
1828765  31-05-2021 22:27:13  1
1828765  31-05-2021 22:27:34  21
2056557  21-07-2021 10:27:12  NaN
2056557  21-07-2021 10:27:20  8
2056557  21-07-2021 10:27:22  2

I tried this code, but it does not work for me:

import datetime
df['timestamp'] = pd.to_datetime(df['timestamp'], format="%d-%m-%Y %H:%M:%S")
df['time_diff'] = 0
for i in range(df.shape[0] - 1):
    df['time_diff'][i+1] = (datetime.datetime.min + (df['timestamp'][i+1] - df['timestamp'][i])).time()

An operation that is applied separately to each group of values is a GroupBy operation in pandas.

pandas natively supports arithmetic on timestamps, so subtracting one timestamp from another gives the correct duration between them.
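As a minimal sketch of that point, the snippet below subtracts two standalone pandas.Timestamp values taken from the sample data. The variable names start, end and elapsed are only illustrative and do not appear in the question.

```python
import pandas as pd

# Two timestamps from the sample data (illustrative variable names)
start = pd.Timestamp("2021-05-31 22:27:03")
end = pd.Timestamp("2021-05-31 22:27:12")

# Subtracting two Timestamps yields a pandas Timedelta
elapsed = end - start

print(type(elapsed).__name__)
print(elapsed.total_seconds())
```

The same subtraction works element-wise on whole datetime64[ns] columns, which is what the grouped diff relies on.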

You have already successfully converted the timestamp column to datetime64[ns]:

df['timestamp'] = pd.to_datetime(df['timestamp'], format="%d-%m-%Y %H:%M:%S")

Now we can use GroupBy.diff:

# Compute the difference between consecutive rows within each group
df['duration'] = df.groupby('ID')['timestamp'].diff()

df:

        ID           timestamp        duration
0  1828765 2021-05-31 22:27:03             NaT
1  1828765 2021-05-31 22:27:12 0 days 00:00:09
2  1828765 2021-05-31 22:27:13 0 days 00:00:01
3  1828765 2021-05-31 22:27:34 0 days 00:00:21
4  2056557 2021-07-21 10:27:12             NaT
5  2056557 2021-07-21 10:27:20 0 days 00:00:08
6  2056557 2021-07-21 10:27:22 0 days 00:00:02
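One thing to keep in mind: GroupBy.diff subtracts each row from the row that precedes it within the same group, in the order the rows currently appear. The sample data is already chronological, but rows concatenated from several CSV files may not be. A hedged sketch, assuming the rows could arrive out of order (the shuffled sample below is made up for illustration):

```python
import pandas as pd

# Deliberately shuffled rows (hypothetical ordering, for illustration only)
df = pd.DataFrame({
    "ID": [2056557, 1828765, 2056557, 1828765],
    "timestamp": pd.to_datetime(
        ["21-07-2021 10:27:20", "31-05-2021 22:27:12",
         "21-07-2021 10:27:12", "31-05-2021 22:27:03"],
        format="%d-%m-%Y %H:%M:%S",
    ),
})

# Sort by user and time so each diff compares chronological neighbours
df = df.sort_values(["ID", "timestamp"]).reset_index(drop=True)
df["duration"] = df.groupby("ID")["timestamp"].diff()

print(df)
```

Without the sort, a row that precedes an earlier timestamp would produce a negative duration.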

If we want the duration in seconds, we can use Series.dt.total_seconds:

# Extract the total number of seconds
df['duration (s)'] = df.groupby('ID')['timestamp'].diff().dt.total_seconds()

df:

        ID           timestamp  duration (s)
0  1828765 2021-05-31 22:27:03           NaN
1  1828765 2021-05-31 22:27:12           9.0
2  1828765 2021-05-31 22:27:13           1.0
3  1828765 2021-05-31 22:27:34          21.0
4  2056557 2021-07-21 10:27:12           NaN
5  2056557 2021-07-21 10:27:20           8.0
6  2056557 2021-07-21 10:27:22           2.0
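The result is a float column because NaN cannot be stored in a plain integer column. If whole-number seconds like those in the desired output are preferred, one option is pandas' nullable Int64 dtype. This is a sketch that assumes the timestamps have whole-second precision; the round() call would otherwise discard fractional seconds.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1828765, 1828765, 1828765],
    "timestamp": pd.to_datetime(
        ["31-05-2021 22:27:03", "31-05-2021 22:27:12", "31-05-2021 22:27:13"],
        format="%d-%m-%Y %H:%M:%S",
    ),
})

# Float seconds, with NaN for the first row of each group
seconds = df.groupby("ID")["timestamp"].diff().dt.total_seconds()

# Nullable integer dtype: missing values become <NA> instead of forcing floats
df["duration (s)"] = seconds.round().astype("Int64")

print(df)
```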

Complete working example:

import pandas as pd

df = pd.DataFrame({
    'ID': [1828765, 1828765, 1828765, 1828765, 2056557, 2056557, 2056557],
    'timestamp': ['31-05-2021 22:27:03', '31-05-2021 22:27:12',
                  '31-05-2021 22:27:13', '31-05-2021 22:27:34',
                  '21-07-2021 10:27:12', '21-07-2021 10:27:20',
                  '21-07-2021 10:27:22']
})
# Parse the day-first strings into datetime64[ns]
df['timestamp'] = pd.to_datetime(df['timestamp'], format="%d-%m-%Y %H:%M:%S")
# Per-ID difference between consecutive rows, expressed in seconds
df['duration (s)'] = df.groupby('ID')['timestamp'].diff().dt.total_seconds()
print(df)
