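I'm new to pandas and need some help. I have a CSV file with many datetimes of device registrations, sorted by ID. Each one is the date and time at which a particular device was registered in the area. For each device, I need to calculate how long it was there (e.g. in minutes) from its earliest and latest datetime. If a device has only one registration, I just need to indicate that it was there. I'm stuck right at the start. I think I need some kind of groupby, but I'm lost.
All I've managed to do so far is: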
visitas = pd.read_csv("visitas.csv")
visitas['datetime_local'] = pd.to_datetime(visitas.datetime_local)
Using .groupby:
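import numpy as np
import pandas as pd

# Sample data standing in for visitas.csv
data = {
    "device_id": [1, 2, 3, 4, 3, 4, 1, 2],
    "datetime_local": ["2022-09-22 06:00:00", "2022-09-22 08:58:10", "2022-09-22 22:23:02", "2022-09-22 09:12:54",
                       "2022-09-23 01:16:17", "2022-09-22 11:18:05", "2022-09-22 12:01:23", "2022-09-22 09:15:02"]
}
visitas = pd.DataFrame(data)
visitas["datetime_local"] = pd.to_datetime(visitas["datetime_local"])

# Sort each device's registrations chronologically, then take the difference
# between consecutive registrations, converted to minutes.
# The first registration of each device has no predecessor, so it gets NaN.
# The result keeps the original index, so assignment lines up with the right rows.
visitas["device_present_minutes"] = (
    visitas
    .sort_values(["device_id", "datetime_local"], ascending=True)
    .groupby("device_id")["datetime_local"]
    .diff() / pd.Timedelta(minutes=1)
)

# Show "-" instead of NaN and order the rows by device, then by time
visitas = (
    visitas
    .replace({np.nan: "-"})  # np.NaN was removed in NumPy 2.0; np.nan works in all versions
    .sort_values(["device_id", "datetime_local"], ascending=True)
    .reset_index(drop=True)
)
print(visitas)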
   device_id      datetime_local device_present_minutes
0          1 2022-09-22 06:00:00                      -
1          1 2022-09-22 12:01:23             361.383333
2          2 2022-09-22 08:58:10                      -
3          2 2022-09-22 09:15:02              16.866667
4          3 2022-09-22 22:23:02                      -
5          3 2022-09-23 01:16:17                 173.25
6          4 2022-09-22 09:12:54                      -
7          4 2022-09-22 11:18:05             125.183333
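Note that diff() measures the gap between consecutive registrations. That equals the span from the earliest to the latest registration only when a device has exactly two registrations, which happens to be true for this sample. A minimal sketch that computes the earliest/latest span directly for any number of registrations is shown below. It uses named aggregation on the grouped column. The output column names (first_seen, last_seen, n_registrations, minutes_present, single_registration) are my own choices, and device 5 is a hypothetical extra row added only to show the single-registration case.

```python
import pandas as pd

# Same sample data as above, plus a hypothetical device 5 with a single
# registration to show how that case is handled.
data = {
    "device_id": [1, 2, 3, 4, 3, 4, 1, 2, 5],
    "datetime_local": [
        "2022-09-22 06:00:00", "2022-09-22 08:58:10", "2022-09-22 22:23:02",
        "2022-09-22 09:12:54", "2022-09-23 01:16:17", "2022-09-22 11:18:05",
        "2022-09-22 12:01:23", "2022-09-22 09:15:02", "2022-09-22 10:00:00",
    ],
}
visitas = pd.DataFrame(data)
visitas["datetime_local"] = pd.to_datetime(visitas["datetime_local"])

# One row per device: earliest and latest registration, and how many there are
summary = visitas.groupby("device_id")["datetime_local"].agg(
    first_seen="min", last_seen="max", n_registrations="count"
)

# Time between the earliest and latest registration, in minutes.
# A single registration would give 0, so mark those devices NaN instead.
summary["minutes_present"] = (
    (summary["last_seen"] - summary["first_seen"]) / pd.Timedelta(minutes=1)
).where(summary["n_registrations"] > 1)

# Flag devices that were registered only once ("the device was there")
summary["single_registration"] = summary["n_registrations"] == 1

print(summary)
```

With more than two registrations per device, this still reports one total span per device, whereas diff() would give one value per gap between registrations.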