I want to use a while loop to get the time difference between the current row and the previous row of a pandas DataFrame. For context, here is my sample code:
counter = len(data)-1
last = pd.to_datetime(data['time'], infer_datetime_format=True)
current = last
while((last-current).seconds() <= 60 and counter>-1):
    # Do something
    data[counter]
However, I get this error message:
AttributeError: 'Series' object has no attribute 'seconds'
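As I understand it, datetime functions do not work on a pandas Series, so there are at least two ways to solve this:
1) Store last as a datetime object so that I can use the .seconds() datetime function
2) Use the pandas equivalent of the (last-current).seconds() datetime function
Any help would be greatly appreciated!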
Here is a sample of the data structure:
{'time': {0: Timestamp('2016-03-28 23:23:12'), 1: Timestamp('2016-03-28 23:23:32'), 2: Timestamp('2016-03-28 23:23:52'), 3: Timestamp('2016-03-28 23:24:12'), 4: Timestamp('2016-03-28 23:22:12'), 5: Timestamp('2016-03-28 23:24:32'), 6: Timestamp('2016-03-28 23:24:52'), 7: Timestamp('2016-03-28 23:25:32'), 8: Timestamp('2016-03-28 23:30:12'), 9: Timestamp('2016-03-28 23:29:12')}, 'Origin': {0: 'Boston', 1: 'New York', 2: 'Boston', 3: 'New York', 4: 'Hawaii', 5: 'Hawaii', 6: 'Miami', 7: 'Las Vegas', 8: 'Hawaii', 9: 'New York'}, 'Destination': {0: 'Miami', 1: 'Miami', 2: 'Miami', 3: 'Boston', 4: 'Boston', 5: 'New York', 6: 'Las Vegas', 7: 'Las Vegas', 8: 'Las Vegas', 9: 'Los Angeles'}}
I believe this is what you need:
data['time'].diff()
Here is the output:
0 NaT
1 00:00:20
2 00:00:20
3 00:00:20
4 -1 days +23:58:00
5 00:02:20
6 00:00:20
7 00:00:40
8 00:04:40
9 -1 days +23:59:00
Name: time, dtype: timedelta64[ns]
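To connect this back to the loop in the question, here is a minimal sketch of how the result of diff() might replace the row-by-row comparison. It rebuilds only the first five rows of the sample data, and the names gaps and within_minute are hypothetical. Taking the absolute value is also an assumption about what "within 60 seconds" should mean when rows are out of order.

```python
import pandas as pd

# Rebuild the first five rows of the sample data from the question.
data = pd.DataFrame({
    "time": pd.to_datetime([
        "2016-03-28 23:23:12",
        "2016-03-28 23:23:32",
        "2016-03-28 23:23:52",
        "2016-03-28 23:24:12",
        "2016-03-28 23:22:12",
    ]),
    "Origin": ["Boston", "New York", "Boston", "New York", "Hawaii"],
})

# diff() subtracts the previous row from the current row.
# Row 0 has no predecessor, so its gap is NaT.
gaps = data["time"].diff()

# Assumption: a row qualifies if it is at most 60 seconds away from the
# previous row in either direction. Comparisons against NaT are False,
# so row 0 is excluded.
within_minute = gaps.abs() <= pd.Timedelta(seconds=60)

print(data[within_minute])
```

The boolean mask selects rows in a single vectorized step, so no counter or while loop is needed for this kind of check.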
Edit in response to a comment
There are a few ways to get the total number of seconds.
In [12]: data['time'].diff() / np.timedelta64(1, 's')
Out[12]:
0 NaN
1 20
2 20
3 20
4 -120
5 140
6 20
7 40
8 280
9 -60
Name: time, dtype: float64
In [13]: timeit data['time'].diff() / np.timedelta64(1, 's')
1000 loops, best of 3: 738 µs per loop
In [14]: data['time'].diff().map(lambda td: td.item(), na_action='ignore')*1e-9
Out[14]:
0 NaN
1 20
2 20
3 20
4 -120
5 140
6 20
7 40
8 280
9 -60
Name: time, dtype: object
In [15]: timeit data['time'].diff().map(lambda td: td.item(), na_action='ignore')*1e-9
1000 loops, best of 3: 381 µs per loop
Or, better yet:
In [17]: np.divide(data['time'].diff() , np.timedelta64(1, 's'))
Out[17]:
0 NaN
1 20
2 20
3 20
4 -120
5 140
6 20
7 40
8 280
9 -60
Name: time, dtype: float64
timeit np.divide(data['time'].diff() , np.timedelta64(1, 's'))
The slowest run took 4.27 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 155 µs per loop
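One more option, not benchmarked here, is the .dt accessor that pandas provides on timedelta Series. The sketch below uses three of the sample timestamps and assumes a reasonably recent pandas version. It also shows why .dt.seconds is usually not what the question wants: it returns only the seconds component of each timedelta, which is misleading for negative gaps.

```python
import pandas as pd

# Three timestamps from the sample data; the last one is out of order.
times = pd.Series(pd.to_datetime([
    "2016-03-28 23:23:52",
    "2016-03-28 23:24:12",
    "2016-03-28 23:22:12",
]), name="time")

gaps = times.diff()

# Whole duration of each gap, expressed in seconds as floats.
total = gaps.dt.total_seconds()

# Seconds *component* only: a gap of -120 s is stored as
# "-1 days +23:58:00", so its seconds component is 86280, not -120.
component = gaps.dt.seconds

print(total)
print(component)
```

For the error in the question, this means the pandas counterpart of (last-current).seconds() on a Series is most likely .dt.total_seconds(), not .dt.seconds.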