Unable to conditionally shift a Pandas DataFrame index



I have a DataFrame:

>>> dfn.head()
Out[8]: 
2012-02-27 00:00:00+00:00    3054679365.000
2012-02-27 01:00:00+00:00    1433475236.000
2012-02-27 02:00:00+00:00    1725293108.000
2012-02-27 03:00:00+00:00    1089842336.000
2012-02-27 04:00:00+00:00    1637301178.000
>>> dfn.tail()
2012-03-02 20:00:00+00:00    3696373423.000
2012-03-02 21:00:00+00:00    3423657296.000
2012-03-02 22:00:00+00:00    1887346076.000
2012-03-02 23:00:00+00:00     426382220.400
2012-03-03 00:00:00+00:00     759307738.400
dtype: float64

The frequency is hourly, but there is a break on '2012-03-02', where the data starts at 1 am instead of midnight:

>>> dfn['2012-03-01'].tail()
Out[12]: 
2012-03-01 19:00:00+00:00   2144039255.000
2012-03-01 20:00:00+00:00   4055718131.000
2012-03-01 21:00:00+00:00   1850226718.000
2012-03-01 22:00:00+00:00    738256967.900
2012-03-01 23:00:00+00:00   1163600574.000
Name: vol, dtype: float64
>>> dfn['2012-03-02'].head()
Out[11]: 
2012-03-02 01:00:00+00:00   2364896887.000
2012-03-02 02:00:00+00:00   1598799781.000
2012-03-02 03:00:00+00:00   2011619242.000
2012-03-02 04:00:00+00:00   2408284057.000
2012-03-02 05:00:00+00:00   2084405746.000
Name: vol, dtype: float64

I want to shift the index back by 1 hour, starting from the break at 1 am on '2012-03-02'. I tried the following:

trouble_spots =  pd.date_range(start = dfn.index[trouble_loc], end = dfn.index[-1], freq='H', tz= 'Europe/London')
>>> trouble_spots
Out[13]: DatetimeIndex(['2012-03-02 01:00:00+00:00', '2012-03-02 02:00:00+00:00', '2012-03-02 03:00:00+00:00',.... '2012-03-02 22:00:00+00:00', '2012-03-02 23:00:00+00:00', '2012-03-03 00:00:00+00:00'], dtype='datetime64[ns]', freq='H', tz='Europe/London')

The problem is that the following does not seem to work:

dfn.index = dfn.index.map(lambda x: x - pd.Timedelta(1, 'h') if x in trouble_spots else x)

It gives the same index as before. The parts work individually:

>>> [x for x in dfn.index if x in trouble_spots]
Out[6]: 
[Timestamp('2012-03-02 01:00:00+0000', tz='Europe/London'),
 Timestamp('2012-03-02 02:00:00+0000', tz='Europe/London'),
 ......
 Timestamp('2012-03-02 03:00:00+0000', tz='Europe/London'),
 Timestamp('2012-03-02 21:00:00+0000', tz='Europe/London'),
 Timestamp('2012-03-02 22:00:00+0000', tz='Europe/London'),
>>> dfn.index.map(lambda x: x - pd.Timedelta(1, 'h'))
Out[5]: 
DatetimeIndex(['2012-02-26 23:00:00+00:00', '2012-02-27 00:00:00+00:00', ... '2012-03-02 20:00:00+00:00', '2012-03-02 21:00:00+00:00', '2012-03-02 22:00:00+00:00', '2012-03-02 23:00:00+00:00'], dtype='datetime64[ns]', length=120, freq=None, tz='Europe/London')

But they do not seem to work together. Is there something I am missing here?

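I'm not sure how to fix your code for that index. But, "I have values, and I don't want to delete any of them." So why not delete one index value and then construct an entirely new DataFrame that keeps the values unchanged but uses a new index, extended by 1 hour to account for the one label you deleted?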

Example:

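import numpy as np
import pandas as pd

np.random.seed(1)

# 48 hourly periods starting at midnight on 2012-03-02
# (pd.period_range replaces the removed PeriodIndex(start=..., periods=...) constructor)
index = pd.period_range(start='2012-03-02 00:00', freq='h', periods=48)
values = np.random.randint(7382569, 40557181, len(index))
df = pd.DataFrame(data=values, index=index)

# Drop one label (midnight on 2012-03-03) from the index ...
new_index = df.index.delete(df.index.get_loc(pd.Period('2012-03-03 00:00', freq='h')))
# ... and append one extra hour at the end so the length still matches the values
index_plus_one = new_index.append(pd.Index([new_index.max() + pd.Timedelta('1h')]))
new_df = pd.DataFrame(data=values, index=index_plus_one)

print('PREVIOUS DF')
print(df.iloc[18: 30])
print('NEW DF')
print(new_df.iloc[18: 30])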

Prints:

PREVIOUS DF
                         0
2012-03-02 18:00  39881435
2012-03-02 19:00  18381629
2012-03-02 20:00  29424423
2012-03-02 21:00  10704782
2012-03-02 22:00  31142331
2012-03-02 23:00  12152829
2012-03-03 00:00  13083060
2012-03-03 01:00  32950053
2012-03-03 02:00  20771859
2012-03-03 03:00  20330693
2012-03-03 04:00  24348102
2012-03-03 05:00  20971447
NEW DF
                         0
2012-03-02 18:00  39881435
2012-03-02 19:00  18381629
2012-03-02 20:00  29424423
2012-03-02 21:00  10704782
2012-03-02 22:00  31142331
2012-03-02 23:00  12152829
2012-03-03 01:00  13083060
2012-03-03 02:00  32950053
2012-03-03 03:00  20771859
2012-03-03 04:00  20330693
2012-03-03 05:00  24348102
2012-03-03 06:00  20971447
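
If the goal is to shift the labels in place rather than rebuild the frame, a vectorized mask may be simpler than `map` with a membership test. The following is only a sketch on toy data: the series `s`, the names `step`, `cutoff` and `keep`, and the use of the UTC time zone are assumptions for illustration, not taken from the question's data. It finds the first timestamp that follows a gap longer than one hour and moves every label from that point on back by one hour.

```python
import pandas as pd

# Toy hourly series with one missing hour (2012-03-02 00:00),
# mimicking the gap described in the question (assumed data).
before = pd.date_range('2012-03-01 20:00', periods=4, freq='h', tz='UTC')
after = pd.date_range('2012-03-02 01:00', periods=4, freq='h', tz='UTC')
s = pd.Series(range(8), index=before.append(after), dtype='float64', name='vol')

# Locate the first timestamp that follows a gap larger than one hour.
step = s.index.to_series().diff()
cutoff = step[step > pd.Timedelta(hours=1)].index[0]

# Shift only the labels at or after the gap back by one hour.
# Index.where keeps a label where the condition is True and
# takes the replacement value everywhere else.
keep = s.index < cutoff
s.index = s.index.where(keep, s.index - pd.Timedelta(hours=1))

print(s)
```

Because the comparison and the subtraction both operate on the whole `DatetimeIndex`, no per-element `in` check is involved, and the values are left untouched.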
