I have a dataframe:
>>> dfn.head()
Out[8]:
2012-02-27 00:00:00+00:00 3054679365.000
2012-02-27 01:00:00+00:00 1433475236.000
2012-02-27 02:00:00+00:00 1725293108.000
2012-02-27 03:00:00+00:00 1089842336.000
2012-02-27 04:00:00+00:00 1637301178.000
>>> dfn.tail()
2012-03-02 20:00:00+00:00 3696373423.000
2012-03-02 21:00:00+00:00 3423657296.000
2012-03-02 22:00:00+00:00 1887346076.000
2012-03-02 23:00:00+00:00 426382220.400
2012-03-03 00:00:00+00:00 759307738.400
dtype: float64
The frequency is hourly, but there is a break at '2012-03-02', where the data starts at 1 am instead of midnight:
>>> dfn['2012-03-01'].tail()
Out[12]:
2012-03-01 19:00:00+00:00 2144039255.000
2012-03-01 20:00:00+00:00 4055718131.000
2012-03-01 21:00:00+00:00 1850226718.000
2012-03-01 22:00:00+00:00 738256967.900
2012-03-01 23:00:00+00:00 1163600574.000
Name: vol, dtype: float64
>>> dfn['2012-03-02'].head()
Out[11]:
2012-03-02 01:00:00+00:00 2364896887.000
2012-03-02 02:00:00+00:00 1598799781.000
2012-03-02 03:00:00+00:00 2011619242.000
2012-03-02 04:00:00+00:00 2408284057.000
2012-03-02 05:00:00+00:00 2084405746.000
Name: vol, dtype: float64
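A break like the one shown above can also be located programmatically instead of by eye. The following is only a minimal sketch: the series built here is a hypothetical stand-in for dfn (hourly, UTC, with 2012-03-02 00:00 removed), and the names idx, vol, steps and after_gap are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for dfn: hourly UTC data with 2012-03-02 00:00 missing
idx = pd.date_range("2012-02-27", "2012-03-03", freq="h", tz="UTC")
idx = idx.delete(idx.get_loc(pd.Timestamp("2012-03-02 00:00", tz="UTC")))
vol = pd.Series(range(len(idx)), index=idx, name="vol", dtype="float64")

# Time elapsed between each timestamp and the one before it
steps = vol.index.to_series().diff()

# Timestamps that come right after a gap longer than one hour
after_gap = steps[steps > pd.Timedelta(hours=1)].index
print(after_gap)
```

On real data, after_gap would list the first timestamp after each gap, which could then serve as the start of the range to shift.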
I would like to shift the index by 1 hour from the break point at 1 am on '2012-03-02' onward. I tried the following:
trouble_spots = pd.date_range(start = dfn.index[trouble_loc], end = dfn.index[-1], freq='H', tz= 'Europe/London')
>>> trouble_spots
Out[13]: DatetimeIndex(['2012-03-02 01:00:00+00:00', '2012-03-02 02:00:00+00:00', '2012-03-02 03:00:00+00:00',.... '2012-03-02 22:00:00+00:00', '2012-03-02 23:00:00+00:00', '2012-03-03 00:00:00+00:00'], dtype='datetime64[ns]', freq='H', tz='Europe/London')
The problem is that the following does not seem to work:
dfn.index = dfn.index.map(lambda x: x - pd.Timedelta(1, 'h') if x in trouble_spots else x)
It gives the same index as before. The parts work individually:
>>> [x for x in dfn.index if x in trouble_spots]
Out[6]:
[Timestamp('2012-03-02 01:00:00+0000', tz='Europe/London'),
Timestamp('2012-03-02 02:00:00+0000', tz='Europe/London'),
......
Timestamp('2012-03-02 03:00:00+0000', tz='Europe/London'),
Timestamp('2012-03-02 21:00:00+0000', tz='Europe/London'),
Timestamp('2012-03-02 22:00:00+0000', tz='Europe/London'),
dfn.index.map(lambda x: x - pd.Timedelta(1, 'h'))
Out[5]:
DatetimeIndex(['2012-02-26 23:00:00+00:00', '2012-02-27 00:00:00+00:00', ... '2012-03-02 20:00:00+00:00', '2012-03-02 21:00:00+00:00', '2012-03-02 22:00:00+00:00', '2012-03-02 23:00:00+00:00'], dtype='datetime64[ns]', length=120, freq=None, tz='Europe/London')
But they don't seem to work together. Is there something I'm missing here?
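One way to sidestep the per-element map and the membership test altogether is a vectorized conditional shift. This is a sketch under stated assumptions, not a diagnosis of the code above: the data is a hypothetical stand-in for dfn in UTC, and break_point, mask and shifted are illustrative names.

```python
import pandas as pd

# Hypothetical stand-in for dfn: hourly UTC data with 2012-03-02 00:00 missing
idx = pd.date_range("2012-02-27", "2012-03-03", freq="h", tz="UTC")
idx = idx.delete(idx.get_loc(pd.Timestamp("2012-03-02 00:00", tz="UTC")))
dfn = pd.Series(range(len(idx)), index=idx, name="vol", dtype="float64")

# First timestamp after the break (taken from the question)
break_point = pd.Timestamp("2012-03-02 01:00", tz="UTC")

# True for every timestamp at or after the break
mask = dfn.index >= break_point

# Keep timestamps before the break; use the shifted ones everywhere else
shifted = dfn.index - pd.Timedelta(hours=1)
dfn.index = dfn.index.where(~mask, shifted)
```

Index.where keeps the original entry wherever the condition is True, which is why the mask is negated. As for why the map version returned the index unchanged: a likely cause, assuming an older pandas release, is that DatetimeIndex.map first tried calling the function on the whole index at once, so `x in trouble_spots` evaluated to False for the index object and the lambda handed it back untouched.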
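I'm not sure how to fix your code for that index. However, you said, "I have values and I don't want to delete any of them." So why not delete one index value and then construct a brand-new dataframe that keeps the values unchanged but uses a new index, extended by 1 hour at the end to make up for the one entry you deleted?
Example: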
import pandas as pd
import numpy as np
np.random.seed(1)
# pd.PeriodIndex(start=..., periods=...) was removed in pandas 1.0; period_range replaces it
index = pd.period_range(start='2012-03-02 00:00', freq='h', periods=48)
values = np.random.randint(7382569, 40557181, len(index))
df = pd.DataFrame(data=values, index=index)
# drop one index label, then append one extra hour at the end so the lengths still match
new_index = df.index.delete(df.index.get_loc(pd.Period('2012-03-03 00:00', freq='h')))
index_plus_one = new_index.append(pd.PeriodIndex([new_index.max() + 1]))
new_df = pd.DataFrame(data=values, index=index_plus_one)
print('PREVIOUS DF')
print(df.iloc[18: 30])
print('NEW DF')
print(new_df.iloc[18: 30])
Prints:
PREVIOUS DF
0
2012-03-02 18:00 39881435
2012-03-02 19:00 18381629
2012-03-02 20:00 29424423
2012-03-02 21:00 10704782
2012-03-02 22:00 31142331
2012-03-02 23:00 12152829
2012-03-03 00:00 13083060
2012-03-03 01:00 32950053
2012-03-03 02:00 20771859
2012-03-03 03:00 20330693
2012-03-03 04:00 24348102
2012-03-03 05:00 20971447
NEW DF
0
2012-03-02 18:00 39881435
2012-03-02 19:00 18381629
2012-03-02 20:00 29424423
2012-03-02 21:00 10704782
2012-03-02 22:00 31142331
2012-03-02 23:00 12152829
2012-03-03 01:00 13083060
2012-03-03 02:00 32950053
2012-03-03 03:00 20771859
2012-03-03 04:00 20330693
2012-03-03 05:00 24348102
2012-03-03 06:00 20971447
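As the output shows, every label from the deleted slot onward ends up one hour later. The same index can therefore be built by slicing and shifting instead of deleting and appending. This is just a sketch; pos, shifted_index and reference are illustrative names, and it assumes an hourly PeriodIndex like the one in the example.

```python
import pandas as pd

index = pd.period_range(start="2012-03-02 00:00", periods=48, freq="h")

# Position of the label that the example removes
pos = index.get_loc(pd.Period("2012-03-03 00:00", freq="h"))

# Leave everything before that slot alone; move the rest one hour later
shifted_index = index[:pos].append(index[pos:] + 1)

# The delete-then-append construction, for comparison
reference = index.delete(pos).append(pd.PeriodIndex([index.max() + 1]))
print(shifted_index.equals(reference))
```

Adding an integer to a PeriodIndex moves each entry by that many periods of its frequency, so `+ 1` here means one hour.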