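I have a pandas DataFrame with a half-hourly time series index and a Series of daily data, and I need to match the two up by date for an equation. The code below works using .get() in a loop, but it is slow and seems rather "un-Pythonic".
I have tried turning the Series into a DataFrame with a dummy column so that I could attempt a merge or lookup, but for various reasons I could not get it to work. Some data is missing, so some potential approaches may raise key errors.
Previously answered questions do not seem to apply. Someone familiar with lambda functions or the .asfreq method may be able to come up with something.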
import pandas as pd
import numpy as np
# Make a 2 day series
days = 2
dates = pd.date_range('20130102',periods=days)
ts_d = pd.Series(np.random.randn(days),index=dates)
ts_d
# Output
2013-01-02 -1.044139
2013-01-03 -1.061720
Freq: D, dtype: float64
# Make an overlapping 4 day dataframe with 60min index
datetimes = pd.date_range('20130101 00:00',periods=4*24, freq = '60min')
df_t = pd.DataFrame(np.random.randn(4*24,4),index=datetimes,columns=list('ABCD'))
# Begin clunkiness
df_t['date'] = df_t.index.date
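for t in df_t.index:
    d = df_t.loc[t, 'date']
    # pd.Timestamp(d): newer pandas may not match a plain datetime.date key in a DatetimeIndex
    df_t.loc[t, 'E'] = ts_d.get(pd.Timestamp(d))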
df_t
Some output:
A B C D date E
2013-01-01 20:00:00 -0.173764 -1.440833 -0.163796 0.479593 2013-01-01 None
2013-01-01 21:00:00 1.915522 2.308827 -0.849182 -1.478981 2013-01-01 None
2013-01-01 22:00:00 -0.013391 -1.534994 -2.365495 0.747692 2013-01-01 None
2013-01-01 23:00:00 0.739665 -0.566568 0.413195 0.665017 2013-01-01 None
2013-01-02 00:00:00 -0.358202 -1.625681 0.120250 -1.122430 2013-01-02 -1.044139
2013-01-02 01:00:00 1.048837 -0.328021 0.933473 -0.234328 2013-01-02 -1.044139
2013-01-02 02:00:00 1.178195 -1.389543 -0.144850 -2.430063 2013-01-02 -1.044139
2013-01-02 03:00:00 -0.420962 0.244130 1.819005 -0.982521 2013-01-02 -1.044139
.
.
.
2013-01-02 15:00:00 1.809403 -2.505042 -0.509833 -1.238630 2013-01-02 -1.044139
2013-01-02 16:00:00 0.740123 -0.205582 0.795701 0.459017 2013-01-02 -1.044139
2013-01-02 17:00:00 1.252692 1.025432 -0.235781 -0.506460 2013-01-02 -1.044139
2013-01-02 18:00:00 -1.456726 -1.983843 -1.623061 0.629214 2013-01-02 -1.044139
2013-01-02 19:00:00 1.126687 -0.253415 0.163900 0.059876 2013-01-02 -1.044139
2013-01-02 20:00:00 0.156657 0.066207 0.103946 -0.762910 2013-01-02 -1.044139
2013-01-02 21:00:00 -1.123818 0.314226 -0.281381 0.947381 2013-01-02 -1.044139
2013-01-02 22:00:00 -0.945620 0.538180 1.403452 -0.065406 2013-01-02 -1.044139
2013-01-02 23:00:00 0.059012 2.599817 -0.623826 0.796559 2013-01-02 -1.044139
2013-01-03 00:00:00 0.859748 1.476591 0.607554 -1.575007 2013-01-03 -1.06172
2013-01-03 01:00:00 0.678326 0.084930 0.762786 -1.139595 2013-01-03 -1.06172
2013-01-03 02:00:00 -0.034952 -1.224600 0.317359 -1.620755 2013-01-03 -1.06172
2013-01-03 03:00:00 -1.208597 -1.864493 -0.883250 -0.814249 2013-01-03 -1.06172
2013-01-03 04:00:00 -0.061918 0.461941 0.163563 0.532755 2013-01-03 -1.06172
.
.
.
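You can do it like this:
First, get the date as a column:
df_t['Date'] = pd.to_datetime(df_t.index.date)
Set it as the index:
df_t = df_t.reset_index().set_index('Date')
Set the values (the assignment aligns ts_d on the date index):
df_t['E'] = ts_d
Restore the old index:
df_t = df_t.reset_index().set_index('index')
Verify:
df_t.loc[pd.to_datetime('20130102')]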
*Edit: incorporated Jeff's suggestion
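The index shuffle above can probably be collapsed into a single lookup. This is only a minimal sketch, assuming the same ts_d and df_t as in the question (rebuilt here so the snippet runs on its own). It relies on DatetimeIndex.normalize() to floor each timestamp to midnight and on Series.reindex to return NaN, rather than raise a KeyError, for days that ts_d does not cover.

```python
import numpy as np
import pandas as pd

# Rebuild the question's example data
dates = pd.date_range('20130102', periods=2)
ts_d = pd.Series(np.random.randn(2), index=dates)
datetimes = pd.date_range('20130101 00:00', periods=4 * 24, freq='60min')
df_t = pd.DataFrame(np.random.randn(4 * 24, 4), index=datetimes, columns=list('ABCD'))

# Floor every hourly timestamp to its day, then look each day up in ts_d.
# reindex yields NaN for days absent from ts_d (here 2013-01-01 and 2013-01-04).
daily_keys = df_t.index.normalize()
# to_numpy() assigns by position, because the reindexed Series carries the daily index
df_t['E'] = ts_d.reindex(daily_keys).to_numpy()
```

Note that missing days come out as NaN here instead of None, which keeps column E as float64.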
You can at least save some time by grouping after creating df_t:
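df_t.loc[:, 'E'] = None
# One lookup per calendar day instead of one per row
for k, group in df_t.groupby(df_t.index.date):
    # pd.Timestamp(k): newer pandas may not match a plain datetime.date key in a DatetimeIndex
    df_t.loc[group.index, 'E'] = ts_d.get(pd.Timestamp(k))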
Since ts_d is relatively short, the number of groups is expected to be very small, so I think this is reasonably efficient.
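If the number of days grows, the Python-level loop could be swapped for a left join. This is a different technique from the groupby loop above, sketched under the assumption that the data looks like the question's example. The helper column name 'date' is taken from the question, and 'E' is the target column.

```python
import numpy as np
import pandas as pd

# Rebuild the question's example data
ts_d = pd.Series(np.random.randn(2), index=pd.date_range('20130102', periods=2))
datetimes = pd.date_range('20130101 00:00', periods=4 * 24, freq='60min')
df_t = pd.DataFrame(np.random.randn(4 * 24, 4), index=datetimes, columns=list('ABCD'))

# Helper column holding each row's day as a midnight Timestamp
df_t['date'] = df_t.index.normalize()

# Left join: match the 'date' column against the index of ts_d.
# Rows whose day is missing from ts_d get NaN in 'E'; the hourly index is kept.
df_t = df_t.join(ts_d.rename('E'), on='date')
```

The join keeps every hourly row, so days without daily data simply show NaN rather than causing a key error.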