Pandas: looking up daily Series values for a half-hourly DataFrame index



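I have a pandas DataFrame with a half-hourly time series index and a Series of daily data, and I need to match the two by date for use in an equation. The code below works by calling .get() in a loop, but it is slow and seems rather "un-Pythonic".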

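I tried turning the Series into a DataFrame with a dummy column so that I could attempt a merge or a lookup, but for various reasons I could not get it to work. Some data is missing, so some of the potential approaches may raise a KeyError.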

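Previously answered questions do not seem to apply. Someone familiar with lambda functions or the .asfreq method might be able to come up with something.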

import pandas as pd
import numpy as np
# Make a 2 day series
days = 2
dates = pd.date_range('20130102',periods=days)
ts_d = pd.Series(np.random.randn(days),index=dates)
ts_d
# Output
2013-01-02   -1.044139
2013-01-03   -1.061720
Freq: D, dtype: float64
# Make an overlapping 4 day dataframe with 60min index
datetimes = pd.date_range('20130101 00:00',periods=4*24, freq = '60min')
df_t = pd.DataFrame(np.random.randn(4*24,4),index=datetimes,columns=list('ABCD'))
# Begin clunkiness
df_t['date'] = df_t.index.date
for t in df_t.index:
    d = df_t.loc[t, 'date']
    df_t.loc[t, 'E'] = ts_d.get(d)
df_t
Some of the output:

                         A          B           C           D          date          E
2013-01-01 20:00:00 -0.173764   -1.440833   -0.163796    0.479593    2013-01-01  None
2013-01-01 21:00:00  1.915522    2.308827   -0.849182   -1.478981    2013-01-01  None
2013-01-01 22:00:00 -0.013391   -1.534994   -2.365495    0.747692    2013-01-01  None
2013-01-01 23:00:00  0.739665   -0.566568    0.413195    0.665017    2013-01-01  None
2013-01-02 00:00:00 -0.358202   -1.625681    0.120250   -1.122430    2013-01-02 -1.044139
2013-01-02 01:00:00  1.048837   -0.328021    0.933473   -0.234328    2013-01-02 -1.044139
2013-01-02 02:00:00  1.178195   -1.389543   -0.144850   -2.430063    2013-01-02 -1.044139
2013-01-02 03:00:00 -0.420962    0.244130    1.819005   -0.982521    2013-01-02 -1.044139
.
.
.
2013-01-02 15:00:00  1.809403   -2.505042   -0.509833   -1.238630    2013-01-02 -1.044139
2013-01-02 16:00:00  0.740123   -0.205582    0.795701    0.459017    2013-01-02 -1.044139
2013-01-02 17:00:00  1.252692    1.025432   -0.235781   -0.506460    2013-01-02 -1.044139
2013-01-02 18:00:00 -1.456726   -1.983843   -1.623061    0.629214    2013-01-02 -1.044139
2013-01-02 19:00:00  1.126687   -0.253415    0.163900    0.059876    2013-01-02 -1.044139
2013-01-02 20:00:00  0.156657    0.066207    0.103946   -0.762910    2013-01-02 -1.044139
2013-01-02 21:00:00 -1.123818    0.314226   -0.281381    0.947381    2013-01-02 -1.044139
2013-01-02 22:00:00 -0.945620    0.538180    1.403452   -0.065406    2013-01-02 -1.044139
2013-01-02 23:00:00  0.059012    2.599817   -0.623826    0.796559    2013-01-02 -1.044139
2013-01-03 00:00:00  0.859748    1.476591    0.607554   -1.575007    2013-01-03  -1.06172
2013-01-03 01:00:00  0.678326    0.084930    0.762786   -1.139595    2013-01-03  -1.06172
2013-01-03 02:00:00 -0.034952   -1.224600    0.317359   -1.620755    2013-01-03  -1.06172
2013-01-03 03:00:00 -1.208597   -1.864493   -0.883250   -0.814249    2013-01-03  -1.06172
2013-01-03 04:00:00 -0.061918    0.461941    0.163563    0.532755    2013-01-03  -1.06172
.
.
.

You can do it like this:

First, get the date field:

df_t['Date'] = pd.to_datetime(df_t.index.date)

Set it as the index:

df_t = df_t.reset_index().set_index('Date')

Assign the daily values (pandas aligns them on the date index, so days missing from ts_d become NaN):

df_t['E'] = ts_d

Restore the old index:

df_t = df_t.reset_index().set_index('index')

Verify:

df_t.loc[pd.to_datetime('20130102')]
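The index-swapping steps above can also be collapsed into a single lookup. The following is a minimal sketch rather than the answerer's exact method: it assumes ts_d has a unique index of midnight timestamps, and it uses fixed daily values (1.5 and 2.5) purely for illustration so the result is easy to check.

```python
import numpy as np
import pandas as pd

# Illustrative daily series (values are made up for the example)
ts_d = pd.Series([1.5, 2.5], index=pd.date_range('20130102', periods=2))

# Overlapping 4-day frame with an hourly index, as in the question
datetimes = pd.date_range('20130101 00:00', periods=4 * 24, freq='60min')
df_t = pd.DataFrame(np.random.randn(4 * 24, 4), index=datetimes, columns=list('ABCD'))

# normalize() floors every timestamp to midnight of its day;
# reindex() then looks each day up in ts_d and yields NaN for days not present
df_t['E'] = ts_d.reindex(df_t.index.normalize()).to_numpy()
```

Because reindex fills unmatched days with NaN instead of raising, missing days in ts_d do not cause a KeyError.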

*Edit: incorporated Jeff's suggestion

You can at least save some time by grouping after df_t has been created:

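df_t['E'] = np.nan
# One iteration per calendar day instead of one per row
for k, group in df_t.groupby(df_t.index.date):
    # k is a datetime.date; convert it so it matches ts_d's DatetimeIndex
    df_t.loc[group.index, 'E'] = ts_d.get(pd.Timestamp(k), np.nan)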

Since ts_d is relatively short, the expected number of groups is very small, so I think this is reasonably efficient.
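If even the per-day loop is unwanted, the same lookup can be expressed without any explicit loop. This is a hedged sketch, not part of the original answer: it assumes the index of ts_d is unique, the daily values are invented for illustration, and the name day_of_row is hypothetical.

```python
import numpy as np
import pandas as pd

# Illustrative daily series (values are made up for the example)
ts_d = pd.Series([1.5, 2.5], index=pd.date_range('20130102', periods=2))

datetimes = pd.date_range('20130101 00:00', periods=4 * 24, freq='60min')
df_t = pd.DataFrame(np.random.randn(4 * 24, 4), index=datetimes, columns=list('ABCD'))

# day_of_row (hypothetical name): the midnight timestamp of each row's day,
# indexed like df_t so the mapped result aligns row by row
day_of_row = pd.Series(df_t.index.normalize(), index=df_t.index)

# Series.map with a Series argument looks each value up in ts_d's index;
# days that are absent from ts_d map to NaN
df_t['E'] = day_of_row.map(ts_d)
```

As with the grouped version, days that have no entry in ts_d simply end up as NaN in column E.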
