How to query the closest DateTime index in a pandas DataFrame



How can I query a pandas DataFrame for the nearest index? The index is a DatetimeIndex:

2016-11-13 20:00:10.617989120   7.0 132.0
2016-11-13 22:00:00.022737152   1.0 128.0
2016-11-13 22:00:28.417561344   1.0 132.0

I tried:

df.index.get_loc(df.index[0], method='nearest')

But it gives me InvalidIndexError: Reindexing only valid with uniquely valued Index objects

I get the same error if I try:

dt = datetime.datetime.strptime("2016-11-13 22:01:25", "%Y-%m-%d %H:%M:%S")
df.index.get_loc(dt, method='nearest')

However, if I remove method='nearest' it works, but that is not what I want: I want to find the index closest to my query datetime.

It seems you need to get the position with get_loc first and then select with []:

dt = pd.to_datetime("2016-11-13 22:01:25.450")
print (dt)
2016-11-13 22:01:25.450000
print (df.index.get_loc(dt, method='nearest'))
2
idx = df.index[df.index.get_loc(dt, method='nearest')]
print (idx)
2016-11-13 22:00:28.417561344
# to select the matching row as a Series, use iloc
s = df.iloc[df.index.get_loc(dt, method='nearest')]
print (s)
b      1.0
c    132.0
Name: 2016-11-13 22:00:28.417561344, dtype: float64

Passing method to DatetimeIndex.get_loc is now deprecated (since pandas 1.4, and removed in 2.0) in favor of DatetimeIndex.get_indexer ...

ts = pd.to_datetime('2022-05-26 13:19:48.154000')        # example time
iloc_idx = df.index.get_indexer([ts], method='nearest')  # array of integer positions into df, e.g. array([5])
loc_idx = df.index[iloc_idx]                             # the corresponding index labels, if you want them
my_val = df.iloc[iloc_idx]                               # select by position
my_val = df.loc[loc_idx]                                 # same rows, selected by label
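
For reference, here is a minimal self-contained sketch of the get_indexer approach, built from the three sample rows in the question. The column names b and c are taken from the output printed in the earlier answer, and the second query time and the one-hour tolerance are assumptions added only for illustration. Note that get_indexer, like get_loc with method='nearest', expects a unique index.

```python
import pandas as pd

# Sample rows from the question; column names "b" and "c" follow the output shown earlier.
df = pd.DataFrame(
    {"b": [7.0, 1.0, 1.0], "c": [132.0, 128.0, 132.0]},
    index=pd.to_datetime([
        "2016-11-13 20:00:10.617989120",
        "2016-11-13 22:00:00.022737152",
        "2016-11-13 22:00:28.417561344",
    ]),
)

dt = pd.to_datetime("2016-11-13 22:01:25.450")

# get_indexer takes a list-like of targets and returns an array of integer positions.
pos = df.index.get_indexer([dt], method="nearest")[0]
nearest_label = df.index[pos]   # the index value closest to dt
nearest_row = df.iloc[pos]      # the matching row as a Series

# Optional: with a tolerance, targets that have no close enough neighbour get -1.
# The query time and the one-hour limit below are illustrative assumptions.
far = pd.to_datetime("2016-11-14 12:00:00")
pos_far = df.index.get_indexer([far], method="nearest", tolerance=pd.Timedelta("1h"))[0]
```

Because a position of -1 would silently select the last row when passed to iloc, it is worth checking for it before indexing whenever a tolerance is used.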

I believe jezrael's solution works, but it did not work on my DataFrame (I don't know why). Here is the solution I came up with.

from bisect import bisect  # treats the (sorted) index as a sorted container
import numpy as np
np_dt64 = np.datetime64("2016-11-13 22:01:25.450")  # query time (example value)
timestamps = np.array(df.index)  # the index must be sorted
# find the upper index of the closest timestamp
upper_index = bisect(timestamps, np_dt64, hi=len(timestamps)-1)
# pick whichever of the upper and lower timestamps is closer
df_index = df.index.get_loc(min(timestamps[upper_index], timestamps[upper_index-1], key=lambda x: abs(x - np_dt64)))
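
The same binary-search idea can probably be written with the index's own searchsorted method, which also tolerates duplicate timestamps. This is only a sketch, assuming the index is sorted in ascending order; nearest_position is a hypothetical helper name, not part of pandas.

```python
import pandas as pd

def nearest_position(index: pd.DatetimeIndex, when) -> int:
    """Return the position of the entry in a sorted index that is closest to `when`."""
    when = pd.Timestamp(when)
    # First position whose value is >= when (binary search, like bisect_left).
    right = int(index.searchsorted(when))
    # Clamp both neighbour candidates to valid positions.
    left = max(right - 1, 0)
    right = min(right, len(index) - 1)
    # Keep whichever neighbour is closer; ties go to the earlier entry.
    if abs(index[right] - when) < abs(index[left] - when):
        return right
    return left

# Illustrative index with a duplicated timestamp (made-up values).
index = pd.to_datetime([
    "2016-11-13 20:00:10",
    "2016-11-13 22:00:00",
    "2016-11-13 22:00:00",
    "2016-11-13 22:00:28",
])
after_end = nearest_position(index, "2016-11-13 22:01:25.450")
before_start = nearest_position(index, "2016-11-13 19:00:00")
in_between = nearest_position(index, "2016-11-13 21:59:00")
```

The result is a position, so it is meant to be used with df.iloc rather than df.loc.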

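I know this is an old question, but I landed here while searching for the same problem as Bryan Fok. So, for future searchers who end up here, I am posting my solution. My index had 4 non-unique entries (possibly due to rounding errors when the data was recorded). The following works and shows the correct data: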

dt = pd.to_datetime("2016-11-13 22:01:25.450")
s = df.loc[df.index.unique()[df.index.unique().get_loc(dt, method='nearest')]]

However, if your nearest index value occurs more than once, this returns multiple rows. If you want to catch that case, you can test for it:

if len(s) != len(df.columns):
    # do what is appropriate for your case,
    # e.g. select only the first occurrence
    s = s.iloc[0]
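
If the index is neither unique nor guaranteed to be sorted, one alternative is to compute the distance from every index entry to the query time and take the smallest. This is a sketch rather than a drop-in replacement: it scans the whole index, and the data below is made up, with one timestamp duplicated on purpose.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the 22:00:00 timestamp appears twice on purpose.
df = pd.DataFrame(
    {"b": [7.0, 1.0, 2.0, 1.0], "c": [132.0, 128.0, 129.0, 132.0]},
    index=pd.to_datetime([
        "2016-11-13 20:00:10",
        "2016-11-13 22:00:00",
        "2016-11-13 22:00:00",
        "2016-11-13 22:00:28",
    ]),
)
dt = pd.to_datetime("2016-11-13 22:00:05")

# Absolute distance from every index entry to the query time.
distances = np.abs((df.index - dt).to_numpy())
pos = int(distances.argmin())          # first position with the smallest distance
first_match = df.iloc[pos]             # always a single row (Series)
all_matches = df.loc[df.index[pos]]    # every row sharing that timestamp (DataFrame here)
```

Selecting by position returns exactly one row, so no length check is needed; selecting by label returns all rows with the duplicated timestamp.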

Edit: fixed the check after some testing.
