Slicing a pandas Series with a PeriodIndex



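I have several pandas Series whose PeriodIndex frequencies differ. I want to filter them by another PeriodIndex whose frequency is, in principle, unknown (in the example below it is specified directly as selectionA or selectionB, but in practice it is taken from another Series).

I have found 3 approaches, each with its own drawback, as the example below shows. Is there a better way?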

import numpy as np
import pandas as pd
y = pd.Series(np.random.random(4),  index=pd.period_range('2018', '2021', freq='A'), name='speed')
q = pd.Series(np.random.random(16), index=pd.period_range('2018Q1', '2021Q4', freq='Q'), name='speed')
m = pd.Series(np.random.random(48), index=pd.period_range('2018-01', '2021-12', freq='M'), name='speed')
selectionA = pd.period_range('2018Q3', '2020Q2', freq='Q') #subset of y, q, and m
selectionB = pd.period_range('2014Q3', '2015Q2', freq='Q') #not subset of y, q, and m
#Comparing some options: 
#1: filter method
#2: slicing
#3: selection based on boolean comparison
#1: problem when frequencies unequal: always returns empty series
yA_1 = y.filter(selectionA, axis=0) #Fail: empty series
qA_1 = q.filter(selectionA, axis=0) 
mA_1 = m.filter(selectionA, axis=0) #Fail: empty series
yB_1 = y.filter(selectionB, axis=0) 
qB_1 = q.filter(selectionB, axis=0) 
mB_1 = m.filter(selectionB, axis=0)
#2: problem when frequencies unequal: wrong selection and error instead of empty result
yA_2 = y[selectionA[0]:selectionA[-1]]  
qA_2 = q[selectionA[0]:selectionA[-1]] 
mA_2 = m[selectionA[0]:selectionA[-1]] #Fail: selects 22 months instead of 24
yB_2 = y[selectionB[0]:selectionB[-1]] #Fail: error
qB_2 = q[selectionB[0]:selectionB[-1]] 
mB_2 = m[selectionB[0]:selectionB[-1]] #Fail: error
#3: works, but very verbose
yA_3 = y[(y.index >= selectionA[0].start_time) & (y.index <= selectionA[-1].end_time)]
qA_3 = q[(q.index >= selectionA[0].start_time) & (q.index <= selectionA[-1].end_time)]
mA_3 = m[(m.index >= selectionA[0].start_time) & (m.index <= selectionA[-1].end_time)]
yB_3 = y[(y.index >= selectionB[0].start_time) & (y.index <= selectionB[-1].end_time)]
qB_3 = q[(q.index >= selectionB[0].start_time) & (q.index <= selectionB[-1].end_time)]
mB_3 = m[(m.index >= selectionB[0].start_time) & (m.index <= selectionB[-1].end_time)]

Many thanks.

I have solved this by using start_time and end_time as the bounds of the slice:

yA_2fixed = y[selectionA[0].start_time: selectionA[-1].end_time]
qA_2fixed = q[selectionA[0].start_time: selectionA[-1].end_time] 
mA_2fixed = m[selectionA[0].start_time: selectionA[-1].end_time] #now has 24 rows
yB_2fixed = y[selectionB[0].start_time: selectionB[-1].end_time] #doesn't fail; returns empty series
qB_2fixed = q[selectionB[0].start_time: selectionB[-1].end_time] 
mB_2fixed = m[selectionB[0].start_time: selectionB[-1].end_time] #doesn't fail; returns empty series

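However, if there is a more concise way to write this, I am still all ears. In particular, I would like to know whether this filtering can be done in a way that is more "native" to PeriodIndex, i.e. without first converting to datetime instances via the start_time and end_time attributes.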
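One possible period-native route, offered as a sketch rather than a definitive answer: convert the first and last period of the selection to the frequency of the target Series with `Period.asfreq`, then compare periods against periods. The helper name `select_by_periods` is hypothetical (it is not part of pandas), and the sketch assumes the selection is sorted in ascending order. The example data uses `freq='Y'` for the annual series, an assumption made so that the snippet also runs on newer pandas versions.

```python
import numpy as np
import pandas as pd


def select_by_periods(series, selection):
    """Keep rows of `series` whose period falls within the span of `selection`.

    Hypothetical helper, not a pandas API. Assumes `selection` is a sorted
    PeriodIndex; its frequency may differ from that of `series.index`.
    """
    if len(selection) == 0:
        return series.iloc[0:0]  # nothing to select
    freq = series.index.freqstr
    # Express the selection bounds at the target frequency, staying in Period space.
    first = selection[0].asfreq(freq, how="start")
    last = selection[-1].asfreq(freq, how="end")
    return series[(series.index >= first) & (series.index <= last)]


y = pd.Series(np.random.random(4), index=pd.period_range('2018', '2021', freq='Y'), name='speed')
q = pd.Series(np.random.random(16), index=pd.period_range('2018Q1', '2021Q4', freq='Q'), name='speed')
m = pd.Series(np.random.random(48), index=pd.period_range('2018-01', '2021-12', freq='M'), name='speed')
selectionA = pd.period_range('2018Q3', '2020Q2', freq='Q')  # overlaps y, q, and m
selectionB = pd.period_range('2014Q3', '2015Q2', freq='Q')  # outside y, q, and m

yA = select_by_periods(y, selectionA)
qA = select_by_periods(q, selectionA)
mA = select_by_periods(m, selectionA)
mB = select_by_periods(m, selectionB)
```

With this data, mA should contain the 24 months from 2018-07 to 2020-06 and mB should be empty rather than raising. For the annual series, any year that overlaps the selection is kept, so yA covers 2018 through 2020, which matches the behaviour of the start_time/end_time slice above.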
