我想知道是否有一种有效的方法可以在行子集下方和上方获取 X 行数。我在下面创建了一个基本实现,但我相信有更好的方法。我关心的子集是 buyindex,它是具有买入信号的行的索引。我想在 sellindex 的上方和下方获取几行,以验证我的算法是否正常工作。如何以有效的方式做到这一点?我的方式似乎迂回。
buyindex = list(data2[data2['buy'] == True].index)
print buyindex [71, 102, 103, 179, 505, 506, 607]
buyindex1 = map(lambda x: x + 1, buyindex)
buyindex2 = map(lambda x: x - 1, buyindex)
buyindex3 = map(lambda x: x - 2, buyindex)
buyindex4 = map(lambda x: x + 2, buyindex)
buyindex.extend(buyindex1)
buyindex.extend(buyindex2)
buyindex.extend(buyindex3)
buyindex.extend(buyindex4)
buyindex.sort()
data2.iloc[buyindex]
更新 - 这是数据的结构。我有"买入"的指数,但我基本上想在买入上方和下方获得几个指数。
VTI upper lower sell buy AboveUpper BelowLower date tokens_left
38 61.25 64.104107 61.341893 False True False True 2007-02-28 00:00:00 5
39 61.08 64.218341 61.109659 False True False True 2007-03-01 00:00:00 5
40 60.21 64.446719 60.640281 False True False True 2007-03-02 00:00:00 5
41 59.51 64.717936 60.050064 False True False True 2007-03-05 00:00:00 5
142 63.27 68.909776 64.310224 False True False True 2007-07-27 00:00:00 5
217 62.98 68.858308 63.587692 False True False True 2007-11-12 00:00:00 5
254 61.90 66.941126 61.944874 False True False True 2008-01-07 00:00:00 5
255 60.79 67.049925 61.312075 False True False True 2008-01-08 00:00:00 5
296 57.02 61.382677 57.371323 False True False True 2008-03-07 00:00:00 5
297 56.15 61.709166 56.788834 False True False True 2008-03-10 00:00:00 5
更新:我根据所选答案创建了一个通用函数。如果您认为这可以提高效率,请告诉我。
def get_test_index(df, column, numbers):
"""
builds an test index based on a range of numbers above and below the a specific index you want.
df = dataframe to build off of
column = the column that is important to you. for instance, 'buy', or 'sell'
numbers = how many above and below you want of the important index
"""
idx_l = list(df[df[column] == True].index)
for i in range(numbers)[1:]:
idxpos = data2[column].shift(i).fillna(False)
idxpos = list(df[idxpos].index)
idx_l.extend(idxpos)
idxneg = data2[column].shift(-i).fillna(False)
idxneg = list(df[idxneg].index)
idx_l.extend(idxneg)
#print idx_l
return sorted(idx_l)
这将是一个非常有效的方法
In [39]: df = DataFrame(np.random.randn(10,2))
In [41]: start=3
In [42]: stop=4
In [43]: df.iloc[(max(df.index.get_loc(start)-2,0)):min(df.index.get_loc(stop)+2,len(df))]
Out[43]:
0 1
1 0.348326 1.413770
2 1.898784 0.053780
3 0.825941 -1.986920
4 0.075956 -0.324657
5 -2.736800 -0.075813
[5 rows x 2 columns]
如果你想要一个任意索引器的功能,只需创建一个列表您想要并传递给.iloc
In [18]: index_wanted = [71, 102, 103, 179, 505, 506, 607]
In [19]: from itertools import chain
In [20]: df = DataFrame(np.random.randn(1000,2))
您可能想要独特的
f = lambda i: [ i-2, i-1, i, i+1, i+2 ]
In [21]: indexers = Index(list(chain(*[ f(i) for i in [71, 102, 103, 179, 505, 506, 607] ]))).unique()
In [22]: df.iloc[indexers]
Out[22]:
0 1
69 0.792996 0.264597
70 1.084315 -0.620006
71 -0.030432 1.219576
72 -0.767855 0.765041
73 -0.637771 -0.103378
100 -1.087505 1.698133
101 1.007143 2.594046
102 -0.307440 0.308360
103 0.944429 -0.411742
104 1.332445 -0.149350
105 0.165213 1.125668
177 0.409580 -0.375709
178 -1.757021 -0.266762
179 0.736809 -1.286848
180 1.856241 0.176931
181 -0.492590 0.083519
503 -0.651788 0.717922
504 -1.612517 -1.729867
505 -1.786807 -0.066421
506 1.423571 0.768161
507 0.186871 1.162447
508 1.233441 -0.028261
605 -0.060117 -1.459827
606 -0.541765 -0.350981
607 -1.166172 -0.026404
608 -0.045338 1.641864
609 -0.337748 0.955940
[27 rows x 2 columns]
您可以使用
shift
和|
运算符;例如,对于 +/- 2 天,您可以执行
idx = (data2['buy'] == True).fillna(False)
idx |= idx.shift(-1) | idx.shift(-2) # one & two days after
idx |= idx.shift(1) | idx.shift(2) # one & two days before
data2[ idx ] # this is what you need