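How can I compare the 'Price' value of each row with the next 2 rows? I want to run a function for every row: if the current price is lower than the price in each of the next 2 hours, I would like to assign 'Low' to the current row's 'Action' column. If the current price is higher than the price in each of the next 2 hours, assign 'High'. If the current price is neither the highest nor the lowest of the 3 hours compared, assign 'Hold'.
So how can I take the price from each row and compare it with the following 2 rows in pandas? The DataFrame looks like this: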
data.head()
Date Time Price Month Hour Action
0 2018-01-01 0 2633 January 1 NaN
1 2018-01-01 1 2643 January 2 NaN
2 2018-01-01 2 2610 January 3 NaN
3 2018-01-01 3 2470 January 4 NaN
4 2018-01-01 4 2474 January 5 NaN
In this case, the desired output would look like this:
data.head()
Date Time Price Month Hour Action
0 2018-01-01 0 2633 January 1 Hold
1 2018-01-01 1 2643 January 2 High
2 2018-01-01 2 2610 January 3 High
3 2018-01-01 3 2470 January 4 Low
4 2018-01-01 4 2474 January 5 Hold
Thanks.
Edit: this could probably be done easily with a for loop, but I am sure pandas has a better way to do it.
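You can use data['Price'].shift(-1) to bring the next row's price onto the current row, and data['Price'].shift(-2) to bring the price from 2 periods ahead onto the current row. Then you can use boolean indexing with .loc to select the rows whose price is higher or lower than both of the next two rows, and fill them with the desired value.
See below how to do it: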
# If the current price is lower than both of the next 2 rows, set 'Action' to 'Low'
data.loc[(data['Price'].shift(-2) > data['Price']) & (data['Price'].shift(-1) > data['Price']), 'Action'] = 'Low'
# If the current price is higher than both of the next 2 rows, set 'Action' to 'High'
data.loc[(data['Price'].shift(-2) < data['Price']) & (data['Price'].shift(-1) < data['Price']), 'Action'] = 'High'
# fill the rest of the rows with the value Hold
data['Action'] = data['Action'].fillna('Hold')
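To make the effect of shift(-1) and shift(-2) visible, here is a minimal self-contained sketch. It assumes the seven prices shown elsewhere in this thread; the helper columns Next1 and Next2 are names made up for illustration only.

```python
import pandas as pd

# Prices taken from the thread's sample data (7 rows)
data = pd.DataFrame({"Price": [2633, 2643, 2610, 2470, 2474, 2475, 2471]})

# shift(-k) moves values up by k rows, so each row sees the price k rows ahead.
# Next1/Next2 are illustrative helper columns, not part of the original data.
data["Next1"] = data["Price"].shift(-1)
data["Next2"] = data["Price"].shift(-2)

lower = (data["Price"] < data["Next1"]) & (data["Price"] < data["Next2"])
higher = (data["Price"] > data["Next1"]) & (data["Price"] > data["Next2"])

data["Action"] = "Hold"
data.loc[lower, "Action"] = "Low"
data.loc[higher, "Action"] = "High"
print(data)
```

Note that the last two rows end up as 'Hold' here: their shifted values are NaN, and any comparison with NaN evaluates to False, so neither mask selects them.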
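We can write some conditions for this and select values based on those conditions with np.select. In our conditions we use .shift to compare the current row with the next two rows.
Note: the last two rows will return Unknown because they do not have two following rows to compare against. That makes sense.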
# Print the extended dataframe which is used
print(df)
Date Time Price Month Hour Action
0 2018-01-01 0 2633 January 1 NaN
1 2018-01-01 1 2643 January 2 NaN
2 2018-01-01 2 2610 January 3 NaN
3 2018-01-01 3 2470 January 4 NaN
4 2018-01-01 4 2474 January 5 NaN
5 2018-01-01 5 2475 January 6 NaN
6 2018-01-01 6 2471 January 7 NaN
Define the conditions and choices, then apply np.select:
import numpy as np

conditions = [
(df['Price'] > df['Price'].shift(-1)) & (df['Price'] > df['Price'].shift(-2)),
((df['Price'].between(df['Price'].shift(-1), df['Price'].shift(-2))) | (df['Price'].between(df['Price'].shift(-2), df['Price'].shift(-1)))),
(df['Price'] < df['Price'].shift(-1)) & (df['Price'] < df['Price'].shift(-2)),
]
choices = ['High', 'Hold', 'Low']
df['Action'] = np.select(conditions, choices, default='Unknown')
print(df)
Date Time Price Month Hour Action
0 2018-01-01 0 2633 January 1 Hold
1 2018-01-01 1 2643 January 2 High
2 2018-01-01 2 2610 January 3 High
3 2018-01-01 3 2470 January 4 Low
4 2018-01-01 4 2474 January 5 Hold
5 2018-01-01 5 2475 January 6 Unknown
6 2018-01-01 6 2471 January 7 Unknown
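If the look-ahead window might change later, the same idea can be generalized. The following is only a sketch under assumptions: label_actions and its lookahead parameter are hypothetical names, and instead of the between checks it compares against the row-wise max and min of the shifted prices, which is a different formulation of the same rule.

```python
import numpy as np
import pandas as pd

def label_actions(prices: pd.Series, lookahead: int = 2) -> pd.Series:
    """Hypothetical helper: label each price against the next `lookahead` rows."""
    # One column per future offset: column k holds the price k rows ahead
    future = pd.concat(
        [prices.shift(-k) for k in range(1, lookahead + 1)], axis=1
    )
    # Rows near the end lack a full window of future prices
    complete = future.notna().all(axis=1)
    conditions = [
        ~complete,                     # not enough future rows
        prices > future.max(axis=1),   # higher than every future price
        prices < future.min(axis=1),   # lower than every future price
    ]
    choices = ["Unknown", "High", "Low"]
    labels = np.select(conditions, choices, default="Hold")
    return pd.Series(labels, index=prices.index)

df = pd.DataFrame({"Price": [2633, 2643, 2610, 2470, 2474, 2475, 2471]})
df["Action"] = label_actions(df["Price"], lookahead=2)
print(df)
```

Because np.select picks the first matching condition, the incomplete-window check is listed first so that the trailing rows are labelled Unknown rather than being compared against a partial window.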
I started by creating the source DataFrame, slightly longer than your head:
import pandas as pd

df = pd.DataFrame(data=[[ '2018-01-01', 0, 2633, 'January', 1 ],
[ '2018-01-01', 1, 2643, 'January', 2 ], [ '2018-01-01', 2, 2610, 'January', 3 ],
[ '2018-01-01', 3, 2470, 'January', 4 ], [ '2018-01-01', 4, 2474, 'January', 5 ],
[ '2018-01-01', 5, 2475, 'January', 6 ]],
columns=['Date', 'Time', 'Price', 'Month', 'Hour']); df
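The first step is to compute 2 auxiliary columns: P1, the difference between the current price and the price in the next hour, and P2, the difference between the current price and the price 2 hours ahead: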
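# The downcast argument of fillna is deprecated in recent pandas versions,
# so the cast back to integers is done explicitly here
df['P1'] = df.Price.diff(-1).fillna(0).astype(int)
df['P2'] = df.Price.diff(-2).fillna(0).astype(int)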
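The sign convention of diff with a negative period is easy to get backwards, so here is a small sketch on the first four prices from the thread. The variable names p1 and manual are illustrative only.

```python
import pandas as pd

prices = pd.Series([2633, 2643, 2610, 2470])

# diff(-1) subtracts the NEXT row from the current one,
# so a negative value means the price goes up in the next row
p1 = prices.diff(-1)

# Equivalent formulation with shift, shown for comparison
manual = prices - prices.shift(-1)

print(p1.tolist())  # [-10.0, 33.0, 140.0, nan]
```

The last row has no following row, so its difference is NaN; that is the gap the fillna(0) call in the answer fills.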
Then we need a function to apply to each row:
def fn(row):
if row.P1 < 0 and row.P2 < 0:
return 'Low'
elif row.P1 > 0 and row.P2 > 0:
return 'High'
else:
return 'Hold'
The last step is to compute the new column (by applying the function above) and drop the auxiliary columns:
df['Action'] = df.apply(fn, axis=1)
df.drop(['P1', 'P2'], axis=1, inplace=True)
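For larger frames, a row-wise apply can be slow. As a possible alternative (a sketch, not the method used above), the same P1/P2 logic can be expressed with np.select on the two differences, which also avoids the temporary columns. The names d1 and d2 are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Price": [2633, 2643, 2610, 2470, 2474, 2475]})

# diff(-k) is the current price minus the price k rows ahead;
# it is NaN where no such row exists
d1 = df["Price"].diff(-1)
d2 = df["Price"].diff(-2)

df["Action"] = np.select(
    [(d1 < 0) & (d2 < 0), (d1 > 0) & (d2 > 0)],  # lower than both / higher than both
    ["Low", "High"],
    default="Hold",
)
print(df)
```

Comparisons with NaN are False, so the trailing rows fall through to 'Hold', which matches what the fillna(0) version produces for them.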