How do I compare each row of a DataFrame with the next 2 rows and set a value on the current row based on those 3 rows? (pandas)

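How can I compare each row's 'Price' value with the next 2 rows? I want to apply a rule to every row. If the current price is lower than the prices in both of the next 2 hours, assign 'Low' to the current row's 'Action' column. If the current price is higher than the prices in both of the next 2 hours, assign 'High'. If the current price is neither the lowest nor the highest of the 3 hours compared, assign 'Hold'.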

In other words, how can I take the price from each row and compare it with the following 2 rows in pandas? The DataFrame looks like this:

data.head()
    Date        Time    Price   Month   Hour    Action  
0   2018-01-01  0       2633    January 1       NaN 
1   2018-01-01  1       2643    January 2       NaN 
2   2018-01-01  2       2610    January 3       NaN 
3   2018-01-01  3       2470    January 4       NaN 
4   2018-01-01  4       2474    January 5       NaN

For this data, the desired output is:

data.head()
    Date        Time    Price   Month   Hour    Action  
0   2018-01-01  0       2633    January 1       Hold
1   2018-01-01  1       2643    January 2       High
2   2018-01-01  2       2610    January 3       High    
3   2018-01-01  3       2470    January 4       Low 
4   2018-01-01  4       2474    January 5       Hold

Thanks.

Edit: this could probably be done easily with a for loop, but I believe pandas has a better way to do it.
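The rule can be stated precisely as a plain-Python loop, like the one the edit mentions. This is useful as a reference when checking the pandas answers. It is a minimal sketch. The function name `classify` is hypothetical, and labeling rows that have fewer than two following hours as 'Hold' is an assumption, because the question does not say what those rows should get.

```python
def classify(prices):
    """Label each price 'Low', 'High' or 'Hold' against the next two prices."""
    actions = []
    for i, p in enumerate(prices):
        ahead = prices[i + 1:i + 3]  # up to the next two hourly prices
        if len(ahead) < 2:
            actions.append('Hold')  # assumption: not enough look-ahead data
        elif all(p < q for q in ahead):
            actions.append('Low')   # lower than both upcoming hours
        elif all(p > q for q in ahead):
            actions.append('High')  # higher than both upcoming hours
        else:
            actions.append('Hold')  # neither the lowest nor the highest
    return actions


print(classify([2633, 2643, 2610, 2470, 2474, 2475]))
```

A loop like this runs in Python for every row. The vectorized answers below do the same comparisons column-wise.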

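You can use data['Price'].shift(-1) to bring the next row's price onto the current row, and data['Price'].shift(-2) to bring the price from 2 periods ahead. Then use boolean indexing with .loc to select the rows where the current price is below (or above) both of the next two prices, and fill in the desired value.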

Here is how:

# Check if the current price is lower than the next 2 rows and assign to the column 'Action' the value 'Low' if this is true
data.loc[(data['Price'].shift(-2)> data['Price']) & (data['Price'].shift(-1) > data['Price']), 'Action'] = 'Low'
# Check if the current price is higher than the next 2 rows and assign to the column 'Action' the value 'High' if this is true
data.loc[(data['Price'].shift(-2)< data['Price']) & (data['Price'].shift(-1) < data['Price']), 'Action'] = 'High'
# fill the rest of the rows with the value Hold
data['Action'] = data['Action'].fillna('Hold')
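Below is a self-contained sketch of the same shift/.loc idea on the sample prices. Date, Month and Hour are left out for brevity. The two extra hours (2475, 2471) are taken from the extended DataFrame in the next answer. One small variation: every row starts at 'Hold' and is then overwritten, instead of filling NaN at the end. The Low and High masks never overlap, so the result is the same. Comparisons against NaN evaluate to False, so the last two rows have no full look-ahead and quietly end up as 'Hold'.

```python
import pandas as pd

# Hourly prices: the question's head() plus two more hours
data = pd.DataFrame({'Price': [2633, 2643, 2610, 2470, 2474, 2475, 2471]})

next1 = data['Price'].shift(-1)  # price one hour ahead (NaN in the last row)
next2 = data['Price'].shift(-2)  # price two hours ahead (NaN in the last two rows)

data['Action'] = 'Hold'  # default: neither the lowest nor the highest
data.loc[(data['Price'] < next1) & (data['Price'] < next2), 'Action'] = 'Low'
data.loc[(data['Price'] > next1) & (data['Price'] > next2), 'Action'] = 'High'
print(data)
```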

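We can express the rule as a few conditions and pick the value for each row with np.select. In the conditions, .shift is used to compare the current row with the next two rows.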

Note that the last two rows get 'Unknown', because there are no two following hours to compare them with. That is expected.

# Print the extended dataframe which is used
print(df)
         Date  Time  Price    Month  Hour  Action
0  2018-01-01     0   2633  January     1     NaN
1  2018-01-01     1   2643  January     2     NaN
2  2018-01-01     2   2610  January     3     NaN
3  2018-01-01     3   2470  January     4     NaN
4  2018-01-01     4   2474  January     5     NaN
5  2018-01-01     5   2475  January     6     NaN
6  2018-01-01     6   2471  January     7     NaN

Define the conditions and choices, then apply np.select:

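import numpy as np

# np.select uses the first matching condition; rows with NaN look-ahead match none
conditions = [
    # higher than both of the next two prices
    (df['Price'] > df['Price'].shift(-1)) & (df['Price'] > df['Price'].shift(-2)),
    # between the next two prices (in either order, inclusive)
    df['Price'].between(df['Price'].shift(-1), df['Price'].shift(-2)) | df['Price'].between(df['Price'].shift(-2), df['Price'].shift(-1)),
    # lower than both of the next two prices
    (df['Price'] < df['Price'].shift(-1)) & (df['Price'] < df['Price'].shift(-2)),
]
choices = ['High', 'Hold', 'Low']
df['Action'] = np.select(conditions, choices, default='Unknown')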
print(df)
         Date  Time  Price    Month  Hour   Action
0  2018-01-01     0   2633  January     1     Hold
1  2018-01-01     1   2643  January     2     High
2  2018-01-01     2   2610  January     3     High
3  2018-01-01     3   2470  January     4      Low
4  2018-01-01     4   2474  January     5     Hold
5  2018-01-01     5   2475  January     6  Unknown
6  2018-01-01     6   2471  January     7  Unknown
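Here is a possible simplification, offered as a sketch rather than the answer's own code. Any row with both look-ahead prices present is either higher than both, lower than both, or in between. That means the between() condition can become `default='Hold'`. The missing-data case is checked first so it is still labeled 'Unknown' explicitly. On this data it gives the same labels as the output above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Price': [2633, 2643, 2610, 2470, 2474, 2475, 2471]})
next1 = df['Price'].shift(-1)
next2 = df['Price'].shift(-2)

conditions = [
    next2.isna(),                                   # fewer than two hours of look-ahead
    (df['Price'] > next1) & (df['Price'] > next2),  # highest of the three hours
    (df['Price'] < next1) & (df['Price'] < next2),  # lowest of the three hours
]
choices = ['Unknown', 'High', 'Low']
# First matching condition wins per row; everything else becomes 'Hold'
df['Action'] = np.select(conditions, choices, default='Hold')
print(df)
```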

I started by creating the source DataFrame, slightly longer than your head() output:

import pandas as pd

df = pd.DataFrame(data=[['2018-01-01', 0, 2633, 'January', 1],
                        ['2018-01-01', 1, 2643, 'January', 2],
                        ['2018-01-01', 2, 2610, 'January', 3],
                        ['2018-01-01', 3, 2470, 'January', 4],
                        ['2018-01-01', 4, 2474, 'January', 5],
                        ['2018-01-01', 5, 2475, 'January', 6]],
                  columns=['Date', 'Time', 'Price', 'Month', 'Hour'])
print(df)

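The first step is to compute 2 helper columns. P1 is the current price minus the price 1 hour ahead, and P2 is the current price minus the price 2 hours ahead: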

# diff(-n) = price[i] - price[i + n]; rows without a partner are NaN, filled with 0
df['P1'] = df.Price.diff(-1).fillna(0).astype(int)
df['P2'] = df.Price.diff(-2).fillna(0).astype(int)
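The sign convention is easy to get backwards. `diff(-n)` subtracts the value n rows later from the current value, so a negative P1 means the price goes up in the next hour. Here is a small sketch on the sample prices. It uses `.fillna(0).astype(int)` to keep integer values, which is a choice made here rather than something the method requires.

```python
import pandas as pd

prices = pd.Series([2633, 2643, 2610, 2470, 2474, 2475], name='Price')

# Current price minus the price 1 and 2 hours ahead; 0 where there is no partner row
p1 = prices.diff(-1).fillna(0).astype(int)
p2 = prices.diff(-2).fillna(0).astype(int)

print(pd.DataFrame({'Price': prices, 'P1': p1, 'P2': p2}))
```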

Then we need a function to apply to each row:

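def fn(row):
    # P1/P2 are the current price minus the price 1 and 2 hours ahead
    if row.P1 < 0 and row.P2 < 0:    # cheaper than both upcoming hours
        return 'Low'
    elif row.P1 > 0 and row.P2 > 0:  # more expensive than both upcoming hours
        return 'High'
    else:
        return 'Hold'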

The last step is to compute the new column by applying the function above, then drop the helper columns:

df['Action'] = df.apply(fn, axis=1)
df.drop(['P1', 'P2'], axis=1, inplace=True)
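`df.apply(fn, axis=1)` calls a Python function once per row. As a hedged alternative rather than part of this answer, the same P1/P2 rules can be evaluated for all rows at once with `np.select`. The sketch below uses local helper Series instead of adding and dropping columns. It should produce the same labels as `fn`, including 'Hold' for the last rows where the differences were filled with 0.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Price': [2633, 2643, 2610, 2470, 2474, 2475]})

# Current price minus the price 1 and 2 hours ahead (0 where unknown)
p1 = df['Price'].diff(-1).fillna(0)
p2 = df['Price'].diff(-2).fillna(0)

# Same rules as fn(), evaluated column-wise instead of row by row
df['Action'] = np.select(
    [(p1 < 0) & (p2 < 0), (p1 > 0) & (p2 > 0)],
    ['Low', 'High'],
    default='Hold',
)
print(df)
```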
