I am learning pandas and came across the following way to compare rows in a DataFrame. Here I use np.where and shift() to compare values within a column.
import pandas as pd
import numpy as np
# Initialise data to Dicts of series.
d = {'col' : pd.Series([10, 30, 20, 40, 70, 60])}
# creates Dataframe.
df = pd.DataFrame(d)
df['Relation'] = np.where(df['col'] > df['col'].shift(), "Grater", "Less")
df
The output looks like this:
   col Relation
0   10     Less
1   30   Grater
2   20     Less
3   40   Grater
4   70   Grater
5   60     Less
I am confused about the row with index 3: why does it show Grater? 40 is less than 70, so I expected it to show Less. What am I doing wrong here?
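Because 40 is compared with 20, not with 70: shift() moves the values down by 1, so each row is compared with the previous row: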
df['Relation'] = np.where(df['col'] > df['col'].shift(), "Grater", "Less")
df['shifted'] = df['col'].shift()
df['m'] = df['col'] > df['col'].shift()
print (df)
   col Relation  shifted      m
0   10     Less      NaN  False
1   30   Grater     10.0   True
2   20     Less     30.0  False
3   40   Grater     20.0   True  <- here
4   70   Grater     40.0   True
5   60     Less     70.0  False
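To see which neighbour each row is actually compared against, it can help to put both shift directions next to the original column. The following is only a minimal sketch on the same data; the names `view`, `prev` and `next` are illustrative and not part of the original code.

```python
import pandas as pd

s = pd.Series([10, 30, 20, 40, 70, 60], name="col")

# shift() is the same as shift(1): values move down one position,
# so each row sees the value of the previous row.
# shift(-1) moves values up one position,
# so each row sees the value of the next row.
view = pd.DataFrame({"col": s, "prev": s.shift(), "next": s.shift(-1)})
print(view)
```

At index 3 the `prev` column holds 20 and the `next` column holds 70, which is why the default shift() reports Grater for that row.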
Perhaps you want to shift by -1 instead, so that each row is compared with the next row:
df['Relation'] = np.where(df['col'] > df['col'].shift(-1), "Grater", "Less")
df['shifted'] = df['col'].shift(-1)
df['m'] = df['col'] > df['col'].shift(-1)
print (df)
   col Relation  shifted      m
0   10     Less     30.0  False
1   30   Grater     20.0   True
2   20     Less     40.0  False
3   40     Less     70.0  False
4   70   Grater     60.0   True
5   60     Less      NaN  False
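Note that the last row has no next row, so its shifted value is NaN. Any comparison with NaN is False, which means that row falls through to Less even though nothing was really compared. If that matters, one possible approach is to label the boundary row separately with np.select. This is a sketch under that assumption; the label "No next row" and the variable `nxt` are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col": [10, 30, 20, 40, 70, 60]})
nxt = df["col"].shift(-1)  # value from the following row; NaN for the last row

# Conditions are checked in order; rows matching none get the default label.
conditions = [nxt.isna(), df["col"] > nxt]
labels = ["No next row", "Grater"]
df["Relation"] = np.select(conditions, labels, default="Less")
print(df)
```

Equal neighbouring values would still end up as Less here, just as in the np.where version, because only a strict > is tested.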