import pandas as pd
import numpy as np
nan = np.nan  # np.NaN was removed in NumPy 2.0; np.nan works in all versions
data = [['a1',0,'Bottom_Class'],
['a1',0,nan],
['a1',1,nan],
['a1',1,nan],
['a1',1,nan],
['a1',1,'Top_Class'],
['a1',0,'Bottom_Class'],
['a1',0,'Top_Class'],
['a2',1,nan],
['a2',1,nan],
['a2',1,'Bottom_Class'],
['a2',0,nan],
['a2',0,'Bottom_Class'],
['a1',0,'Top_Class'],
['a2',1,nan],
['a1',1,'Top_Class'],
['a2',1,nan],
['a2',1,nan],
['a2',1,nan],
['a2',0,'Bottom_Class']]
df = pd.DataFrame(data,columns=['Id','State','Class'])
print(df)
    Id  State         Class
0   a1      0  Bottom_Class
1   a1      0           NaN
2   a1      1           NaN
3   a1      1           NaN
4   a1      1           NaN
5   a1      1     Top_Class
6   a1      0  Bottom_Class
7   a1      0     Top_Class
8   a2      1           NaN
9   a2      1           NaN
10  a2      1  Bottom_Class
11  a2      0           NaN
12  a2      0  Bottom_Class
13  a1      0     Top_Class
14  a2      1           NaN
15  a1      1     Top_Class
16  a2      1           NaN
17  a2      1           NaN
18  a2      1           NaN
19  a2      0  Bottom_Class
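This DataFrame is based on some stock market price data, but I modified it to make it easier to understand.
Let's focus only on df.Class:
My idea is to treat Bottom_Class as a start point and Top_Class as an end point, and vice versa.
The values after a Top_Class (not including the Top_Class itself) should be set to 0 until the next Bottom_Class is reached,
and the values after a Bottom_Class (not including the Bottom_Class itself) should be set to 1 until the next Top_Class is reached.
I would like to transform the Series like this: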
Class
Bottom_Class
1
1
1
1
Top_Class
Bottom_Class
Top_Class
0
0
Bottom_Class
1
Bottom_Class
Top_Class
0
Top_Class
0
0
0
Bottom_Class
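The rule described above can be written as a plain-Python loop that remembers the most recent marker. This is a minimal sketch, not the asker's code: the function name `fill_between_markers` is hypothetical, and it assumes missing values are float NaN (which `np.nan` is). NaNs that appear before any marker are left as NaN here, since the rule does not say what should happen to them.

```python
import math

def fill_between_markers(values):
    """Hypothetical helper: replace each NaN with 1 if the last marker seen
    was 'Bottom_Class', or with 0 if it was 'Top_Class'. Marker strings are
    kept unchanged."""
    result = []
    last_marker = None  # no marker seen yet
    for v in values:
        if isinstance(v, float) and math.isnan(v):
            if last_marker == 'Bottom_Class':
                result.append(1)
            elif last_marker == 'Top_Class':
                result.append(0)
            else:
                result.append(v)  # before any marker: leave NaN as-is
        else:
            last_marker = v  # a marker starts a new run
            result.append(v)
    return result

nan = float('nan')
print(fill_between_markers(['Bottom_Class', nan, nan, 'Top_Class', nan, 'Bottom_Class']))
# ['Bottom_Class', 1, 1, 'Top_Class', 0, 'Bottom_Class']
```

A loop like this is easy to reason about but slow on long columns, which is why a vectorized pandas approach is preferable in practice.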
You can use np.where to build the fill values and fillna to fill only the NaN values:
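# ffill() carries the most recent marker forward; np.where turns it into 1 (after Bottom_Class) or 0 (otherwise).
# index=df.index keeps the fill values aligned with df even if df does not use the default RangeIndex.
df.Class.fillna(pd.Series(np.where(df.Class.ffill() == 'Bottom_Class', 1, 0), index=df.index))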
# Output:
0 Bottom_Class
1 1
2 1
3 1
4 1
5 Top_Class
6 Bottom_Class
7 Top_Class
8 0
9 0
10 Bottom_Class
11 1
12 Bottom_Class
13 Top_Class
14 0
15 Top_Class
16 0
17 0
18 0
19 Bottom_Class
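The same idea can also be written with pandas alone by swapping np.where for Series.map. This is a sketch on a 10-row excerpt of the Class column (rows 0 to 9 of the question's data); the mapping dict is an assumption chosen to be equivalent to the np.where condition. Because map returns a Series that already carries the original index, no manual pd.Series(..., index=...) wrapping is needed.

```python
import numpy as np
import pandas as pd

# Rows 0-9 of the Class column from the question's data. dtype=object is set
# explicitly so the Series can hold both marker strings and integer fill values.
s = pd.Series(['Bottom_Class', np.nan, np.nan, np.nan, np.nan,
               'Top_Class', 'Bottom_Class', 'Top_Class', np.nan, np.nan],
              name='Class', dtype=object)

# Step 1: carry the most recent marker forward over the NaN gaps.
last_marker = s.ffill()

# Step 2: turn each carried-forward marker into its fill value
# (1 after Bottom_Class, 0 after Top_Class); NaN or unmapped entries stay NaN.
fill_values = last_marker.map({'Bottom_Class': 1, 'Top_Class': 0})

# Step 3: fill only the original NaNs; the marker strings are kept as-is.
result = s.fillna(fill_values)
print(result)
```

One behavioral difference: with np.where, a NaN that comes before the first marker becomes 0, whereas with map it stays NaN because NaN has no entry in the dict.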