Python pandas: propagate a value until a condition is met



DataFrame

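I have this DataFrame (see the picture).

If the green column (dataframe.Signal) is 1, then the red column (dataframe.Level) holds a value, 1088.50 in our example; otherwise dataframe.Level is NaN.

Now I want to propagate the red value (dataframe.Level) for as long as it is lower than or equal to the brown value (dataframe.Brown).

On the first row where dataframe.Brown is lower than the red value, dataframe.Level must still equal the last red value; after that row it must go back to NaN, as in the picture.

How can I do this? I tried a normal for loop, but this DataFrame has 6 million rows and it is too slow.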

TL;DR

left = df.loc[df["Signal"] == 1].index
right = (df.loc[df["Brown"] - df["Level"].ffill() < 0]
         .groupby(df["Signal"].cumsum(), as_index=False)
         .nth(0).index)
subsets = pd.cut(df.index, pd.IntervalIndex.from_arrays(left, right, closed="both"))
df = df.groupby(subsets).ffill()

Input data:

      Brown  Signal    Level
0   1089.00       0      NaN
1   1089.00       0      NaN
2   1088.50       0      NaN
3   1088.50       0      NaN
4   1088.50       1  1088.50  # 1088.50 <= start here (4)
5   1088.50       0      NaN  # 1088.50
6   1088.50       0      NaN  # 1088.50
7   1088.50       0      NaN  # 1088.50
8   1088.50       0      NaN  # 1088.50
9   1088.75       0      NaN  # 1088.50
10  1088.50       0      NaN  # 1088.50
11  1088.25       0      NaN  # 1088.50 <= stop here (11)
12  1088.50       0      NaN
13  1088.25       1  1088.25  # 1088.25 <= start here (13)
14  1088.50       0      NaN  # 1088.25
15  1088.50       0      NaN  # 1088.25
16  1088.00       0      NaN  # 1088.25 <= stop here (16)
17  1088.50       0      NaN
18  1088.75       0      NaN
19  1088.50       0      NaN
20  1088.25       0      NaN
21  1088.50       0      NaN
  1. Find the start indexes (left bounds of the intervals): [4, 13]
>>> left = df.loc[df["Signal"] == 1].index
>>> left
Int64Index([4, 13], dtype='int64')
  2. Find the stop indexes (right bounds of the intervals): [11, 16]
>>> right = (df.loc[df["Brown"] - df["Level"].ffill() < 0]
...          .groupby(df["Signal"].cumsum(), as_index=False)
...          .nth(0).index)
>>> right
Int64Index([11, 16], dtype='int64')
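Step 2 packs three ideas into one expression: forward-fill Level so every row knows the active red value, flag rows where Brown drops below it, and keep only the first flagged row per signal "episode" (the running count of Signal). The sketch below spells out those intermediates on a small made-up frame; the data and the names `below`, `episode`, and `hits` are illustrative assumptions, not part of the original answer, and it uses `groupby(...).first()` rather than `nth(0)`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two signals (rows 1 and 5), several drops below Level.
df = pd.DataFrame({
    "Brown":  [10.0, 9.0, 9.0, 10.0, 8.0, 9.0, 7.0, 6.0],
    "Signal": [0, 1, 0, 0, 0, 1, 0, 0],
    "Level":  [np.nan, 9.0, np.nan, np.nan, np.nan, 8.0, np.nan, np.nan],
})

# Rows where Brown is strictly below the most recent Level.
# Comparisons against NaN are False, so rows before the first signal never match.
below = df["Brown"] < df["Level"].ffill()

# Episode id: 0 before the first signal, then 1, 2, ... after each signal.
episode = df["Signal"].cumsum()

# Index labels of all flagged rows, then the first one in each episode.
hits = df.index[below.to_numpy()].to_series()
right = pd.Index(hits.groupby(episode[below]).first())

print(list(right))
```

Row 7 is also below the active Level, but it belongs to the same episode as row 6, so only row 6 is kept as the stop row.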
  3. Extract all subsets: [(4, 11), (13, 16)]
>>> subsets = pd.cut(df.index, pd.IntervalIndex.from_arrays(left, right, closed="both"))
>>> list(subsets)
[nan,
nan,
nan,
nan,
Interval(4.0, 11.0, closed='both'),
Interval(4.0, 11.0, closed='both'),
Interval(4.0, 11.0, closed='both'),
Interval(4.0, 11.0, closed='both'),
Interval(4.0, 11.0, closed='both'),
Interval(4.0, 11.0, closed='both'),
Interval(4.0, 11.0, closed='both'),
Interval(4.0, 11.0, closed='both'),
nan,
Interval(13.0, 16.0, closed='both'),
Interval(13.0, 16.0, closed='both'),
Interval(13.0, 16.0, closed='both'),
Interval(13.0, 16.0, closed='both'),
nan,
nan,
nan,
nan,
nan]
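To see what `pd.cut` does with an `IntervalIndex`, here is a minimal sketch on made-up positions and bounds (the numbers are assumptions chosen only for illustration). Each position is labelled with the closed interval that contains it, and positions outside every interval get a missing label, which shows up as code -1 in the resulting Categorical.

```python
import numpy as np
import pandas as pd

# Two closed, non-overlapping intervals: [1, 3] and [5, 6] (illustrative values).
bins = pd.IntervalIndex.from_arrays([1, 5], [3, 6], closed="both")

# Label positions 0..7 with the interval that contains them.
labels = pd.cut(np.arange(8), bins)

# Integer codes of the Categorical: the interval number, or -1 when the
# position falls outside every interval.
codes = labels.codes
inside = codes >= 0

print(codes.tolist())
print(inside.tolist())
```

The intervals must not overlap for this lookup to work. In the answer's setup that holds as long as each stop row comes before the next signal row.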
  4. Fill the Level values along each interval:
>>> df.groupby(subsets).ffill()
      Brown  Signal    Level
0   1089.00       0      NaN
1   1089.00       0      NaN
2   1088.50       0      NaN
3   1088.50       0      NaN
4   1088.50       1  1088.50
5   1088.50       0  1088.50
6   1088.50       0  1088.50
7   1088.50       0  1088.50
8   1088.50       0  1088.50
9   1088.75       0  1088.50
10  1088.50       0  1088.50
11  1088.25       0  1088.50
12  1088.50       0      NaN
13  1088.25       1  1088.25
14  1088.50       0  1088.25
15  1088.50       0  1088.25
16  1088.00       0  1088.25
17  1088.50       0      NaN
18  1088.75       0      NaN
19  1088.50       0      NaN
20  1088.25       0      NaN
21  1087.75       0      NaN
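The four steps can be wrapped into one function, sketched here under some assumptions: the index is a sorted integer range, every signal row has a Level, and every signal is eventually followed by a row where Brown drops below Level. The helper name `propagate_level` is hypothetical. As a variation on step 4, it forward-fills Level once and blanks out rows outside the intervals, which should give the same Level column while sidestepping any question of how `groupby` treats rows whose group key is missing. The Brown values are copied from the input table above.

```python
import numpy as np
import pandas as pd

def propagate_level(df):
    """Return a copy of df with Level carried from each signal row to its stop row."""
    left = df.index[(df["Signal"] == 1).to_numpy()]
    level_ff = df["Level"].ffill()
    below = df["Brown"] < level_ff
    episode = df["Signal"].cumsum()
    hits = df.index[below.to_numpy()].to_series()
    right = hits.groupby(episode[below]).first().to_numpy()
    if len(left) != len(right):
        # Assumption of this sketch: each start has a matching stop.
        raise ValueError("every signal row needs a later row with Brown < Level")
    bins = pd.IntervalIndex.from_arrays(left, right, closed="both")
    inside = pd.cut(df.index, bins).codes >= 0
    out = df.copy()
    out["Level"] = level_ff.where(inside)  # NaN outside the intervals
    return out

brown = [1089.00, 1089.00, 1088.50, 1088.50, 1088.50, 1088.50, 1088.50,
         1088.50, 1088.50, 1088.75, 1088.50, 1088.25, 1088.50, 1088.25,
         1088.50, 1088.50, 1088.00, 1088.50, 1088.75, 1088.50, 1088.25,
         1088.50]
df = pd.DataFrame({"Brown": brown, "Signal": 0, "Level": np.nan})
df.loc[[4, 13], "Signal"] = 1
df.loc[4, "Level"] = 1088.50
df.loc[13, "Level"] = 1088.25

res = propagate_level(df)
print(res)
```

Because everything is expressed as whole-column operations, there is no Python-level loop over the rows, which is the point for a frame with millions of rows.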
