DataFrame
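I have this DataFrame (see image).
If the green column (dataframe.Signal) is 1, then the red column (dataframe.Level) holds some value, 1088.50 in our example; otherwise dataframe.Level is NaN.
Now I want to propagate the red value (dataframe.Level) for as long as it stays lower than or equal to the brown value (dataframe.Brown).
On the row where dataframe.Brown drops below the red value, dataframe.Level must still equal the last red value; after that it must go back to NaN, just like in the picture.
How can I do something like this? I tried plain for loops, but this DataFrame contains 6 million rows, so that is far too slow.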
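For reference, the rule can be pinned down as a plain row-by-row loop. This is a minimal sketch, not the asker's code: the function name propagate_level_loop is hypothetical, and it assumes that a new signal simply replaces a run that has not stopped yet (the question does not say). It is exactly the kind of loop that is too slow on 6 million rows, but it works as a correctness baseline for vectorised versions. The sample data is the 22-row table used in the answer.

```python
import numpy as np
import pandas as pd

# Example data: the 22-row input table from the answer.
brown = [1089.00, 1089.00] + [1088.50] * 7 + [
    1088.75, 1088.50, 1088.25, 1088.50, 1088.25, 1088.50,
    1088.50, 1088.00, 1088.50, 1088.75, 1088.50, 1088.25, 1088.50,
]
signal = [0] * 22
signal[4] = signal[13] = 1
level = [np.nan] * 22
level[4], level[13] = 1088.50, 1088.25
df = pd.DataFrame({"Brown": brown, "Signal": signal, "Level": level})


def propagate_level_loop(df):
    """Hypothetical reference implementation: correct, but O(n) Python loop."""
    brown = df["Brown"].to_numpy()
    signal = df["Signal"].to_numpy()
    level = df["Level"].to_numpy(dtype=float)
    out = np.full(len(df), np.nan)
    current = np.nan  # level being propagated; NaN means "no active run"
    for i in range(len(df)):
        if signal[i] == 1:
            current = level[i]  # assumption: a signal row always starts a new run
        if np.isnan(current):
            continue
        out[i] = current  # every row of the run, including the stop row, gets the level
        if brown[i] < current:
            current = np.nan  # Brown fell below the level: stop after this row
    return pd.Series(out, index=df.index, name="Level")


result = propagate_level_loop(df)
print(result.tolist())
```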
TL;DR
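import pandas as pd

# start of each run: the rows where Signal == 1
left = df.loc[df["Signal"] == 1].index
# end of each run: the first row of the run (grouped by Signal.cumsum()) where Brown < forward-filled Level
right = (df.loc[df["Brown"] - df["Level"].ffill() < 0]
         .groupby(df["Signal"].cumsum(), as_index=False)
         .nth(0).index)
subsets = pd.cut(df.index, pd.IntervalIndex.from_arrays(left, right, closed="both"))
# forward-fill Level inside each closed interval; rows outside every interval stay NaN
df["Level"] = df["Level"].groupby(subsets).ffill()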
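The TL;DR steps, wrapped in a self-contained function and run on the sample data. This is a sketch under a few assumptions: the name propagate_level is hypothetical; .head(1) stands in for .nth(0) because it keeps the original row labels in every pandas version; and only the Level column is forward-filled, because in recent pandas versions a grouped fill can return NaN for rows whose group key is NaN, which would otherwise blank Brown and Signal outside the intervals. It also assumes every run actually stops, so that left and right have the same length.

```python
import numpy as np
import pandas as pd


def propagate_level(df):
    """Hypothetical helper: forward-fill Level from each signal row to its stop row."""
    run = df["Signal"].cumsum()                       # run number of every row
    below = df["Brown"] - df["Level"].ffill() < 0     # Brown under the current level
    left = df.index[df["Signal"] == 1]                # run starts
    # Run stops: first "below" row of each run (later drops in the same run are ignored).
    right = df.loc[below].groupby(run[below]).head(1).index
    # One closed interval [start, stop] per run; rows outside every interval get NaN.
    subsets = pd.cut(df.index, pd.IntervalIndex.from_arrays(left, right, closed="both"))
    # Forward-fill only Level, separately inside each interval.
    return df["Level"].groupby(subsets, observed=True).ffill()


# Example data: the 22-row input table from this answer.
brown = [1089.00, 1089.00] + [1088.50] * 7 + [
    1088.75, 1088.50, 1088.25, 1088.50, 1088.25, 1088.50,
    1088.50, 1088.00, 1088.50, 1088.75, 1088.50, 1088.25, 1088.50,
]
signal = [0] * 22
signal[4] = signal[13] = 1
level = [np.nan] * 22
level[4], level[13] = 1088.50, 1088.25
df = pd.DataFrame({"Brown": brown, "Signal": signal, "Level": level})

df["Level"] = propagate_level(df)
print(df)
```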
Input data:
Brown Signal Level
0 1089.00 0 NaN
1 1089.00 0 NaN
2 1088.50 0 NaN
3 1088.50 0 NaN
4 1088.50 1 1088.50 # 1088.50 <= start here (4)
5 1088.50 0 NaN # 1088.50
6 1088.50 0 NaN # 1088.50
7 1088.50 0 NaN # 1088.50
8 1088.50 0 NaN # 1088.50
9 1088.75 0 NaN # 1088.50
10 1088.50 0 NaN # 1088.50
11 1088.25 0 NaN # 1088.50 <= stop here (11)
12 1088.50 0 NaN
13 1088.25 1 1088.25 # 1088.25 <= start here (13)
14 1088.50 0 NaN # 1088.25
15 1088.50 0 NaN # 1088.25
16 1088.00 0 NaN # 1088.25 <= stop here (16)
17 1088.50 0 NaN
18 1088.75 0 NaN
19 1088.50 0 NaN
20 1088.25 0 NaN
21 1088.50 0 NaN
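To reproduce the example locally, the printed table can be parsed back into a DataFrame. A minimal sketch, assuming the "#" annotations are removed first: pd.read_csv reads a whitespace-separated string, and pd.read_clipboard(sep=r"\s+") would do the same from the clipboard. Because the header row has one field fewer than the data rows, pandas uses the first column as the row index.

```python
import io

import pandas as pd

# The input table above, pasted as plain text (the "#" annotations removed).
table = """\
     Brown  Signal    Level
0  1089.00       0      NaN
1  1089.00       0      NaN
2  1088.50       0      NaN
3  1088.50       0      NaN
4  1088.50       1  1088.50
5  1088.50       0      NaN
6  1088.50       0      NaN
7  1088.50       0      NaN
8  1088.50       0      NaN
9  1088.75       0      NaN
10 1088.50       0      NaN
11 1088.25       0      NaN
12 1088.50       0      NaN
13 1088.25       1  1088.25
14 1088.50       0      NaN
15 1088.50       0      NaN
16 1088.00       0      NaN
17 1088.50       0      NaN
18 1088.75       0      NaN
19 1088.50       0      NaN
20 1088.25       0      NaN
21 1088.50       0      NaN
"""
# Whitespace-separated; the unnamed first column becomes the index.
df = pd.read_csv(io.StringIO(table), sep=r"\s+")
print(df.dtypes)
```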
- Find the start indices (left interval bounds), i.e. the rows where Signal == 1: [4, 13]
>>> left = df.loc[df["Signal"] == 1].index
>>> left
Int64Index([4, 13], dtype='int64')
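- Find the stop indices (right interval bounds), i.e. the first row of each run (grouped by Signal.cumsum()) where Brown drops below the forward-filled Level: [11, 16]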
>>> right = (df.loc[df["Brown"] - df["Level"].ffill() < 0]
...          .groupby(df["Signal"].cumsum(), as_index=False)
...          .nth(0).index)
>>> right
Int64Index([11, 16], dtype='int64')
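To see what the stop-index step computes, the intermediate columns can be inspected directly. A sketch under stated assumptions: stop_rows is a hypothetical helper, Brown < Level.ffill() is used as the equivalent of Brown - Level.ffill() < 0, and the tweaked copy lowers Brown on row 21 to a hypothetical 1087.75 to show why only the first drop of each run is kept.

```python
import numpy as np
import pandas as pd

# Example data: the 22-row input table from this answer.
brown = [1089.00, 1089.00] + [1088.50] * 7 + [
    1088.75, 1088.50, 1088.25, 1088.50, 1088.25, 1088.50,
    1088.50, 1088.00, 1088.50, 1088.75, 1088.50, 1088.25, 1088.50,
]
signal = [0] * 22
signal[4] = signal[13] = 1
level = [np.nan] * 22
level[4], level[13] = 1088.50, 1088.25
df = pd.DataFrame({"Brown": brown, "Signal": signal, "Level": level})


def stop_rows(df):
    """Hypothetical helper: all rows below the level, and the first one per run."""
    run = df["Signal"].cumsum()      # 0 before the first signal, then 1, 2, ...
    current = df["Level"].ffill()    # level of the run each row belongs to (NaN in run 0)
    below = df["Brown"] < current    # NaN comparisons evaluate to False
    print(pd.DataFrame({"run": run, "current": current, "below": below}).loc[3:17])
    return below[below].index, below[below].groupby(run[below]).head(1).index


all_below, stops = stop_rows(df)            # both [11, 16] for the sample data
tweaked = df.copy()
tweaked.loc[21, "Brown"] = 1087.75          # hypothetical second drop inside run 2
all_below_t, stops_t = stop_rows(tweaked)   # [11, 16, 21] and [11, 16]
```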
- Label every row with the closed interval it belongs to, [4, 11] or [13, 16] (NaN for rows outside both):
>>> subsets = pd.cut(df.index, pd.IntervalIndex.from_arrays(left, right, closed="both"))
>>> list(subsets)
[nan,
nan,
nan,
nan,
Interval(4.0, 11.0, closed='both'),
Interval(4.0, 11.0, closed='both'),
Interval(4.0, 11.0, closed='both'),
Interval(4.0, 11.0, closed='both'),
Interval(4.0, 11.0, closed='both'),
Interval(4.0, 11.0, closed='both'),
Interval(4.0, 11.0, closed='both'),
Interval(4.0, 11.0, closed='both'),
nan,
Interval(13.0, 16.0, closed='both'),
Interval(13.0, 16.0, closed='both'),
Interval(13.0, 16.0, closed='both'),
Interval(13.0, 16.0, closed='both'),
nan,
nan,
nan,
nan,
nan]
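Because pd.cut returns a Categorical here, the interval labels listed above are also available as integer codes, with -1 marking rows that fall in no interval. A minimal sketch using the intervals from this example, where pd.RangeIndex(22) stands in for df.index (an assumption: the frame has a default 0..n-1 index).

```python
import numpy as np
import pandas as pd

index = pd.RangeIndex(22)  # stands in for df.index of the sample data
intervals = pd.IntervalIndex.from_arrays([4, 13], [11, 16], closed="both")
subsets = pd.cut(index, intervals)  # Categorical of Interval values / NaN

# Integer label per row: 0 for [4, 11], 1 for [13, 16], -1 for rows outside both.
codes = subsets.codes
inside = codes >= 0  # rows whose Level will be forward-filled
print(codes)
```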
- Forward-fill the Level values within each interval:
>>> df["Level"] = df["Level"].groupby(subsets).ffill()
>>> df
Brown Signal Level
0 1089.00 0 NaN
1 1089.00 0 NaN
2 1088.50 0 NaN
3 1088.50 0 NaN
4 1088.50 1 1088.50
5 1088.50 0 1088.50
6 1088.50 0 1088.50
7 1088.50 0 1088.50
8 1088.50 0 1088.50
9 1088.75 0 1088.50
10 1088.50 0 1088.50
11 1088.25 0 1088.50
12 1088.50 0 NaN
13 1088.25 1 1088.25
14 1088.50 0 1088.25
15 1088.50 0 1088.25
16 1088.00 0 1088.25
17 1088.50 0 NaN
18 1088.75 0 NaN
19 1088.50 0 NaN
20 1088.25 0 NaN
21 1087.75 0 NaN
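An alternative that skips intervals entirely (a different technique from the pd.cut approach above, sketched here on the same sample data): number the runs with Signal.cumsum(), forward-fill Level, and keep a row only while no earlier row of its run has had Brown below the level. Every step is a vectorised column operation. The function name propagate_level_mask is hypothetical.

```python
import numpy as np
import pandas as pd


def propagate_level_mask(df):
    """Hypothetical mask-based variant: no intervals, only per-run cumulative sums."""
    run = df["Signal"].cumsum()                    # run id; 0 = before the first signal
    current = df["Level"].ffill()                  # level of the current run
    below = (df["Brown"] < current).astype(int)    # 1 where Brown drops below the level
    # Number of "below" rows earlier in the same run (the current row excluded).
    earlier_stops = below.groupby(run).cumsum() - below
    active = (run > 0) & (earlier_stops == 0)      # from the signal up to the first drop
    return current.where(active)                   # NaN everywhere else


# Example data: the 22-row input table from this answer.
brown = [1089.00, 1089.00] + [1088.50] * 7 + [
    1088.75, 1088.50, 1088.25, 1088.50, 1088.25, 1088.50,
    1088.50, 1088.00, 1088.50, 1088.75, 1088.50, 1088.25, 1088.50,
]
signal = [0] * 22
signal[4] = signal[13] = 1
level = [np.nan] * 22
level[4], level[13] = 1088.50, 1088.25
df = pd.DataFrame({"Brown": brown, "Signal": signal, "Level": level})

result = propagate_level_mask(df)
print(result.tolist())
```

On the sample data it yields the same Level column as the output above.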