I have a DataFrame `df` like this:
| Time | variable one |
| -----------| -------------|
| 2022-11-09 | 0 |
| 2022-11-10 | 0 |
| 2022-11-11 | 2 |
| 2022-11-12 | 7 |
| 2022-11-13 | 0 |
| 2022-11-14 | 5 |
| 2022-11-15 | 3 |
| 2022-11-16 | 0 |
| 2022-11-17 | 0 |
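I need to remove all zeros before the first non-zero element and all zeros after the last non-zero element. Zeros that lie between non-zero elements should stay as zeros.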
I solved it with two while loops:
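# Replace the leading zeros with NaN, walking forward from the first row
i = 0
while df.loc[i, 'variable one'] == 0:
    df.loc[i, 'variable one'] = np.nan
    i = i + 1
# Replace the trailing zeros with NaN, walking backward from the last row
i = len(df['variable one']) - 1
while df.loc[i, 'variable one'] == 0:
    df.loc[i, 'variable one'] = np.nan
    i = i - 1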
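This code works, but it becomes very slow with hundreds of columns and thousands of rows. I'm looking for an optimization, ideally one that gets rid of the while loops altogether.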
You can use boolean indexing with a mask built by combining `cummax` in forward and reverse order:
m = df['variable one'].ne(0)
df.loc[~(m.cummax()&m[::-1].cummax()), 'variable one'] = np.nan
# or
# df['variable one'] = df['variable one'].where(m.cummax()&m[::-1].cummax())
Equivalent with `cummin`:
m = df['variable one'].eq(0)
df.loc[(m.cummin()|m[::-1].cummin()), 'variable one'] = np.nan
Output:
Time variable one
0 2022-11-09 NaN
1 2022-11-10 NaN
2 2022-11-11 2.0
3 2022-11-12 7.0
4 2022-11-13 0.0
5 2022-11-14 5.0
6 2022-11-15 3.0
7 2022-11-16 NaN
8 2022-11-17 NaN
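For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the sample frame from the question and computes both the `cummax` and the `cummin` masks. The names `nonzero`, `zero`, `keep`, `drop` and `trimmed` are mine, and I use `where` rather than assigning through `.loc` so that the column is replaced in one step.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Time": pd.date_range("2022-11-09", periods=9, freq="D"),
    "variable one": [0, 0, 2, 7, 0, 5, 3, 0, 0],
})

s = df["variable one"]

# cummax variant: True for rows between the first and last non-zero value.
nonzero = s.ne(0)
keep = nonzero.cummax() & nonzero[::-1].cummax()

# cummin variant: True for rows in the leading or trailing run of zeros.
zero = s.eq(0)
drop = zero.cummin() | zero[::-1].cummin()

# where() returns a new Series with NaN wherever `keep` is False.
trimmed = s.where(keep)
df["variable one"] = trimmed
```

The reversed Series keeps its original index labels, so `&` and `|` line the two masks up by label and no manual re-reversal is needed.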
Intermediates:
Time variable one m cummax rev_cummax & ~
0 2022-11-09 0 False False True False True
1 2022-11-10 0 False False True False True
2 2022-11-11 2 True True True True False
3 2022-11-12 7 True True True True False
4 2022-11-13 0 False True True True False
5 2022-11-14 5 True True True True False
6 2022-11-15 3 True True True True False
7 2022-11-16 0 False True False False True
8 2022-11-17 0 False True False False True
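Since the question mentions hundreds of columns, the same mask can probably be built for the whole frame at once instead of column by column. The sketch below rests on two assumptions: every column except `Time` should be trimmed, and `variable two` is a made-up column added only for illustration.

```python
import pandas as pd

# Hypothetical frame: "variable two" is invented for this example.
df = pd.DataFrame({
    "Time": pd.date_range("2022-11-09", periods=9, freq="D"),
    "variable one": [0, 0, 2, 7, 0, 5, 3, 0, 0],
    "variable two": [0, 4, 0, 0, 1, 0, 0, 0, 0],
})

# Assumption: every column except "Time" holds values to trim.
value_cols = df.columns.drop("Time")

nonzero = df[value_cols].ne(0)
# cummax runs down each column independently (axis=0 is the default),
# so each column gets its own first/last non-zero boundaries.
keep = nonzero.cummax() & nonzero[::-1].cummax()

# Rows outside each column's boundaries become NaN; inner zeros are kept.
df[value_cols] = df[value_cols].where(keep)
```

A column that contains only zeros ends up entirely NaN with this approach, which may or may not be what you want.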