I have the following dataframe:
time alarm
0 0
1 1
2 0
3 1
4 1
5 1
6 1
7 0
8 0
9 1
10 0
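The column alarm indicates an alarm: it takes the value 1 when the alarm goes off.
Every time the alarm goes off, I want to "silence" the next two rows. Then, if it goes off again after the silent period, I want to silence the following two rows, and so on.
In other words, I want to obtain the following dataframe: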
time alarm silenced
0 0 no
1 1 no
2 0 yes
3 1 yes
4 1 no
5 1 yes
6 1 yes
7 0 no
8 0 no
9 1 no
10 0 yes
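I managed to do this with a for loop or a lambda function, but I need to speed up the computation.
Can anyone help me? Thanks in advance!
P.S. Since I will eventually drop the "silenced" rows, I would also accept a solution that drops such rows directly. In that case, the result should be: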
time alarm
0 0
1 1
4 1
7 0
8 0
9 1
MY ATTEMPT, using a for loop in a helper function:
import numpy as np
import pandas as pd

df = pd.DataFrame({"time":[0,1,2,3,4,5,6,7,8,9,10], "alarm":[0,1,0,1,1,1,1,0,0,1,0]})
df

def fun_silence(df):
    # bool: if True, we are in a "silent" period
    # if False, we can consider the current time as a possible alarm
    flag_silent = False
    # time of the *last* alarm
    alarm_time = np.nan
    # loop over rows
    for index, row in df.iterrows():
        # if we are in a silent period
        if flag_silent:
            # if more than 2 time steps passed since the last alarm, we end the silent period
            if row['time'] - alarm_time > 2:
                flag_silent = False
            # otherwise, we mark this row as silenced
            else:
                df.at[index, 'silenced'] = 1
        # if there is an alarm and we are not in a silent period
        if row['alarm'] == 1 and not flag_silent:
            # save the alarm time
            alarm_time = row['time']
            # enter a silent period
            flag_silent = True
    return df

df['silenced'] = 0
df_silenced = fun_silence(df)
df_silenced
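I don't think you can avoid a for loop for this problem, but you can certainly streamline the function and then compile it with numba to get C-like speed on large datasets: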
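from numba import njit

@njit
def silence(alarm):
    # number of rows still to be silenced after the last accepted alarm
    count = 0
    for a in alarm:
        if count > 0:
            # inside a silent period: this row is silenced
            yield True
            count -= 1
        elif count == 0 and a == 1:
            # accepted alarm: silence the next two rows
            count = 2
            yield False
        else:
            yield False

df['silenced'] = [*silence(df['alarm'].to_numpy())]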
    time  alarm  silenced
0      0      0     False
1      1      1     False
2      2      0      True
3      3      1      True
4      4      1     False
5      5      1      True
6      6      1      True
7      7      0     False
8      8      0     False
9      9      1     False
10    10      0      True
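For the P.S. in the question (dropping the silenced rows directly), the same counter idea can be sketched in plain Python, without numba, so the logic is easy to check on the sample data. The names `silence_mask`, `window`, and `kept` are made up for this illustration and are not part of the code above. The sketch also assumes, as the question does, that the silent period is counted in rows rather than in elapsed time.

```python
def silence_mask(alarm, window=2):
    """Return a list of booleans, True where the row falls in a silent period.

    `window` is a hypothetical parameter: how many rows to silence after
    each accepted alarm (2 in the question).
    """
    mask = []
    remaining = 0  # rows still to silence after the last accepted alarm
    for a in alarm:
        if remaining > 0:
            # inside a silent period: any alarm on this row is ignored
            mask.append(True)
            remaining -= 1
        else:
            mask.append(False)
            if a == 1:
                # accepted alarm: silence the next `window` rows
                remaining = window
    return mask


# Sample data from the question, as plain lists
time = list(range(11))
alarm = [0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0]

mask = silence_mask(alarm)

# Keep only the rows that are not silenced
kept = [(t, a) for t, a, m in zip(time, alarm, mask) if not m]
print(kept)
```

On the sample data this keeps the rows with time 0, 1, 4, 7, 8 and 9, which matches the result requested in the P.S. With the dataframe from the question, something like `df[[not m for m in silence_mask(df['alarm'])]]` should select the same rows; for large inputs the compiled version above is likely the better choice.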