I have a time series of generated binary signals, as shown below:
date value
1/4/1987 0
1/5/1987 1
1/6/1987 1
1/7/1987 0
1/9/1987 0
1/10/1987 1
1/12/1987 1
2/1/1987 1
2/2/1987 1
2/3/1987 1
2/4/1987 1
2/6/1987 1
2/7/1987 1
2/9/1987 0
2/10/1987 0
2/11/1987 0
2/12/1987 1
3/2/1987 0
3/3/1987 1
3/4/1987 1
3/6/1987 1
3/8/1987 1
3/9/1987 1
3/11/1987 1
3/12/1987 0
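I am trying to find a way to reduce the number of 1s so that there is a fixed gap of 10 observations between consecutive 1s. The desired output is: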
date new_value
1/4/1987 0
1/5/1987 1
1/6/1987 0
1/7/1987 0
1/9/1987 0
1/10/1987 0
1/12/1987 0
2/1/1987 0
2/2/1987 0
2/3/1987 0
2/4/1987 0
2/6/1987 0
2/7/1987 1
2/9/1987 0
2/10/1987 0
2/11/1987 0
2/12/1987 0
3/2/1987 0
3/3/1987 0
3/4/1987 0
3/6/1987 0
3/8/1987 0
3/9/1987 0
3/11/1987 1
3/12/1987 0
Any help would be much appreciated.
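Coming up with a vectorized approach for this seems tricky, because whether a 1 is kept depends on which earlier 1s were kept. For cases like this, if we still want a performant approach, numba can be a good option. Here is how it could be done using its @njit compiled mode: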
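from numba import njit
import numpy as np

@njit
def spacing_between_1(a, k):
    # Output starts as all zeros; only the kept 1s are switched on.
    x = np.zeros(len(a), np.int8)
    if len(a) == 0:
        return x
    # np.argmax gives the index of the first 1, or 0 if there is none.
    first_one = np.argmax(a)
    if a[first_one] != 1:
        return x  # no 1s at all, so nothing to keep
    x[first_one] = 1
    c = 0  # observations seen since the last kept 1
    for i in range(first_one + 1, len(x)):
        if a[i] == 1 and c >= k:
            x[i] = 1
            c = 0
            continue
        c += 1
    return x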
For the shared example, we get:
a = df.value.to_numpy()
df['new_value'] = spacing_between_1(a, 10)
print(df)
           value  new_value
date
1/4/1987       0          0
1/5/1987       1          1
1/6/1987       1          0
1/7/1987       0          0
1/9/1987       0          0
1/10/1987      1          0
1/12/1987      1          0
2/1/1987       1          0
2/2/1987       1          0
2/3/1987       1          0
2/4/1987       1          0
2/6/1987       1          0
2/7/1987       1          1
2/9/1987       0          0
2/10/1987      0          0
2/11/1987      0          0
2/12/1987      1          0
3/2/1987       0          0
3/3/1987       1          0
3/4/1987       1          0
3/6/1987       1          0
3/8/1987       1          0
3/9/1987       1          0
3/11/1987      1          1
3/12/1987      0          0
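
To sanity-check the logic without numba or pandas installed, the same loop can be sketched in plain Python. This is only an illustration: the name spacing_between_1_py is made up here, and the values list is typed in by hand from the question's table. It will be much slower than the compiled version on long series.

```python
def spacing_between_1_py(a, k):
    """Keep a 1 only if at least k observations passed since the last kept 1."""
    x = [0] * len(a)
    if 1 not in a:
        return x  # no 1s at all, so nothing to keep
    first_one = a.index(1)
    x[first_one] = 1
    c = 0  # observations seen since the last kept 1
    for i in range(first_one + 1, len(a)):
        if a[i] == 1 and c >= k:
            x[i] = 1
            c = 0
            continue
        c += 1
    return x


# The 25 values from the question's table, in order (hand-copied).
values = [0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1,
          0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0]

result = spacing_between_1_py(values, 10)
kept = [i for i, v in enumerate(result) if v == 1]
print(kept)  # [1, 12, 23]
```

Positions 1, 12 and 23 correspond to 1/5/1987, 2/7/1987 and 3/11/1987, which matches the new_value column above: each kept 1 is followed by exactly 10 zeros before the next one.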