Convert consecutive Trues to False in Python



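Essentially, as the title suggests, I want to convert consecutive repeated True values to False, keeping only the last one of each run.

For example, say I have an array of 0s and 1s: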

x = pd.Series([1,0,0,1,1])

It should become:

y = pd.Series([0,0,0,0,1])
# The 1st element of x becomes 0 because it is not part of a consecutive run,
# and the 4th element becomes 0 because it is the first item of a consecutive run.
# Everything else stays the same.

This should also work for runs longer than two. Say I have a longer array, for example:

x = pd.Series([1,0,0,1,1,1,0,1,1,0,1,1,1,1,0,0,1,1,1,1,1])

becomes:

y = pd.Series([0,0,0,0,0,1,0,0,1,0,0,0,0,1,0,0,0,0,0,0,1])

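Most of the posts I found are about removing consecutive duplicates, which does not preserve the original length. In this case, the original length should be preserved.

What I have in mind is similar to the following code: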

for i in range(len(x)):
    if x[i] == x[i+1]:
        x[i] = True
    else:
        x[i] = False

But this never finishes running for me, and it does not handle runs of more than two consecutive values.

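Pandas solution: create a Series, label the consecutive groups with shift and cumsum, then use Series.duplicated to keep only the last 1 of each group that contains duplicates: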

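s = pd.Series(x)
# give every run of equal consecutive values its own group id
g = s.ne(s.shift()).cumsum()
# last row of its group & group has more than one row & value is 1
s1 = (~g.duplicated(keep='last') & g.duplicated(keep=False) & s.eq(1)).astype(int)
print(s1.tolist())
[0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1]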
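
To see how the three masks combine, here is a small sketch on the short example from the question. The names is_last, in_run, is_one and steps are only illustrative and are not part of the answer above.

```python
import pandas as pd

# Short example from the question.
s = pd.Series([1, 0, 0, 1, 1])

# Group id that increases by one whenever the value changes.
g = s.ne(s.shift()).cumsum()            # 1, 2, 2, 3, 3

is_last = ~g.duplicated(keep='last')    # last row of each group
in_run = g.duplicated(keep=False)       # group has more than one row
is_one = s.eq(1)                        # only runs of 1s count

# Put the intermediate columns side by side for inspection.
steps = pd.DataFrame({'x': s, 'g': g, 'is_last': is_last,
                      'in_run': in_run, 'is_one': is_one})
steps['y'] = (is_last & in_run & is_one).astype(int)
print(steps)
```

Only the last row has all three masks set, so y is 0, 0, 0, 0, 1 as requested.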

Edit:

For multiple columns, wrap it in a function:

x = pd.Series([1,0,0,1,1,1,0,1,1,0,1,1,1,1,0,0,1,1,1,1,1])
df = pd.DataFrame({'a':x, 'b':x})
def f(s):
    g = s.ne(s.shift()).cumsum()
    return (~g.duplicated(keep='last') & g.duplicated(keep=False) & s.eq(1)).astype(int)
df = df.apply(f)
print(df)
    a  b
0   0  0
1   0  0
2   0  0
3   0  0
4   0  0
5   1  1
6   0  0
7   0  0
8   1  1
9   0  0
10  0  0
11  0  0
12  0  0
13  1  1
14  0  0
15  0  0
16  0  0
17  0  0
18  0  0
19  0  0
20  1  1
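
As a possible alternative (a sketch, not the method used in the answer above), the same result can be expressed with neighbour comparisons only: a value is kept when it is 1, the previous value is 1, and the next value is not 1. This works on a whole DataFrame without apply. The names mask and out are illustrative.

```python
import pandas as pd

x = pd.Series([1,0,0,1,1,1,0,1,1,0,1,1,1,1,0,0,1,1,1,1,1])
df = pd.DataFrame({'a': x, 'b': x})

# Last element of a run of 1s with length >= 2:
# current is 1, previous is 1, next is not 1.
# At the edges shift() yields NaN, which is "not 1".
mask = df.eq(1) & df.shift().eq(1) & df.shift(-1).ne(1)
out = mask.astype(int)
print(out['a'].tolist())
```

Because it does not group, this variant assumes the values are exactly 0 and 1, as in the question.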

Vanilla Python:

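x = [1,0,0,1,1,1,0,1,1,0,1,1,1,1,0,0,1,1,1,1,1]
counter = 0  # number of 1s seen so far in the current run
for i, e in enumerate(x):
    if not e:
        # a 0 ends the current run
        counter = 0
        continue
    # zero out the first 1 of a run, and any 1 that is followed by another 1
    if not counter or (i < len(x) - 1 and x[i+1]):
        counter += 1
        x[i] = 0
print(x)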

Prints:

[0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1]
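
The loop above modifies x in place. If a new list is preferred, one possible sketch uses itertools.groupby to walk the runs directly. The function name keep_last_of_runs is hypothetical and not from the answer above.

```python
from itertools import groupby

def keep_last_of_runs(values):
    """Return a new list with 1 only at the last position of each
    run of two or more truthy values; everything else becomes 0."""
    out = []
    # groupby yields one (key, group) pair per run of consecutive equal keys
    for is_true, group in groupby(values, key=bool):
        n = len(list(group))
        if is_true and n > 1:
            out.extend([0] * (n - 1) + [1])
        else:
            out.extend([0] * n)
    return out

x = [1,0,0,1,1,1,0,1,1,0,1,1,1,1,0,0,1,1,1,1,1]
y = keep_last_of_runs(x)
print(y)
```

The input list is left unchanged and the result has the same length as the input.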
