Essentially, I want to convert consecutive duplicates of True to False, as the title suggests.
For example, say I have an array of 0s and 1s:
x = pd.Series([1,0,0,1,1])
which should become:
y = pd.Series([0,0,0,0,1])
# where the 1st element of x becomes 0 since it is not part of a consecutive run
# and the 4th element becomes 0 because it is the first instance of the consecutive duplicate
# Everything else should remain the same.
This should also work for more than two consecutive values. For example, say I have a longer array:
x = pd.Series([1,0,0,1,1,1,0,1,1,0,1,1,1,1,0,0,1,1,1,1,1])
which becomes:
y = pd.Series([0,0,0,0,0,1,0,0,1,0,0,0,0,1,0,0,0,0,0,0,1])
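Most of the posts I found are about removing consecutive duplicates, which does not preserve the original length. In this case the original length should be kept.
What I have in mind is something like the following code: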
for i in range(len(x)):
    if x[i] == x[i+1]:
        x[i] = True
    else:
        x[i] = False
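But this never finishes running for me, and it does not handle more than two consecutive values.
Pandas solution: create a Series, then build groups of consecutive values with shift and cumsum, and use Series.duplicated to keep only the last 1 in each run of duplicates: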
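s = pd.Series(x)
# give each run of equal consecutive values its own group id
g = s.ne(s.shift()).cumsum()
# keep the last element of a run, only if the run is longer than 1 and its value is 1
s1 = (~g.duplicated(keep='last') & g.duplicated(keep=False) & s.eq(1)).astype(int)
print(s1.tolist())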
[0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1]
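To make the three conditions in that one-liner easier to follow, here is a small walkthrough on the short example from the question. It is only a sketch: the intermediate names (`starts`, `is_last`, `in_long_run`, `is_one`, `result`) are made up for illustration and are not part of the answer above.

```python
import pandas as pd

s = pd.Series([1, 0, 0, 1, 1])

# True wherever a value differs from the one before it, i.e. a new run starts
starts = s.ne(s.shift())
# The running total of run starts gives each run its own id
g = starts.cumsum()

is_last = ~g.duplicated(keep='last')    # last element of its run
in_long_run = g.duplicated(keep=False)  # run has at least two elements
is_one = s.eq(1)                        # only runs of 1s qualify

result = (is_last & in_long_run & is_one).astype(int)
print(g.tolist())       # [1, 2, 2, 3, 3]
print(result.tolist())  # [0, 0, 0, 0, 1]
```

The first 1 is dropped because its run has length one, and the 4th element is dropped because it is not the last element of its run.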
EDIT:
For multiple columns, use a function:
x = pd.Series([1,0,0,1,1,1,0,1,1,0,1,1,1,1,0,0,1,1,1,1,1])
df = pd.DataFrame({'a':x, 'b':x})
def f(s):
    g = s.ne(s.shift()).cumsum()
    return (~g.duplicated(keep='last') & g.duplicated(keep=False) & s.eq(1)).astype(int)
df = df.apply(f)
print(df)
    a  b
0   0  0
1   0  0
2   0  0
3   0  0
4   0  0
5   1  1
6   0  0
7   0  0
8   1  1
9   0  0
10  0  0
11  0  0
12  0  0
13  1  1
14  0  0
15  0  0
16  0  0
17  0  0
18  0  0
19  0  0
20  1  1
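As a possible alternative for the multi-column case, the same result can be reached without `apply` by comparing each value with its neighbours: a value survives only if it is 1, the value before it is 1, and the value after it is not 1. This is a different technique from the grouping approach above, sketched here under the assumption that the data contains only 0s and 1s; the names `is_one`, `prev_is_one`, `next_is_not_one` and `out` are illustrative.

```python
import pandas as pd

x = pd.Series([1,0,0,1,1,1,0,1,1,0,1,1,1,1,0,0,1,1,1,1,1])
df = pd.DataFrame({'a': x, 'b': x})

is_one = df.eq(1)
# shift() moves rows down, so each row sees the previous row's value (NaN in row 0)
prev_is_one = df.shift().eq(1)
# shift(-1) moves rows up, so each row sees the next row's value (NaN in the last row)
next_is_not_one = df.shift(-1).ne(1)

out = (is_one & prev_is_one & next_is_not_one).astype(int)
print(out['a'].tolist())
# [0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1]
```

The NaN values introduced by the shifts compare as "not equal to 1", which is what makes the first and last rows behave correctly here.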
Vanilla Python:
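x = [1,0,0,1,1,1,0,1,1,0,1,1,1,1,0,0,1,1,1,1,1]
counter = 0  # number of 1s seen so far in the current run
for i, e in enumerate(x):
    if not e:
        counter = 0
        continue
    # zero out the first 1 of a run, and any 1 that is followed by another 1
    if not counter or (i < len(x) - 1 and x[i+1]):
        counter += 1
        x[i] = 0
print(x)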
Prints:
[0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1]
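If modifying `x` in place is not wanted, a variant using `itertools.groupby` from the standard library builds a new list instead. This is only a sketch of the same idea, and the function name `keep_last_of_runs` is a made-up helper, not something from the answers above.

```python
from itertools import groupby

def keep_last_of_runs(values):
    """Return a new list with a 1 only at the end of each run of two or more 1s."""
    out = []
    for key, run in groupby(values):
        n = len(list(run))       # length of this run of equal values
        out.extend([0] * n)      # start with the whole run zeroed
        if key and n > 1:
            out[-1] = 1          # mark the last element of a long run of 1s
    return out

x = [1,0,0,1,1,1,0,1,1,0,1,1,1,1,0,0,1,1,1,1,1]
print(keep_last_of_runs(x))
# [0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1]
```

The input list is left untouched, and the output always has the same length as the input.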