How to remove rows containing consecutive digits from a DataFrame



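I want to remove the rows of df whose value is a run of consecutive digits followed by a 0, for example 12340 or 45670. I tried this code, but df does not change: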

rows_to_remove = []
for i, row in df.iterrows():
    digits = [int(d) for d in str(row['prix'])]
    if all([digits[j+1]-digits[j] == 1 for j in range(len(digits)-1)]) and digits[-1] == 0:
        rows_to_remove.append(i)
df = df.drop(rows_to_remove)
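
One likely reason the loop above never matches: the step check runs over every digit, including the final 0, so the last pair (for example 4 followed by 0 in 12340) never differs by 1 and `all(...)` is always False. The sketch below leaves the trailing 0 out of the step check. It uses plain Python values instead of a DataFrame, and `is_run_then_zero` is a hypothetical helper name, not something from the question.

```python
def is_run_then_zero(value):
    """Return True if value is ascending consecutive digits followed by one 0."""
    digits = [int(d) for d in str(value)]
    if digits[-1] != 0:
        return False
    body = digits[:-1]  # leave the trailing 0 out of the step check
    # Compare each digit with its successor; an empty or 1-digit body passes.
    return all(b - a == 1 for a, b in zip(body, body[1:]))

prices = [123450, 12378, 12345, 45670]
to_remove = [p for p in prices if is_run_then_zero(p)]
print(to_remove)  # [123450, 45670]
```

Note that under this reading a value such as 20 also counts as a match, because a one-digit body trivially passes the step check.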

You can use a list comprehension:

consecutive = '123456789'
m = [not (s.endswith('0') and s.rstrip('0') in consecutive)
     for s in df['prix'].astype(str)]
out = df[m]

Output:

    prix
1  12378
2  12345

How it works:

consecutive = '123456789'
df['keep'] = [not (s.endswith('0') and s.rstrip('0') in consecutive)
              for s in df['prix'].astype(str)]
print(df)

Output:

     prix   keep
0  123450  False
1   12378   True
2   12345   True
3   45670  False
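
To make the substring trick explicit, here is a minimal sketch of the same test on plain Python values. `keep` is a hypothetical helper name, and the extra inputs 20 and 0 are added here only to show edge cases.

```python
consecutive = '123456789'

def keep(value):
    s = str(value)
    stem = s.rstrip('0')  # '123450' -> '12345', '45670' -> '4567'
    # An ascending run of digits is always a substring of '123456789'.
    return not (s.endswith('0') and stem in consecutive)

for value in [123450, 12378, 12345, 45670, 20, 0]:
    print(value, keep(value))
```

Because `rstrip('0')` removes every trailing zero and the empty string is a substring of any string, short values such as 20 (stem '2') and 0 (stem '') are also flagged for removal.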

Reproducible input:

import pandas as pd
df = pd.DataFrame({'prix': [123450, 12378, 12345, 45670]})

Two-digit numbers

If you want to keep two-digit numbers such as 20:

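import numpy as np

consecutive = '123456789'
# m1: True for rows that do NOT match the "consecutive digits + trailing 0" pattern
m1 = np.array([not (s.endswith('0') and s.rstrip('0') in consecutive)
               for s in df['prix'].astype(str)])
# m2: True for values below 100, which are always kept
m2 = df['prix'].lt(100)
out = df[m1|m2]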

Or:

m = [not (s.endswith('0') and len(s) > 2 and s.rstrip('0') in consecutive)
     for s in df['prix'].astype(str)]
out = df[m]

Output:

    prix
1  12378
2  12345
4     20

Input used:

df = pd.DataFrame({'prix': [123450, 12378, 12345, 45670, 20]})

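If I understand correctly, you want to filter out stringified integers that end with '0' (that is, the int is a multiple of 10) and that, once the trailing zeros are stripped, have length 2 or more and are a substring of '123456789'?

If so, I believe this works: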

from pandas import DataFrame, Series
# Some test data
df = DataFrame({
    'i': [
        1,
        10,
        11,
        12,
        120,  # Dropped
        122,
        123,
        1230,  # Dropped
        12300,  # Dropped
        1240,
        13,
        130,
        134,
        1340,
        2,
        20,
        21,
        210,
        2120,
        23,
        230,  # Dropped
        2340,  # Dropped
        234000,  # Dropped
        2350,
        123456789,
        1234567890  # Dropped
    ]})
filt = Series(s.endswith('0') and len(s.rstrip('0')) > 1 and s.rstrip('0') in '123456789' for s in df['i'].astype(str))
filtered_df = df.loc[~filt]

You can split the logic up to make it more readable, and combine the individual string filters with the & operator:

stringified = df['i'].astype(str)
filt_1 = Series(s.endswith('0') for s in stringified)
filt_2 = Series(len(s.rstrip('0')) > 1 for s in stringified)
filt_3 = Series(s.rstrip('0') in '123456789' for s in stringified)
filtered_df = df.loc[~(filt_1 & filt_2 & filt_3)]

(There may also be a way to make the filtering more efficient.)
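
One possible direction, as a sketch only: precompute every substring of '123456789' with length 2 or more, so the length check and the substring check collapse into a single set lookup. The names `stems` and `should_drop` are hypothetical, and whether this is actually faster on real data would need measuring.

```python
consecutive = '123456789'
# Every substring of length >= 2: '12', '123', ..., '89' (36 in total).
stems = {consecutive[i:j]
         for i in range(len(consecutive))
         for j in range(i + 2, len(consecutive) + 1)}

def should_drop(value):
    s = str(value)
    # Ends with 0, and what is left after stripping zeros is a known run.
    return s.endswith('0') and s.rstrip('0') in stems

values = [1, 10, 120, 1230, 12300, 1240, 20, 230, 2350, 123456789, 1234567890]
dropped = [v for v in values if should_drop(v)]
print(dropped)  # [120, 1230, 12300, 230, 1234567890]
```

With pandas, the same set could presumably be combined with the vectorized string methods, along the lines of `stringified.str.endswith('0') & stringified.str.rstrip('0').isin(stems)`.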
