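I want to remove the rows in df whose value consists of consecutive digits followed by a trailing 0, for example 12340 or 45670. I tried this code, but nothing changed: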
rows_to_remove = []
for i, row in df.iterrows():
    digits = [int(d) for d in str(row['prix'])]
    if all([digits[j+1]-digits[j] == 1 for j in range(len(digits)-1)]) and digits[-1] == 0:
        rows_to_remove.append(i)
df = df.drop(rows_to_remove)
You can use a list comprehension:
consecutive = '123456789'
m = [not (s.endswith('0') and s.rstrip('0') in consecutive)
     for s in df['prix'].astype(str)]
out = df[m]
Output:
    prix
1  12378
2  12345
How it works:
consecutive = '123456789'
df['keep'] = [not(s.endswith('0') and s.rstrip('0') in consecutive)
              for s in df['prix'].astype(str)]
print(df)
Output:
     prix   keep
0  123450  False
1   12378   True
2   12345   True
3   45670  False
Reproducible input:
import pandas as pd
df = pd.DataFrame({'prix': [123450, 12378, 12345, 45670]})
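For reference, the loop in the question probably changes nothing because its digit comparison also covers the final 0: for 12340 the last step is 0 - 4 = -4, so the all(...) test can never be true for a number of two or more digits that ends in 0. Below is a minimal sketch of a corrected loop, assuming the trailing zeros should be stripped before testing for consecutive digits (the names head and ascending are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'prix': [123450, 12378, 12345, 45670]})

rows_to_remove = []
for i, row in df.iterrows():
    s = str(row['prix'])
    # Strip the trailing zeros so they do not take part in the +1 check.
    head = s.rstrip('0')
    digits = [int(d) for d in head]
    # True when every digit is exactly one more than the previous digit.
    ascending = all(b - a == 1 for a, b in zip(digits, digits[1:]))
    if s.endswith('0') and digits and ascending:
        rows_to_remove.append(i)

out = df.drop(rows_to_remove)
print(out)
```

With the sample input this drops rows 0 and 3, the same rows that the list comprehension marks as False.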
Two-digit numbers
If you want to keep two-digit numbers such as 20:
import numpy as np
consecutive = '123456789'
m1 = np.array([not(s.endswith('0') and s.rstrip('0') in consecutive)
               for s in df['prix'].astype(str)])
m2 = df['prix'].lt(100)
out = df[m1|m2]
Or:
m = [not (s.endswith('0') and len(s) > 2 and s.rstrip('0') in consecutive)
     for s in df['prix'].astype(str)]
out = df[m]
Output:
    prix
1  12378
2  12345
4     20
Input used:
df = pd.DataFrame({'prix': [123450, 12378, 12345, 45670, 20]})
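The two variants above should select the same rows for non-negative integers, since a value below 100 has at most two digits. Here is a quick sketch that checks this on the sample input (the names keep_a and keep_b are illustrative, not part of the answer):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'prix': [123450, 12378, 12345, 45670, 20]})
consecutive = '123456789'
strings = df['prix'].astype(str)

# Variant 1: pattern mask combined with a numeric "below 100" mask.
m1 = np.array([not (s.endswith('0') and s.rstrip('0') in consecutive)
               for s in strings])
m2 = df['prix'].lt(100)
keep_a = (m1 | m2).tolist()

# Variant 2: the length test is folded into the pattern itself.
keep_b = [not (s.endswith('0') and len(s) > 2 and s.rstrip('0') in consecutive)
          for s in strings]

print(keep_a == keep_b)
print(df[keep_b])
```

The second variant avoids the extra NumPy array, so it may be the simpler choice when the column only holds non-negative integers.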
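If I understand correctly, you want to filter out stringified integers that end in '0' (i.e. the int is a multiple of 10) and that, once the trailing 0s are stripped, have a length of 2 or more and are a substring of '123456789'?
If so, I believe this works: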
from pandas import DataFrame, Series
# Some test data
df = DataFrame({
    'i': [
        1,
        10,
        11,
        12,
        120,  # Dropped
        122,
        123,
        1230,  # Dropped
        12300,  # Dropped
        1240,
        13,
        130,
        134,
        1340,
        2,
        20,
        21,
        210,
        2120,
        23,
        230,  # Dropped
        2340,  # Dropped
        234000,  # Dropped
        2350,
        123456789,
        1234567890  # Dropped
    ]})
filt = Series(
    s.endswith('0') and len(s.rstrip('0')) > 1 and s.rstrip('0') in '123456789'
    for s in df['i'].astype(str)
)
filtered_df = df.loc[~filt]
You can split the logic up to make it more readable, and combine the string filters with the & operator:
stringified = df['i'].astype(str)
filt_1 = Series(s.endswith('0') for s in stringified)
filt_2 = Series(len(s.rstrip('0')) > 1 for s in stringified)
filt_3 = Series(s.rstrip('0') in '123456789' for s in stringified)
filtered_df = df.loc[~(filt_1 & filt_2 & filt_3)]
(There may also be ways to make the filtering more efficient.)
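One possible way to avoid the Python-level loops, offered as a sketch and not a benchmarked improvement: precompute every substring of '123456789' with at least two characters, then use the vectorized string accessor together with isin. The names valid and drop are assumptions introduced for this example.

```python
import pandas as pd

consecutive = '123456789'
# Every substring of length >= 2, e.g. '12', '234', ..., '123456789'.
valid = {consecutive[i:j]
         for i in range(len(consecutive))
         for j in range(i + 2, len(consecutive) + 1)}

df = pd.DataFrame({'i': [1, 10, 120, 1230, 1240, 20, 230, 2350, 1234567890]})
s = df['i'].astype(str)

# Drop when the string ends in '0' and its zero-stripped form is a valid run.
drop = s.str.endswith('0') & s.str.rstrip('0').isin(valid)
filtered_df = df.loc[~drop]
print(filtered_df)
```

Because the set only holds runs of two or more digits, the length check from the loop-based version is implied by the membership test.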