How to delete lines based on a regular expression (with exceptions)



I need to delete lines that look like the following.

<img src="index-1_2.jpg"/><br>
<img src="index-1_3.jpg"/><br>
<img src="index-1_5.jpg"/><br>
<img src="index-2_1.jpg"/><br>
<img src="index-2_5.jpg"/><br>
<img src="index-3_1.png"/><br>
<img src="index-23_8.png"/><br>
<img src="index-22_9.png"/><br>
<img src="index-22_1.jpg"/><br>
<img src="index-22_2.jpg"/><br>
<img src="index-99_5.png"/><br>
<img src="index-100_5.png"/><br>
<img src="index-1000_5.png"/><br>
...

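As you can see, the number after the word index, the number after the _, and the image format (png, jpg) all vary.

I need a regular expression that deletes all of these lines EXCEPT those with certain numbers after index. For example, I need to keep only the lines whose index number is 1 or 2.

So far I have come up with the following regular expression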

^<img src="index-(?!2|1)\d+_\d+\.(?:jpg|png)"/><br>$

But while it keeps the numbers 1 and 2, it also keeps 22, 23, 100 and 1000, because they start with those digits.

Use

^<img src="index-(?![12]_)(\d+)_\d+\.(?:jpg|png)"\/><br>$

See the regex proof. Use $1 as the replacement.

Explanation
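As a minimal sketch of applying this pattern outside an editor, the Python snippet below removes every matching line from a multi-line string. The sample data comes from the question; the variable names, the use of `re.MULTILINE`, and the trailing `\n?` are assumptions made for illustration. Note that this sketch replaces each match with an empty string so that the whole line disappears.

```python
import re

# Sample lines taken from the question.
html = """<img src="index-1_2.jpg"/><br>
<img src="index-2_5.jpg"/><br>
<img src="index-3_1.png"/><br>
<img src="index-22_9.png"/><br>
<img src="index-100_5.png"/><br>
"""

# (?![12]_) rejects only the index numbers 1 and 2.
# The trailing \n? also consumes the line break, so no blank line is left.
pattern = re.compile(
    r'^<img src="index-(?![12]_)(\d+)_\d+\.(?:jpg|png)"/><br>$\n?',
    re.MULTILINE,
)

cleaned = pattern.sub("", html)
print(cleaned)
```

Only the lines for index 1 and index 2 should remain in `cleaned`.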

--------------------------------------------------------------------------------
^                        the beginning of the string
--------------------------------------------------------------------------------
<img src="index-         '<img src="index-'
--------------------------------------------------------------------------------
(?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
[12]                     any character of: '1', '2'
--------------------------------------------------------------------------------
_                        '_'
--------------------------------------------------------------------------------
)                        end of look-ahead
--------------------------------------------------------------------------------
(                        group and capture to 1:
--------------------------------------------------------------------------------
\d+                      digits (0-9) (1 or more times (matching
                         the most amount possible))
--------------------------------------------------------------------------------
)                        end of 1
--------------------------------------------------------------------------------
_                        '_'
--------------------------------------------------------------------------------
\d+                      digits (0-9) (1 or more times (matching
                         the most amount possible))
--------------------------------------------------------------------------------
\.                       '.'
--------------------------------------------------------------------------------
(?:                      group, but do not capture:
--------------------------------------------------------------------------------
jpg                      'jpg'
--------------------------------------------------------------------------------
|                        OR
--------------------------------------------------------------------------------
png                      'png'
--------------------------------------------------------------------------------
)                        end of grouping
--------------------------------------------------------------------------------
"                        '"'
--------------------------------------------------------------------------------
\/                       '/'
--------------------------------------------------------------------------------
><br>                    '><br>'
--------------------------------------------------------------------------------
$                        before an optional \n, and the end of the
                         string
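
The same breakdown can be kept next to the pattern itself. The sketch below rewrites the expression with Python's `re.VERBOSE` flag, which ignores unescaped whitespace and allows `#` comments inside the pattern; the name `LINE_RE` and the sample strings are assumptions for illustration only.

```python
import re

# Whitespace is ignored in VERBOSE mode, so the literal space
# in "<img src" has to be escaped.
LINE_RE = re.compile(r'''
    ^                   # beginning of the string
    <img\ src="index-   # literal text up to the index number
    (?![12]_)           # not followed by 1_ or 2_
    (\d+)               # group 1: one or more digits
    _                   # literal underscore
    \d+                 # one or more digits
    \.                  # literal dot
    (?:jpg|png)         # jpg or png, not captured
    "/><br>             # literal closing part of the tag
    $                   # end of the string
''', re.VERBOSE)

for line in ('<img src="index-1_2.jpg"/><br>',
             '<img src="index-22_9.png"/><br>'):
    m = LINE_RE.match(line)
    print(line, "->", m.group(1) if m else "kept")
```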

Instead of the negative lookahead (?!2|1), you can use

(?![12]_)

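This prevents a match if the next character is 1 or 2 followed by an underscore.

After playing around with this for a bit, I think I got what you want: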
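
To see the difference between the two lookaheads, here is a small sketch that runs both against the index numbers from the question. The shortened test strings (`"22_5"` and so on) and the names `loose` and `strict` are assumptions made to keep the example compact.

```python
import re

numbers = ["1", "2", "3", "22", "23", "100", "1000"]

loose = re.compile(r"(?!2|1)\d+_")     # lookahead from the question
strict = re.compile(r"(?![12]_)\d+_")  # lookahead that includes the underscore

# A "hit" is a number whose line would be deleted.
loose_hits = [n for n in numbers if loose.match(n + "_5")]
strict_hits = [n for n in numbers if strict.match(n + "_5")]

print("loose deletes: ", loose_hits)
print("strict deletes:", strict_hits)
```

The first pattern refuses every number that merely begins with 1 or 2, while the second refuses only the numbers that are exactly 1 or 2.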

import re

txt = "index-4_8.jpg"
txt2 = "index-1_8.png"

# Check whether the string has index number 1 or 2 and ends in .jpg or .png
pattern = r"^(index-2_|index-1_).+(\.jpg|\.png)$"

x = re.search(pattern, txt)
if x:
    print('Matched')
else:
    print('NotMatched')

x = re.search(pattern, txt2)
if x:
    print('Matched')
else:
    print('NotMatched')
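
The same "match what you want to keep" idea can be applied to the full lines from the question. This is only a sketch: the `keep` pattern and the list-based filtering are assumptions, and any line that does not match the pattern is dropped, including lines that are not `<img>` tags at all.

```python
import re

lines = [
    '<img src="index-1_2.jpg"/><br>',
    '<img src="index-2_5.jpg"/><br>',
    '<img src="index-3_1.png"/><br>',
    '<img src="index-22_1.jpg"/><br>',
    '<img src="index-1000_5.png"/><br>',
]

# Hypothetical keep-pattern: the index number must be exactly 1 or 2.
keep = re.compile(r'^<img src="index-[12]_\d+\.(?:jpg|png)"/><br>$')

kept = [line for line in lines if keep.match(line)]
print("\n".join(kept))
```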
