I need to delete lines with the following characteristics.
<img src="index-1_2.jpg"/><br>
<img src="index-1_3.jpg"/><br>
<img src="index-1_5.jpg"/><br>
<img src="index-2_1.jpg"/><br>
<img src="index-2_5.jpg"/><br>
<img src="index-3_1.png"/><br>
<img src="index-23_8.png"/><br>
<img src="index-22_9.png"/><br>
<img src="index-22_1.jpg"/><br>
<img src="index-22_2.jpg"/><br>
<img src="index-99_5.png"/><br>
<img src="index-100_5.png"/><br>
<img src="index-1000_5.png"/><br>
...
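As you can see, the number after the word index, the number after the _, and the image format (png, jpg) all vary.
I need a regular expression that deletes all of these lines EXCEPT those with certain index numbers. For example, I need to keep only the lines whose index number is 1 or 2.
I have the following regular expression:
^<img src="index-(?!2|1)\d+_\d+\.(?:jpg|png)"/><br>$
But while it keeps numbers 1 and 2, it also keeps 22, 23, 100, and 1000, because they start with those digits.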
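To make the problem concrete, here is a small Python sketch of my own (not part of the original test; I am assuming the backslashes in \d and \. were lost in formatting) that shows which sample lines the pattern matches, and would therefore delete:

```python
import re

# The pattern from the question, with the backslashes restored (assumption).
pattern = re.compile(r'^<img src="index-(?!2|1)\d+_\d+\.(?:jpg|png)"/><br>$')

lines = [
    '<img src="index-1_2.jpg"/><br>',
    '<img src="index-2_5.jpg"/><br>',
    '<img src="index-3_1.png"/><br>',
    '<img src="index-22_9.png"/><br>',
    '<img src="index-100_5.png"/><br>',
]

# A line that matches the pattern is one that would be deleted.
deleted = [line for line in lines if pattern.search(line)]
kept = [line for line in lines if not pattern.search(line)]

print("deleted:", deleted)
print("kept:", kept)
```

The lookahead only inspects the first digit, so any index beginning with 1 or 2 (22, 100, ...) is rejected and the line survives.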
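Use
^<img src="index-(?![12]_)(\d+)_\d+\.(?:jpg|png)"/><br>$
See the regex proof. Use $1 as the replacement. Note that $1 leaves the captured index number in place of the line; use an empty replacement if the line should disappear entirely.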
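As a rough sketch of how this could be applied in Python (the sample text is made up from the question's lines, and I use an empty replacement plus a trailing \n? so the whole line disappears, which is my assumption about the desired result):

```python
import re

# Assumption: the document is one string with one tag per line.
text = "\n".join([
    '<img src="index-1_2.jpg"/><br>',
    '<img src="index-3_1.png"/><br>',
    '<img src="index-22_9.png"/><br>',
    '<img src="index-2_5.jpg"/><br>',
    '<img src="index-100_5.png"/><br>',
])

# re.MULTILINE lets ^ and $ match at every line boundary.
# The trailing \n? also consumes the line break so no blank line is left.
pattern = re.compile(
    r'^<img src="index-(?![12]_)(\d+)_\d+\.(?:jpg|png)"/><br>$\n?',
    re.MULTILINE,
)

cleaned = pattern.sub("", text)
print(cleaned)
```

Only the lines for index 1 and index 2 remain; 22 and 100 are removed because the lookahead now requires the underscore right after the single digit.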
Explanation
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
<img src="index- '<img src="index-'
--------------------------------------------------------------------------------
(?! look ahead to see if there is not:
--------------------------------------------------------------------------------
[12] any character of: '1', '2'
--------------------------------------------------------------------------------
_ '_'
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
( group and capture to 1:
--------------------------------------------------------------------------------
\d+ digits (0-9) (1 or more times (matching
the most amount possible))
--------------------------------------------------------------------------------
) end of 1
--------------------------------------------------------------------------------
_ '_'
--------------------------------------------------------------------------------
\d+ digits (0-9) (1 or more times (matching
the most amount possible))
--------------------------------------------------------------------------------
\. '.'
--------------------------------------------------------------------------------
(?: group, but do not capture:
--------------------------------------------------------------------------------
jpg 'jpg'
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
png 'png'
--------------------------------------------------------------------------------
) end of grouping
--------------------------------------------------------------------------------
" '"'
--------------------------------------------------------------------------------
/ '/'
--------------------------------------------------------------------------------
><br> '><br>'
--------------------------------------------------------------------------------
$ before an optional \n, and the end of the
string
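Instead of the negative lookahead (?!2|1), you can use
(?![12]_)
This prevents a match when the next character is 1 or 2 followed directly by an underscore.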
After tinkering with this for a while, I think I have what you want:
import re

txt = "index-4_8.jpg"
txt2 = "index-1_8.png"

# Check whether the string starts with "index-1_" or "index-2_"
# and ends with a literal ".jpg" or ".png" (dots escaped).
x = re.search(r"^(index-2_|index-1_).+(\.jpg|\.png)$", txt)
if x:
    print('Matched')
else:
    print('NotMatched')

x = re.search(r"^(index-2_|index-1_).+(\.jpg|\.png)$", txt2)
if x:
    print('Matched')
else:
    print('NotMatched')
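Building on the same positive-match idea, one possible way to apply it to the full lines from the question is to keep only the lines whose index is 1 or 2. This is just a sketch: the name keep_wanted and the sample list are hypothetical.

```python
import re

# Matches only the lines we want to KEEP: index 1 or 2, any second number.
KEEP = re.compile(r'^<img src="index-[12]_\d+\.(?:jpg|png)"/><br>$')


def keep_wanted(lines):
    """Return only the lines whose index number is exactly 1 or 2."""
    return [line for line in lines if KEEP.match(line)]


sample = [
    '<img src="index-1_2.jpg"/><br>',
    '<img src="index-22_1.jpg"/><br>',
    '<img src="index-2_5.jpg"/><br>',
    '<img src="index-1000_5.png"/><br>',
]

for line in keep_wanted(sample):
    print(line)
```

Be aware that this filter drops every line that does not match, including lines that are not img tags at all, so it only fits input made up entirely of such tags.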