Mysql regexp tag HTML



我在数据库中的Message_HTML字段中保存了一个HTML代码

Dear CUSTOMER, <br />
<br />
Please be advised that the following document has been moved:<br />
Document number: D4D4D4D4D4D<br />
<br />
<table border="1">
<th>Data</th>
<th>Movimento</th>
<th>Documento</th>
<tr>
<td>22/07/2021 15:35</td>
<td>Juntada de contrarrazões</td>
<td><a href="ver.aspx">ALERT - REPRESENTATIONS</a></td>
</tr>
<tr>
<td>22/07/2021 15:38</td>
<td>Juntada de certidão</td>
<td><a href="ver.aspx">SUCCESS - CERTIFICATE</a></td>
</tr>
<tr>
<td>22/07/2021 15:39</td>
<td>Juntada de alvará</td>
<td><a href="ver.aspx">NOTICE - PERMIT</a></td>
</tr>
</table>
<br />
<br />
If you are no longer interested in receiving the push, access the link:: <a href="push.aspx">Exit</a><br />
<br />
<b>ATTENTION: this email is generated in an automated way, please do not reply.</b>

我需要检查它在表列中是否有"证书"一词

<td><a href="ver.aspx">CERTIFICATE</a></td>

MYSQL 中使用的正则表达式

SELECT REGEXP_INSTR('<td><a href="ver.aspx">SUCCESS - CERTIFICATE</a></td>', '>[^<td><a*]*CERTIFICATE*[</a></td>]') AS verify;
REGEXP_INSTR(k.Message_html, '>[^<td><a*]*CERTIFICATE*[</a></td>]')

找不到记录

SELECT
*
FROM table as k
Where
WHERE REGEXP_INSTR(k.Message_html, concat('>[^<td><a*]*','CERTIFICATE,'*[</a></td>]')) > 0;

regexp在表中找不到单词

因为有一个单词CERTIFICATE

单词表的内容

word_id word
1 SUBJECT
2 DECISION
3 ORDER
4 SENTENCE
5 PETITION
6 CERTIFICATE
7 AMENDMENT TO THE INITIAL PETITION
8 NOTIFICATION - NOTIFICATION
9 EXTRACT
10 PETITION - PETITION
11 NOTIFICATION
12 MANIFESTATION
13 OTHER PARTS
14 REPRESENTATIONS

如果只是测试字符串是否与正则表达式匹配,则不需要使用REGEXP_INSTR()。使用RLIKE

SELECT *
FROM YourTable AS k
WHERE k.Message_html RLIKE '<td><a [^<]*CERTIFICATE'

[^<]*将匹配任何不是另一个标记开头的内容,因此如果<a>在其文本中包含CERTIFICATE,则它将匹配。

加入单词表:

SELECT * 
FROM Table1 AS k 
JOIN words p ON k.Message_html RLIKE CONCAT('<td><a [^<]*', p.word);

演示

这将找到您的证书

SELECT REGEXP_INSTR('<td><a href="ver.aspx">SUCCESS - CERTIFICATE</a></td>'
, '(<td><a href="ver.aspx">).*CERTIFICATE.*(</a></td>)') AS verify;
SELECT REGEXP_INSTR('Dear CUSTOMER, <br />
<br />
Please be advised that the following document has been moved:<br />
Document number: D4D4D4D4D4D<br />
<br />
<table border="1">
<th>Data</th>
<th>Movimento</th>
<th>Documento</th>
<tr>
<td>22/07/2021 15:35</td>
<td>Juntada de contrarrazões</td>
<td><a href="ver.aspx">ALERT - REPRESENTATIONS</a></td>
</tr>
<tr>
<td>22/07/2021 15:38</td>
<td>Juntada de certidão</td>
<td><a href="ver.aspx">SUCCESS - CERTIFICATE</a></td>
</tr>
<tr>
<td>22/07/2021 15:39</td>
<td>Juntada de alvará</td>
<td><a href="ver.aspx">NOTICE - PERMIT</a></td>
</tr>
</table>
<br />
<br />
If you are no longer interested in receiving the push, access the link:: <a href="push.aspx">Exit</a><br />
<br />
<b>ATTENTION: this email is generated in an automated way, please do not reply.</b>', '(<td><a href="ver.aspx">).*CERTIFICATE.*(</a></td>)') AS verify;

最新更新