Using regex to select and replace multiple lines in Notepad++

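I have a very large HTML file containing the results of a security scan, and I need to strip the useless information out of the document. An example of what I need to remove looks like this: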

<tr>
<td width="20%" valign="top" class="classcell0"><span class="classtext" style="color: #ffffff; font-weight: bold !important;">Info</span></td>
<td width="10%" valign="top" class="classcell"> <a href="http://www.nessus.org/plugins/index.php?view=single&amp;id=10395" target="_blank"> 10395</a>
</td>
<td width="70%" valign="top" class="classcell"><span class="classtext" style="color: #263645; font-weight: normal;">Microsoft Windows SMB Shares Enumeration</span></td>
</tr>

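After editing, the text above should be gone. Because parts of it vary, I can't do a standard find. Here is another example of what needs to be removed from the document: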

<tr>
<td width="20%" valign="top" class="classcell0"><span class="classtext" style="color: #ffffff; font-weight: bold !important;">Info</span></td>
<td width="10%" valign="top" class="classcell"> <a href="http://www.nessus.org/plugins/index.php?view=single&amp;id=11219" target="_blank"> 11219</a>
</td>
<td width="70%" valign="top" class="classcell"><span class="classtext" style="color: #263645; font-weight: normal;">Nessus SYN scanner</span></td>
</tr>

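I need the ID number 10395 to be treated as a variable, although its length stays the same. "Microsoft Windows SMB Shares Enumeration" also needs to be treated as a variable, since it changes throughout the document.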

I have tried replacing with something like this, but I think I am completely off the mark.

<td width="10%" valign="top" class="classcell"> <a href="http://www.nessus.org/plugins/index.php?view=single&amp;id=\1\1\1\1\1" target="_blank"> \1\1\1\1\1</a>

Maybe I should be using a different tool entirely?

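I assume that by repeating \1 several times you meant a placeholder for a single character, but that is not what it does: \1 is a backreference to whatever the first capture group matched. What you want is something like this: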

<td width="10%" valign="top" class="classcell"> <a href="http://www\.nessus\.org/plugins/index\.php\?view=single&amp;id=(\d+)" target="_blank"> \1</a>
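To see what the capture group and the backreference do, here is a minimal Python sketch of the same pattern. Python's `re` module stands in for Notepad++ here, which is an assumption on my part; the `mismatched` line is a made-up example, not something from the scan report.

```python
import re

# The pattern above as a Python raw string. The literal "." and "?" in the URL
# are escaped; (\d+) captures the ID, and \1 requires the very same digits to
# appear again as the link text.
ID_CELL = re.compile(
    r'<td width="10%" valign="top" class="classcell"> '
    r'<a href="http://www\.nessus\.org/plugins/index\.php\?view=single&amp;id=(\d+)"'
    r' target="_blank"> \1</a>'
)

matching = (
    '<td width="10%" valign="top" class="classcell"> '
    '<a href="http://www.nessus.org/plugins/index.php?view=single&amp;id=10395"'
    ' target="_blank"> 10395</a>'
)
# Hypothetical line whose link text differs from the id in the URL.
mismatched = matching.replace('> 10395<', '> 11219<')

print(ID_CELL.search(matching).group(1))  # the captured ID
print(ID_CELL.search(mismatched))         # None: \1 must repeat the captured ID
```

A single `(\d+)` followed by `\1` covers IDs of any length, so there is no need to repeat a placeholder once per digit.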

To match the whole 6 lines:

<tr>\s*<td width="20%" valign="top" class="classcell0"><span class="classtext" style="color: #ffffff; font-weight: bold !important;">Info</span></td>\s*<td width="10%" valign="top" class="classcell"> <a href="http://www\.nessus\.org/plugins/index\.php\?view=single&amp;id=(\d+)" target="_blank"> \1</a>\s*</td>\s*<td width="70%" valign="top" class="classcell"><span class="classtext" style="color: #263645; font-weight: normal;">.*?</span></td>\s*</tr>

Then you can replace it with an empty string.
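If you would rather script it than do it in the editor, the same six-line pattern can be applied with Python's `re.sub`. This is only a sketch: the `row` helper and the "High" row are hypothetical, made up to show that rows not labelled Info are left alone.

```python
import re

# Literal URL prefix with the regex metacharacters "." and "?" escaped.
URL = r'http://www\.nessus\.org/plugins/index\.php\?view=single&amp;id='

# The six-line pattern, split into pieces for readability. \s* absorbs the
# line breaks between tags, (\d+) captures the plugin ID, \1 requires the same
# ID as the link text, and .*? matches the varying description.
INFO_ROW = re.compile(''.join([
    r'<tr>\s*<td width="20%" valign="top" class="classcell0">',
    r'<span class="classtext" style="color: #ffffff; font-weight: bold !important;">Info</span></td>',
    r'\s*<td width="10%" valign="top" class="classcell"> <a href="', URL,
    r'(\d+)" target="_blank"> \1</a>',
    r'\s*</td>\s*<td width="70%" valign="top" class="classcell">',
    r'<span class="classtext" style="color: #263645; font-weight: normal;">.*?</span></td>\s*</tr>',
]))


def row(severity, plugin_id, title):
    """Hypothetical helper: render one row in the layout shown in the question."""
    return (
        '<tr>\n'
        '<td width="20%" valign="top" class="classcell0"><span class="classtext" '
        f'style="color: #ffffff; font-weight: bold !important;">{severity}</span></td>\n'
        '<td width="10%" valign="top" class="classcell"> '
        '<a href="http://www.nessus.org/plugins/index.php?view=single&amp;id='
        f'{plugin_id}" target="_blank"> {plugin_id}</a>\n'
        '</td>\n'
        '<td width="70%" valign="top" class="classcell"><span class="classtext" '
        f'style="color: #263645; font-weight: normal;">{title}</span></td>\n'
        '</tr>\n'
    )


html = (
    row('Info', '10395', 'Microsoft Windows SMB Shares Enumeration')
    + row('High', '99999', 'Hypothetical finding to keep')  # made-up non-Info row
    + row('Info', '11219', 'Nessus SYN scanner')
)

# Replace every Info row with the empty string.
cleaned = INFO_ROW.sub('', html)
print(cleaned)
```

In Notepad++ itself, the equivalent is to paste the pattern into "Find what", leave "Replace with" empty, and set Search Mode to "Regular expression".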

These regular expressions are listed from least to most complex, but they all do the job:

<a.*>.*\d.*</a>
<a.*>.*\d{5}.*</a>
<a.*id=\d{5}.*>.*\d{5}.*</a>
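As a rough comparison of how strict each of the three is, the sketch below runs them against the link from the question and two made-up links (both hypothetical, labelled as such in the code). Note that these patterns match only the `<a>` element on its own line, not the whole six-line row.

```python
import re

# The three patterns above, from loosest to strictest.
PATTERNS = [
    r'<a.*>.*\d.*</a>',               # any link whose text contains a digit
    r'<a.*>.*\d{5}.*</a>',            # link text contains five digits in a row
    r'<a.*id=\d{5}.*>.*\d{5}.*</a>',  # five-digit id in the URL and in the text
]


def which_match(text):
    """Return one boolean per pattern: does it match anywhere in text?"""
    return [re.search(p, text) is not None for p in PATTERNS]


# Link taken from the question.
sample = ('<a href="http://www.nessus.org/plugins/index.php?view=single&amp;id=10395"'
          ' target="_blank"> 10395</a>')
# Hypothetical links, invented to show where the looser patterns over-match.
five_digits = '<a href="report.html">See plugin 12345</a>'
short_number = '<a href="page2.html">Top 10 findings</a>'

print(which_match(sample))        # [True, True, True]
print(which_match(five_digits))   # [True, True, False]
print(which_match(short_number))  # [True, False, False]
```

The looser the pattern, the more unrelated links it will also hit, so the third one is the safer choice for a large report.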

Disclaimer: be careful. You can't really parse HTML with regex.
