Removing attributes using a whitelist

I need to strip attributes from a string that contains markup.

Here is the C# code:

strContent = Regex.Replace(strContent, @"<(\w+)[^>]*(?<=( ?/?))>", "<$1$2>", 
RegexOptions.IgnoreCase);

For example, this code replaces

This is some <div id="div1" class="cls1">content</div>. This is some more <span 
id="span1" class="cls1">content</span>. This is <input type="readonly" id="input1" 
value="further content"></input>.

with

This is some <div>content</div>. This is some more <span>content</span>. This is 
<input></input>.

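However, I need a "whitelist" when removing attributes. In the example above, I want the attributes of the "input" tag to be left untouched. So the output I want is: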

This is some <div>content</div>. This is some more <span>content</span>. This is 
<input type="readonly" id="input1" value="further content"></input>.

Thanks for any help with this.

For your example, you can use:

(<(?!input)[^\s>]+)[^>]*(>)

and replace with $1$2.

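I'm not sure how you intend to specify the whitelist, though. If you can hard-code it, you can easily add more (?!whitelistTag) lookaheads to the pattern above, and that can just as easily be done programmatically from an array.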
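Building the lookaheads from an array could look roughly like the sketch below. The class and method names (AttributeStripper, BuildPattern, Strip) are made up for illustration. I have also assumed a \b after each tag name so that whitelisting "input" does not also whitelist a longer tag name that merely starts with "input"; the pattern above does not have that.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class AttributeStripper
{
    // Emits one negative lookahead per whitelisted tag, e.g. (?!input\b),
    // and splices them in right after the opening "<".
    public static string BuildPattern(IEnumerable<string> whitelist)
    {
        string lookaheads = string.Concat(
            whitelist.Select(tag => @"(?!" + Regex.Escape(tag) + @"\b)"));
        return @"(<" + lookaheads + @"[^\s>]+)[^>]*(>)";
    }

    // Keeps "<" plus the tag name ($1) and the closing ">" ($2),
    // dropping everything in between for tags that are not whitelisted.
    public static string Strip(string html, IEnumerable<string> whitelist)
    {
        return Regex.Replace(html, BuildPattern(whitelist), "$1$2",
                             RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        string strContent =
            "This is some <div id=\"div1\" class=\"cls1\">content</div>. " +
            "This is <input type=\"readonly\" id=\"input1\"></input>.";

        // Prints: This is some <div>content</div>. This is <input type="readonly" id="input1"></input>.
        Console.WriteLine(Strip(strContent, new[] { "input" }));
    }
}
```

Note that, like the pattern above, this also drops the " /" from self-closing tags such as <br />, since everything after the tag name is discarded.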

Working on RegExr

As for the usual "You should not parse HTML with regex", you could rephrase the problem as:

This is a "quoted string", cull each "quoted string to its" first word unless the "string starts with" the word "string, like these last two".

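Would you say regex should not be used to solve that? Because it is the exact same problem. Sure, an HTML parser could be used for the job, but that hardly invalidates the idea of doing the same thing with regex.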
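To make the parallel concrete, here is a rough sketch of the quoted-string version using the same shape of pattern, with the quote character standing in for the angle brackets. The QuoteCuller name is made up, and the \b after "string" is my own assumption about what "starts with the word" should mean.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class QuoteCuller
{
    // $1 = opening quote plus first word, $2 = closing quote.
    // The lookahead skips quoted strings whose first word is "string".
    public static string Cull(string text)
    {
        return Regex.Replace(text, @"(""(?!string\b)\w+)[^""]*("")", "$1$2");
    }

    public static void Main()
    {
        string text =
            "This is a \"quoted string\", cull each \"quoted string to its\" " +
            "first word unless the \"string starts with\" the word " +
            "\"string, like these last two\".";

        Console.WriteLine(Cull(text));
    }
}
```

For the sentence above this leaves the last two quoted strings alone and reduces the first two to "quoted". It is just as fragile as the tag version: it relies on quotes being balanced and on a closing quote never being followed directly by a word character.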
