VB.NET HTML converter bug: why is the output unchanged?



This is in VB.NET. I want to strip the HTML tags while keeping the structure of the text. I tried the code below, taken from https://beansoftware.com/ASP.NET-Tutorials/Convert-HTML-To-Plain-Text.aspx

Dim html As String = "<div class='WordSection1'><p class='MsoNormal'>"
Dim final_result As String
Dim sbhtml As StringBuilder = New StringBuilder(html)
Dim OldWords() As String = {"&nbsp;", "&amp;", "&quot;", "&lt;", "&gt;", "&reg;", "&copy;", "&bull;", "&trade;"}
Dim NewWords() As String = {" ", "&", """", "<", ">", "®", "©", "•", "™"}
For i As Integer = 0 To i < OldWords.Length
sbhtml.Replace(OldWords(i), NewWords(i))
Next i
Console.WriteLine($"result after loop : {sbhtml}")
sbhtml.Replace("<br>", "n<br>")
sbhtml.Replace("<br ", "n<br ")
sbhtml.Replace("<p ", "n<p ")
final_result = Regex.Replace(sbhtml.ToString(), "<[^>]*>", "")
Console.WriteLine(final_result)

But the result that comes back is identical to the input string.

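The For statement is wrong. To expects the upper bound of the loop, not a condition: i < OldWords.Length is evaluated once as a Boolean, True converts to -1, and a loop from 0 To -1 never runs its body. It should be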

For i As Integer = 0 To OldWords.Length - 1

Probably some C# syntax leaked in.
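
To see why the original loop does nothing, here is a minimal sketch (the module and variable names are made up for illustration). It spells out the conversion VB applies to the C#-style bound, using an explicit CInt so that it also compiles under Option Strict On.

```vbnet
Imports System

Module LoopBoundDemo
    Sub Main()
        Dim words() As String = {"a", "b", "c"}

        ' A comparison used as the upper bound is evaluated once, before the loop starts.
        ' True converts to -1 in VB, so the range becomes 0 To -1.
        Dim bound As Integer = CInt(0 < words.Length)
        Console.WriteLine(bound)        ' prints -1

        Dim visited As Integer = 0
        For i As Integer = 0 To bound
            visited += 1
        Next
        Console.WriteLine(visited)      ' prints 0: the body never ran

        ' Corrected bound: the index of the last element.
        visited = 0
        For i As Integer = 0 To words.Length - 1
            visited += 1
        Next
        Console.WriteLine(visited)      ' prints 3
    End Sub
End Module
```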

Why not fold sbhtml.Replace("<br>", "n<br>") and the lines after it into OldWords and NewWords? Technically they are no different.

By using tuples, you can put the old and new words into the same array and use a For Each loop.

I suggest the following approach

' Needs Imports System.Text and Imports System.Text.RegularExpressions at the top of the file.
Dim html As String =
    "&lt;div class='WordSection1'&gt;aaa<br>bbb&lt;p class='MsoNormal'&gt;"
Dim final_result As String
Dim sbhtml As StringBuilder = New StringBuilder(html)
' Each tuple pairs the text to find with its replacement.
Dim Substitutions() As (old As String, repl As String) = {
    ("&nbsp;", " "), ("&amp;", "&"), ("&quot;", """"), ("&lt;", "<"),
    ("&gt;", ">"), ("&reg;", "®"), ("&copy;", "©"), ("&bull;", "•"),
    ("&trade;", "™"), ("<br>", "n<br>"), ("<br ", "n<br "), ("<p ", "n<p ")}
For Each subst In Substitutions
    sbhtml.Replace(subst.old, subst.repl)
Next
Console.WriteLine($"result after loop : {sbhtml}")
' Strip whatever tags remain.
final_result = Regex.Replace(sbhtml.ToString(), "<[^>]*>", "")
Console.WriteLine(final_result)
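
A variation on the same idea, offered only as a sketch. It assumes the n in front of the br and p replacements is meant to be a line break; VB string literals have no backslash escapes, so vbLf is used instead. It also swaps the hand-written entity table for System.Net.WebUtility.HtmlDecode and decodes after the tags are stripped, which differs from the order used above: escaped markup such as &lt;div&gt; then survives as visible text instead of being removed. ToPlainText is a hypothetical helper name, not something from the posts.

```vbnet
Imports System
Imports System.Net
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module HtmlToTextSketch
    ' Hypothetical helper, for illustration only.
    Function ToPlainText(html As String) As String
        ' Put a line feed in front of every <br ...> and <p ...> opening tag.
        ' $0 re-inserts the matched text; \b keeps <pre> and similar tags from matching.
        Dim withBreaks As String =
            Regex.Replace(html, "<(br|p)\b", vbLf & "$0", RegexOptions.IgnoreCase)

        ' Remove all tags.
        Dim noTags As String = Regex.Replace(withBreaks, "<[^>]*>", "")

        ' Decode entities such as &amp; and &nbsp; last.
        Return WebUtility.HtmlDecode(noTags)
    End Function

    Sub Main()
        Dim html As String = "<div>aaa<br>bbb<p class='MsoNormal'>ccc &amp; ddd</p></div>"
        ' Prints three lines: aaa, bbb, ccc & ddd
        Console.WriteLine(ToPlainText(html))
    End Sub
End Module
```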

HtmlAgilityPack does a good job with HTML manipulation and is very reliable. You could write something like this

Dim plainText As String = HtmlUtilities.ConvertToPlainText(html)

See "Install and manage packages in Visual Studio using the NuGet Package Manager" for an easy way to install HtmlAgilityPack.
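
As far as I can tell, HtmlUtilities is a helper class you would have to supply yourself rather than a type that ships in the package. A minimal sketch that relies only on types HtmlAgilityPack itself provides (HtmlDocument and HtmlEntity) could look like this. The sample input is made up, and this version does not insert line breaks for br or p tags.

```vbnet
Imports System
Imports HtmlAgilityPack

Module PlainTextDemo
    Sub Main()
        ' Made-up sample input.
        Dim html As String =
            "<div class='WordSection1'><p class='MsoNormal'>Hello &amp; welcome</p></div>"

        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        ' InnerText concatenates the text nodes of the document;
        ' DeEntitize turns entities such as &amp; back into characters.
        Dim plainText As String = HtmlEntity.DeEntitize(doc.DocumentNode.InnerText)
        Console.WriteLine(plainText)
    End Sub
End Module
```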
