net language. I want to replace the HTML tags while keeping the structure of the text. I tried the following code from https://beansoftware.com/ASP.NET-Tutorials/Convert-HTML-To-Plain-Text.aspx:
Dim html As String = "<div class='WordSection1'><p class='MsoNormal'>"
Dim final_result As String
Dim sbhtml As StringBuilder = New StringBuilder(html)
Dim OldWords() As String = {"&nbsp;", "&amp;", "&quot;", "&lt;", "&gt;", "&reg;", "&copy;", "&bull;", "&trade;"}
Dim NewWords() As String = {" ", "&", """", "<", ">", "®", "©", "•", "™"}
For i As Integer = 0 To i < OldWords.Length
sbhtml.Replace(OldWords(i), NewWords(i))
Next i
Console.WriteLine($"result after loop : {sbhtml}")
sbhtml.Replace("<br>", "n<br>")
sbhtml.Replace("<br ", "n<br ")
sbhtml.Replace("<p ", "n<p ")
final_result = Regex.Replace(sbhtml.ToString(), "<[^>]*>", "")
Console.WriteLine(final_result)
But the result is the same as the input string.

The For statement is wrong. It should be
For i As Integer = 0 To OldWords.Length - 1
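That looks like C# syntax leaking in. In VB, i < OldWords.Length is a Boolean expression. With Option Strict Off it is silently converted to Integer, and True becomes -1. The loop therefore runs from 0 To -1, its body never executes, and the text after the loop is unchanged.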
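The following small, self-contained demonstration shows why the original loop never ran. It evaluates the buggy bound explicitly with CInt, which gives the same value as the implicit conversion under Option Strict Off. It assumes i is still 0 when the bound is evaluated. CountIterations is a hypothetical helper written only for this illustration.

```vbnet
Module LoopBoundDemo
    ' Hypothetical helper: counts how often "For i = 0 To upperBound" runs its body.
    Function CountIterations(upperBound As Integer) As Integer
        Dim count As Integer = 0
        For i As Integer = 0 To upperBound
            count += 1
        Next
        Return count
    End Function

    Sub Main()
        Dim OldWords() As String = {"&nbsp;", "&amp;", "&quot;"}
        ' What the buggy bound "i < OldWords.Length" becomes when i is 0:
        ' the Boolean True, which VB converts to the Integer -1.
        Dim buggyBound As Integer = CInt(0 < OldWords.Length)
        Console.WriteLine(CountIterations(buggyBound))           ' prints 0
        ' The corrected bound visits every index from 0 to Length - 1.
        Console.WriteLine(CountIterations(OldWords.Length - 1))  ' prints 3
    End Sub
End Module
```

With Option Strict On, the original line would not compile at all, because the compiler rejects the implicit Boolean-to-Integer conversion. Turning Option Strict On is an easy way to catch this kind of mistake.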
Why not move sbhtml.Replace("<br>", "n<br>") and the lines that follow it into OldWords and NewWords? Technically they are no different from the other replacements.
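With tuples you can keep each old string and its replacement together in a single array and loop over it with For Each. This also removes the index arithmetic that caused the bug.
I'd suggest the following approach: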
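Imports System.Text
Imports System.Text.RegularExpressions

Module Module1
    Sub Main()
        Dim html As String =
            "<div class='WordSection1'>aaa<br>bbb<p class='MsoNormal'>"
        Dim final_result As String
        Dim sbhtml As StringBuilder = New StringBuilder(html)
        ' Each tuple pairs the text to find with its replacement
        Dim Substitutions() As (old As String, repl As String) = {
            ("&nbsp;", " "), ("&amp;", "&"), ("&quot;", """"), ("&lt;", "<"),
            ("&gt;", ">"), ("&reg;", "®"), ("&copy;", "©"), ("&bull;", "•"),
            ("&trade;", "™"),
            ("<br>", vbLf & "<br>"), ("<br ", vbLf & "<br "), ("<p ", vbLf & "<p ")}
        ' For Each needs no index bounds, so the "0 To ..." mistake cannot happen
        For Each subst In Substitutions
            sbhtml.Replace(subst.old, subst.repl)
        Next
        Console.WriteLine($"result after loop : {sbhtml}")
        ' Strip every remaining tag; the vbLf inserted above keeps the line breaks
        final_result = Regex.Replace(sbhtml.ToString(), "<[^>]*>", "")
        Console.WriteLine(final_result)
    End Sub
End Module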
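One side effect of this approach is that entities are decoded before the regex strips the tags. Escaped text such as &lt;b&gt; therefore turns into <b> and is then removed as if it were a real tag. The sketch below reverses the order: it strips tags first and decodes entities afterwards. It swaps the hand-written entity table for System.Net.WebUtility.HtmlDecode from .NET, which is not part of the original answer. HtmlToPlainText is a hypothetical name. Note that HtmlDecode turns &nbsp; into a non-breaking space (U+00A0) rather than a plain space.

```vbnet
Imports System.Net
Imports System.Text.RegularExpressions

Module HtmlToTextSketch
    ' Hypothetical helper: strip tags first, then decode entities, so that
    ' an escaped "&lt;b&gt;" survives as the literal text "<b>".
    Function HtmlToPlainText(html As String) As String
        ' Replace <br ...> and <p ...> opening tags with newlines to keep line structure
        Dim withBreaks As String = Regex.Replace(html, "<(br|p)\b[^>]*>", vbLf, RegexOptions.IgnoreCase)
        ' Remove all remaining tags, including closing tags such as </p>
        Dim noTags As String = Regex.Replace(withBreaks, "<[^>]*>", "")
        ' Decode entities only after tags are gone
        Return WebUtility.HtmlDecode(noTags)
    End Function

    Sub Main()
        Dim html As String = "<div class='WordSection1'>aaa<br>bbb<p class='MsoNormal'>1 &lt; 2</p></div>"
        Console.WriteLine(HtmlToPlainText(html))
    End Sub
End Module
```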
HtmlAgilityPack does a very good job at HTML manipulation and is very reliable. With it you can write
Dim plainText As String = HtmlUtilities.ConvertToPlainText(html)
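Note that HtmlUtilities.ConvertToPlainText is not part of HtmlAgilityPack itself. It is a helper class built on top of the library that you have to add to your project yourself. See "Install and manage packages in Visual Studio using the NuGet Package Manager" to install HtmlAgilityPack easily.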
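If you would rather not carry a separate helper class, here is a rough sketch that uses only HtmlAgilityPack's own types (HtmlDocument.LoadHtml, SelectNodes, CreateTextNode, InsertBefore, InnerText and HtmlEntity.DeEntitize). It assumes the HtmlAgilityPack NuGet package is installed. ToPlainText is a hypothetical name. Like the question's code, it inserts a newline before every <br> and <p> so the line structure survives.

```vbnet
Imports HtmlAgilityPack

Module AgilityPackSketch
    ' Hypothetical helper built only on HtmlAgilityPack's public API.
    Function ToPlainText(html As String) As String
        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)
        ' SelectNodes returns Nothing when the XPath matches no nodes
        Dim breakNodes As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//br|//p")
        If breakNodes IsNot Nothing Then
            For Each node As HtmlNode In breakNodes
                ' Add a newline text node in front of each <br> / <p>
                node.ParentNode.InsertBefore(doc.CreateTextNode(vbLf), node)
            Next
        End If
        ' InnerText concatenates all text nodes; DeEntitize turns &amp; etc. into characters
        Return HtmlEntity.DeEntitize(doc.DocumentNode.InnerText)
    End Function

    Sub Main()
        Console.WriteLine(ToPlainText("<div class='WordSection1'>aaa<br>bbb<p class='MsoNormal'>ccc</p></div>"))
    End Sub
End Module
```

Because the tree is parsed rather than pattern-matched, attributes that contain ">" or comments do not confuse it the way the "<[^>]*>" regex can.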