How to get tags and their text with HtmlAgilityPack


using System;

class Program
{
    static void Main()
    {
        HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
        // Double quotes inside a C# string literal must be escaped.
        string html =
            "<body> " +
            "<p class=\"hang12\">“What is Lorem Ipsum?” <i>Lorem Ipsum is simply dummy text</i> Lorem Ipsum has been the</p>" +
            "<p class=\"hang12\">when an unknown printer took a galley of type <i>It has survived not only five centuries,</i>.</p>" +
            "<p class=\"hang12\">but also the  <i>remaining essentially </i> </p>" +
            "<p class=\"hang12\">with the release of Letraset sheets containing Lorem Ipsum passages, <i>and more recently with desktop</i>. 1944.</p>" +
            "</body>";
        doc.LoadHtml(html);
        // chNodes already recurses, so call it once on the root. Calling it for
        // every descendant would print each leaf node several times.
        chNodes(doc.DocumentNode);
    }

    // Prints the line and column of every leaf node (a node without children).
    public static void chNodes(HtmlAgilityPack.HtmlNode node)
    {
        try
        {
            if (node.HasChildNodes)
            {
                foreach (var item in node.ChildNodes)
                {
                    chNodes(item);
                }
            }
            else
            {
                Console.WriteLine("************");
                Console.WriteLine(node.Line);
                Console.WriteLine(node.LinePosition);
                Console.WriteLine("************");
            }
        }
        catch (Exception ex)
        {
            Console.WriteLine(ex.StackTrace);
            throw; // rethrow without resetting the stack trace
        }
    }
}

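My code above gets the first position of each start tag it finds, but I cannot get the position of the end tag. How can I solve this? I need these values to highlight text in a WebBrowser control. Thanks.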

You can try the following code:

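foreach (var item in doc.DocumentNode.SelectNodes("//p[@class='hang12']"))
{
    // InnerText is the text with the markup stripped; InnerHtml keeps nested tags such as <i>.
    Console.WriteLine(item.InnerText);
    Console.WriteLine(item.InnerHtml);
}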
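
InnerText and InnerHtml return content, not positions. As a minimal sketch of one way to estimate where an element ends, the start offset from HtmlNode.StreamPosition can be combined with the length of HtmlNode.OuterHtml. This assumes the document was loaded from a string, has not been modified after loading, and that the markup is well formed, so that OuterHtml matches the original source text. The names TagPositions and OuterSpan are hypothetical helpers introduced only for illustration.

```csharp
using System;
using HtmlAgilityPack;

public static class TagPositions
{
    // Hypothetical helper: returns the character offsets [Start, End) of the
    // node's outer HTML within the source string. Assumes an unmodified,
    // well-formed document, so OuterHtml equals the original source slice.
    public static (int Start, int End) OuterSpan(HtmlNode node)
    {
        int start = node.StreamPosition;          // offset of the opening '<'
        int end = start + node.OuterHtml.Length;  // offset just past the closing '>'
        return (start, end);
    }

    public static void Main()
    {
        string html = "<body><p class=\"hang12\">but also the <i>remaining essentially </i> </p></body>";
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null when nothing matches, so guard before iterating.
        var nodes = doc.DocumentNode.SelectNodes("//p[@class='hang12']");
        if (nodes == null) return;

        foreach (HtmlNode p in nodes)
        {
            var (start, end) = OuterSpan(p);
            // Slice the source to check that the span covers the whole element.
            Console.WriteLine($"{start}..{end}: {html.Substring(start, end - start)}");
        }
    }
}
```

If the element has an explicit closing tag, that tag should occupy the last characters of the span, so its start can be estimated by subtracting the closing tag's length from End. For malformed HTML the parser may repair the tree, and these offsets may then not line up with the source.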
