Parsing HTML from a defined start point to a defined end point

I have some HTML:

<hr noshade>
<p><a href="#1">Some text here</a></p>
<p style="margin-top:0pt;margin-bottom:0pt;line-height:120%;"><span style="color:#000000;font-weight:bold;">This is some description</span></p>
<hr noshade> <!-- so <hr noshade> is the delimiter for me -->
<p><a href="#2">Some more text here</a></p>
<p style="margin-top:0pt;margin-bottom:0pt;line-height:120%;"><span style="color:#000000;font-weight:bold;">This is description for some more text</span></p>
<hr noshade>

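While parsing with Nokogiri, I want to print the information in each set of these tags, where the sets are separated by my own delimiter <hr noshade>. So the first block should print the information in all the "p" tags that lie between the first two <hr noshade> tags, and so on.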

I am using the accepted answer to "XPath select all elements between two specific elements".

So far I only have a half-working solution.

You can use the following XPath expression:

.//hr[1][@noshade]
  /following-sibling::*[not(self::hr[@noshade])]
                       [count(preceding-sibling::hr[@noshade])=1]

for the first group, between <hr noshade> 1 and 2,

and then

.//hr[2][@noshade]
  /following-sibling::*[not(self::hr[@noshade])]
                       [count(preceding-sibling::hr[@noshade])=2]

for the elements between <hr noshade> 2 and 3, and so on.

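What these expressions select:

all following siblings of the <hr noshade> at position N
that have exactly N preceding <hr noshade> siblings, i.e. that are in the Nth group
and that are not themselves an <hr noshade>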

Since this selects multiple elements between two <hr noshade> tags, you may need to loop over the results and extract the data from each sibling element.

Does anyone have a more generic solution?
