XPath: get the main paragraph text while omitting child nodes

I want to match the main paragraph content of the following markup, omitting the child nodes p, div, and h3.

<div class="content">
sunday, monday, tuesday,
<br>
<br>
wednesday, thursday,
<br>
friday, saturday
<div class ="tags">sunday</div>
<h3>Days</h3>
<p>....</p>
<div class="style">monday to friday</div>
</div>

I tried XPath expressions such as //div[@class="content"]/*[not(self::p)] and //div[@class="content"]/*[not(name()="p")], but neither worked. Then I tried //div[@class="content"]/node()[not(div)] and //div[@class="content"]/node()[not(h3)], but these match only the first text node.

I need the following text:

sunday, monday, tuesday,
<br>
<br>
wednesday, thursday,
<br>
friday, saturday

That is, everything except the children div class="tags", h3, p, and div class="style".

This should do the trick:

//div[@class="content"]/*[not(self::p) and not(self::h3) and not(self::div)]|//div[@class="content"]/text()

Explanation:

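//div[@class="content"] selects the relevant node.
*[not(self::p) and not(self::h3) and not(self::div)] selects its child elements except h3, p, and div
(or, instead of excluding every div, use and not(self::div[@class="style"]) and not(self::div[@class="tags"])] if you really need to filter out only div class="tags" and div class="style").
|//div[@class="content"]/text() then forms the union with the div's direct text nodes, including the whitespace-only ones.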
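
As a rough illustration of how the expression behaves, here is a minimal sketch in Python. It assumes the third-party lxml library is installed; the names SNIPPET, XPATH, lines, and br_count are made up for this example and are not part of the question.

```python
from lxml import html  # assumption: lxml is available (pip install lxml)

# The markup from the question, unchanged.
SNIPPET = """<div class="content">
sunday, monday, tuesday,
<br>
<br>
wednesday, thursday,
<br>
friday, saturday
<div class ="tags">sunday</div>
<h3>Days</h3>
<p>....</p>
<div class="style">monday to friday</div>
</div>"""

# The expression from the answer: child elements that are not p/h3/div,
# unioned with the text nodes that are direct children of the div.
XPATH = (
    '//div[@class="content"]/*[not(self::p) and not(self::h3) and not(self::div)]'
    '|//div[@class="content"]/text()'
)

doc = html.document_fromstring(SNIPPET)
nodes = doc.xpath(XPATH)

# lxml returns text nodes as str subclasses and elements as element objects.
lines = [n.strip() for n in nodes if isinstance(n, str) and n.strip()]
br_count = sum(1 for n in nodes if not isinstance(n, str) and n.tag == "br")

print(lines)
print(br_count)
```

The whitespace-only text nodes that sit between the excluded elements are also part of the result, which is why the sketch filters out empty strings before printing.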

Admittedly, this is a bit complicated. You may be better off selecting only the text nodes, or doing some DOM manipulation on the node.
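
One way the "manipulate the node yourself" idea could look, without any XPath, is sketched below using only Python's standard library. The class DirectTextCollector, the VOID_TAGS set, and the variable names are hypothetical helpers written for this example; the sketch also assumes the target div is not nested inside another div with class "content".

```python
from html.parser import HTMLParser

SNIPPET = """<div class="content">
sunday, monday, tuesday,
<br>
<br>
wednesday, thursday,
<br>
friday, saturday
<div class ="tags">sunday</div>
<h3>Days</h3>
<p>....</p>
<div class="style">monday to friday</div>
</div>"""

# Elements that never have an end tag, so they must not change the depth.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class DirectTextCollector(HTMLParser):
    """Collect text that is a direct child of <div class="content">."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # 0 = outside the target div, 1 = directly inside it
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            # Keep <br> as a line break when it is a direct child.
            if tag == "br" and self.depth == 1:
                self.parts.append("\n")
            return
        if self.depth > 0:
            self.depth += 1          # entering a nested child element
        elif tag == "div" and dict(attrs).get("class") == "content":
            self.depth = 1           # entering the target div

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 1:          # ignore text inside p, h3, nested div
            self.parts.append(data)


collector = DirectTextCollector()
collector.feed(SNIPPET)
collector.close()

main_text = [ln.strip() for ln in "".join(collector.parts).splitlines() if ln.strip()]
print(main_text)
```

Tracking depth by hand is cruder than a real DOM, but it keeps the filtering rule explicit: only text seen at depth 1 is kept.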
