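Find all descendant text() nodes except those in subsections

My XML document has arbitrarily nested sections. Given a reference to a specific section, I need to find all the text nodes in that section, not including those that belong to its subsections.

For example, given a reference to the #a1 node below, I need to find only the "A1 " and "A1" text nodes: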

<root>
  <section id="a1">
    <b>A1 <c>A1</c></b>
    <b>A1 <c>A1</c></b>
    <section id="a1.1">
      <b>A1.1 <c>A1.1</c></b>
    </section>
    <section id="a1.2">
      <b>A1.2 <c>A1.2</c></b>
      <section id="a1.2.1">
        <b>A1.2.1</b>
      </section>
      <b>A1.2 <c>A1.2</c></b>
    </section>
  </section>
  <section id="a2">
    <b>A2 <c>A2</c></b>
  </section>
</root>

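In case it isn't obvious, the above is made-up data. In particular, the id attributes may not be present in real-world documents.

The best I've come up with so far is to find all the text nodes in the section and then use Ruby to subtract the ones I don't want: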

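require 'nokogiri'
# All text nodes under the section, minus those inside nested sections
def own_text(node)
  node.xpath('.//text()') - node.xpath('.//section//text()')
end
# mydoc holds the XML source shown above
doc = Nokogiri.XML(mydoc, &:noblanks)
p own_text(doc.at("#a1")).length #=> 4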

Is it possible to write a single XPath 1.0 expression that finds these nodes directly? Something like:

.//text()[ancestor::section = self] # self being the original context node

Use (for the section whose id attribute has the string value "a1"):

   //section[@id='a1']
       //*[normalize-space(text()) and ancestor::section[1]/@id = 'a1']/text()

XSLT-based verification:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>
 <xsl:template match="/">
     <xsl:copy-of select=
      "//section[@id='a1']
           //*[normalize-space(text()) and ancestor::section[1]/@id = 'a1']
     "/>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the provided XML document:

<root>
    <section id="a1">
        <b>A1 
            <c>A1</c>
        </b>
        <b>A1 
            <c>A1</c>
        </b>
        <section id="a1.1">
            <b>A1.1 
                <c>A1.1</c>
            </b>
        </section>
        <section id="a1.2">
            <b>A1.2 
                <c>A1.2</c>
            </b>
            <section id="a1.2.1">
                <b>A1.2.1</b>
            </section>
            <b>A1.2 
                <c>A1.2</c>
            </b>
        </section>
    </section>
    <section id="a2">
        <b>A2 
            <c>A2</c>
        </b>
    </section>
</root>

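it evaluates the XPath expression (selecting only the parent elements of the wanted text nodes, so that the result is clearly visible) and copies the selected nodes to the output: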

<b>A1 
            <c>A1</c>
</b>
<c>A1</c>
<b>A1 
            <c>A1</c>
</b>
<c>A1</c>

Update: if section elements can share the same id attribute (or have no id attribute at all), use:

       (//section)[1]
           //*[normalize-space(text())
           and
              count(ancestor::section)
             =
               count((//section)[1]/ancestor::section) +1]/text()

XSLT-based verification:

<xsl:stylesheet version="1.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output omit-xml-declaration="yes" indent="yes"/>
     <xsl:strip-space elements="*"/>
     <xsl:template match="/">
         <xsl:copy-of select=
          "(//section)[1]
               //*[normalize-space(text())
               and
                  count(ancestor::section)
                 =
                   count((//section)[1]/ancestor::section) +1]
         "/>
     </xsl:template>
</xsl:stylesheet>

Result of the transformation (the same as before):

<b>A1 
            <c>A1</c>
</b>
<c>A1</c>
<b>A1 
            <c>A1</c>
</b>
<c>A1</c>

This selects exactly the same wanted text nodes.

Use:

//text()[ancestor::section[1]/@id = 'a1']
