Using XPath 1.0 and XSLT 1.0, I need to select the immediate parents of mixed content or of text-only content. Consider the following example:
<table class="dont-match">
<tr class="dont-match">
<td class="match">Mixed <strong class="maybe-match">content</strong> in here.</td>
<td class="match">Plain text in here.</td>
<td class="dont-match"><img src="..." /></td>
</tr>
</table>
<div class="dont-match">
<div class="dont-match"><img src="..." /></div>
<div class="match">Mixed <em class="maybe-match">content</em> in here.</div>
<p class="match">Plain text in here.</p>
</div>
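Obviously, the classes match, maybe-match and dont-match are only there for demonstration purposes and cannot be used for matching. maybe-match means these elements should preferably not be matched, but I can handle that myself if they turn out to be hard to exclude.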
Many thanks in advance!
For the "match" elements, use:
//*[text()[normalize-space()] and not(../text()[normalize-space()])]
For the "maybe-match" elements, use:
//*[../text()[normalize-space()]]
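To check what the two predicates mean without an XSLT processor, here is a minimal Python sketch. It re-implements them by hand over the standard library's xml.etree.ElementTree, which does not support full XPath 1.0 predicates such as text()[...]. It assumes the sample is wrapped in a single root element <t>. The helper names (text_children, has_nonblank_text, select) are hypothetical.

```python
import xml.etree.ElementTree as ET

# normalize-space() treats only these four characters as whitespace.
XML_WS = " \t\r\n"

SAMPLE = """<t>
<table class="dont-match">
<tr class="dont-match">
<td class="match">Mixed <strong class="maybe-match">content</strong> in here.</td>
<td class="match">Plain text in here.</td>
<td class="dont-match"><img src="..." /></td>
</tr>
</table>
<div class="dont-match">
<div class="dont-match"><img src="..." /></div>
<div class="match">Mixed <em class="maybe-match">content</em> in here.</div>
<p class="match">Plain text in here.</p>
</div>
</t>"""


def text_children(elem):
    """Return the strings XPath would see as elem's text() children.

    ElementTree keeps the text before the first child in elem.text and the
    text following each child in that child's .tail attribute.
    """
    texts = [elem.text] if elem.text is not None else []
    texts += [child.tail for child in elem if child.tail is not None]
    return texts


def has_nonblank_text(elem):
    # Mirrors the predicate text()[normalize-space()].
    return any(t.strip(XML_WS) for t in text_children(elem))


def select(root):
    """Return (match, maybe_match) element lists in document order."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    match, maybe_match = [], []
    for elem in root.iter():
        parent = parent_of.get(elem)  # None for the root element
        parent_has_text = parent is not None and has_nonblank_text(parent)
        # //*[text()[normalize-space()] and not(../text()[normalize-space()])]
        if has_nonblank_text(elem) and not parent_has_text:
            match.append(elem)
        # //*[../text()[normalize-space()]]
        if parent_has_text:
            maybe_match.append(elem)
    return match, maybe_match


if __name__ == "__main__":
    match, maybe_match = select(ET.fromstring(SAMPLE))
    print([e.get("class") for e in match])  # ['match', 'match', 'match', 'match']
    print([e.tag for e in maybe_match])     # ['strong', 'em']
```

The parent of the root element is the document node, which has no text children here. The sketch models it as None, so the root can never be a maybe-match.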
XSLT-based verification:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/">
<xsl:copy-of select=
"//*[text()[normalize-space()] and not(../text()[normalize-space()])]"/>
==========
<xsl:copy-of select="//*[../text()[normalize-space()]]"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied to the provided XML (wrapped in a single top element so that it becomes a well-formed XML document):
<t>
<table class="dont-match">
<tr class="dont-match">
<td class="match">Mixed <strong class="maybe-match">content</strong> in here.</td>
<td class="match">Plain text in here.</td>
<td class="dont-match"><img src="..." /></td>
</tr>
</table>
<div class="dont-match">
<div class="dont-match"><img src="..." /></div>
<div class="match">Mixed <em class="maybe-match">content</em> in here.</div>
<p class="match">Plain text in here.</p>
</div>
</t>
each of the two XPath expressions is evaluated and the selected nodes are copied to the output:
<td class="match">Mixed <strong class="maybe-match">content</strong> in here.</td>
<td class="match">Plain text in here.</td>
<div class="match">Mixed <em class="maybe-match">content</em> in here.</div>
<p class="match">Plain text in here.</p>
==========
<strong class="maybe-match">content</strong>
<em class="maybe-match">content</em>
As we can see, each expression selects exactly the wanted elements.
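To get both the matches and the maybe-matches, you can use
//*[count(text())>=1]
if the XML parser ignores whitespace-only text nodes; otherwise use
//*[normalize-space(string(./text())) != ""]
The maybe-matches can then be filtered out by checking whether some ancestor also matches, but this gets ugly (shown here for the case where whitespace-only text nodes are kept):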
//*[(normalize-space(string(./text())) != "") and count(./ancestor::*[normalize-space(string(./text())) != ""]) = 0]
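A similar hand-written Python sketch of this second approach also uses ElementTree, with hypothetical helper names. Two details of the XPath are modelled explicitly.

First, in XPath 1.0, string() applied to a node-set returns the string-value of only the first node in document order. normalize-space(string(./text())) therefore inspects only the first text child.

Second, the ancestor:: check excludes an element when any ancestor at any depth qualifies, not just its parent.

```python
import xml.etree.ElementTree as ET

XML_WS = " \t\r\n"  # the whitespace characters normalize-space() removes


def text_children(elem):
    """Strings XPath sees as elem's text() children: elem.text, then each child's tail."""
    texts = [elem.text] if elem.text is not None else []
    texts += [child.tail for child in elem if child.tail is not None]
    return texts


def first_text_nonblank(elem):
    # normalize-space(string(./text())) != ""
    # string() of a node-set uses only the FIRST node; an empty node-set gives "".
    texts = text_children(elem)
    return bool(texts) and texts[0].strip(XML_WS) != ""


def select_with_ancestor_filter(root):
    """Sketch of //*[P and count(./ancestor::*[P]) = 0], P being first_text_nonblank."""
    parent_of = {child: parent for parent in root.iter() for child in parent}

    def ancestors(elem):
        # Walk up through element ancestors (the document node is not an element).
        while elem in parent_of:
            elem = parent_of[elem]
            yield elem

    return [
        e for e in root.iter()
        if first_text_nonblank(e)
        and not any(first_text_nonblank(a) for a in ancestors(e))
    ]


if __name__ == "__main__":
    doc = ET.fromstring('<t><td>Mixed <b>x</b> here</td><div> <img/>late text</div></t>')
    print([e.tag for e in select_with_ancestor_filter(doc)])  # ['td']
```

In the usage line, the <div> does contain non-blank text ("late text"). Its first text child is a single space, so this expression skips it. The text()[normalize-space()] form would select it.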