XSLT/XPath: Processing XHTML files to wrap delimited text sections in a new <span> class



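I have a bunch of files testX.xhtml that are edited in the browser using contenteditable=true. The purpose of the editing is to delimit sections of text with two identical characters, such as the underscore characters in the following XHTML file: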

{XHTML source file}:

<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<style xmlns:xhtml="http://www.w3.org/1999/xhtml" type="text/css" xml:space="preserve"/>
<meta content="text/html;charset=UTF-8" http-equiv="Content-Type"/>
<title>title XHTML</title>
</head>
<body>

<span class="ok">_blablabla blebleble_ bliblibli</span>
<p class="ko">blablabla _blebleble bliblibli <em class="em">one em tag</em> blablabla blebleble._</p>

</body>
</html>

The edited files are saved and then processed by the XSLT below, which wraps each marked section in a new span with the class my_span for further processing:

{XSLT file}:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xpath-default-namespace="http://www.w3.org/1999/xhtml" 
version="2.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
xmlns="http://www.w3.org/1999/xhtml">

<xsl:output method="xhtml" version="1.0" encoding="UTF-8" indent="yes" standalone="yes"/>
<xsl:strip-space elements="*"/>

<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>   
</xsl:template>

<xsl:template match="/">

<xsl:for-each select="collection('?select=test*.xhtml')">

<xsl:variable name="path_to_span">
<xsl:value-of select="iri-to-uri(replace(document-uri(current()), '\.xhtml$', '.span.xhtml'))"/>
</xsl:variable>


<xsl:result-document indent="yes" method="xhtml" href="{$path_to_span}">
<xsl:apply-templates/> 
</xsl:result-document>
</xsl:for-each>

</xsl:template>

<xsl:template match="text()">

<xsl:analyze-string select="." regex="(.*?)_(.*?)_">
<xsl:matching-substring>

<xsl:value-of select="regex-group(1)"/>
<span class="my_span">
<xsl:value-of select="regex-group(2)"/>
</span>

</xsl:matching-substring>

<xsl:non-matching-substring>
<xsl:value-of select="."/>
</xsl:non-matching-substring>
</xsl:analyze-string>

</xsl:template>

</xsl:stylesheet>

This produces the following:

{Generated XHTML file}:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<style xmlns:xhtml="http://www.w3.org/1999/xhtml" type="text/css" xml:space="preserve"></style>
<title>title XHTML</title>
</head>
<body>

<span class="ok"><span class="my_span">blablabla blebleble</span> bliblibli</span>

<p class="ko">blablabla _blebleble bliblibli <em class="em">one em tag</em> blablabla blebleble._</p>

</body>
</html>

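Unfortunately, I found that some p elements contain em, i or similar inline elements, and the XSLT does not process a delimited section that spans such an element: as the output above shows, the p with class ko is copied unchanged.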
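A likely reason: the text() template is applied to each text node on its own, and in the p the two `_` delimiters sit in different text nodes (one before the em, one after it), so the regex `(.*?)_(.*?)_`, which needs both underscores in the same string, never matches. The diagnostic template below is a hypothetical, temporary addition to the first stylesheet (not part of the original); it reports each direct text-node child of that p with its underscore count and then hands the element on to the identity template, so the output is unchanged. With the sample input it should report two text nodes holding one underscore each.

```xml
<!-- Diagnostic only (hypothetical addition to the first stylesheet):
     report each direct text-node child of p.ko and its number of '_' characters,
     then let the identity template copy the p as before. -->
<xsl:template match="p[@class = 'ko']">
  <xsl:for-each select="text()">
    <xsl:message select="concat('text node ', position(), ': [', ., '] underscores: ',
                                string-length(.) - string-length(translate(., '_', '')))"/>
  </xsl:for-each>
  <!-- falls through to the @*|node() identity template -->
  <xsl:next-match/>
</xsl:template>
```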

I would like to be able to generate the following XHTML:

{Desired XHTML file}:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<style xmlns:xhtml="http://www.w3.org/1999/xhtml" type="text/css" xml:space="preserve"></style>
<title>title XHTML</title>
</head>
<body>

<span class="ok"><span class="my_span">blablabla blebleble</span> bliblibli</span>

<p class="ko">blablabla 

<span class="my_span">blebleble bliblibli </span>

<em class="em"><span class="my_span">one em tag</span></em>

<span class="my_span">blablabla blebleble.</span>

</p>

</body>
</html>

I reduced the XHTML source to a single em element that the XSLT fails to handle, but a p element may contain many combinations of similar elements.

In my desired XHTML file I put the added span inside the em, but swapping them (the em inside the span) would also work.

How can I achieve this in XSLT?

Thanks for your help.

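I tried converting the _ characters into <?marker?> processing instructions in one transformation step, then converting the <?marker?>s into <?open?>/<?close?> pairs in a second step, and finally using a recursive function that groups each such pair with for-each-group group-starting-with="processing-instruction('open')" and a nested for-each-group group-ending-with="processing-instruction('close')":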

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:fn="http://www.w3.org/2005/xpath-functions"
xpath-default-namespace="http://www.w3.org/1999/xhtml"
xmlns="http://www.w3.org/1999/xhtml"
xmlns:mf="http://example.com/mf"
exclude-result-prefixes="#all"
version="3.0">

<xsl:param name="wrap-class" as="xs:string">my_span</xsl:param>

<xsl:mode on-no-match="shallow-copy"/>

<xsl:template match="body">
<xsl:copy>
<xsl:apply-templates select="@*"/>
<xsl:variable name="marked-content">
<xsl:apply-templates mode="analyze"/>
</xsl:variable>
<xsl:variable name="paired-content">
<xsl:apply-templates select="$marked-content/node()" mode="pair-markers"/>
</xsl:variable>
<xsl:apply-templates select="$paired-content/node()"/>
</xsl:copy>
</xsl:template>

<xsl:template match="*[processing-instruction('open')]">
<xsl:copy>
<xsl:apply-templates select="@*"/>
<xsl:sequence select="mf:group(node())"/>
</xsl:copy>
</xsl:template>

<xsl:mode name="analyze" on-no-match="shallow-copy"/>

<xsl:template mode="analyze" match="text()">
<xsl:apply-templates select="analyze-string(., '_')" mode="mark"/>
</xsl:template>

<xsl:template mode="mark" match="fn:*">
<xsl:apply-templates mode="#current"/>
</xsl:template>

<xsl:template mode="mark" match="fn:match">
<xsl:processing-instruction name="marker"/>
</xsl:template>

<xsl:mode name="pair-markers" on-no-match="shallow-copy"/>

<xsl:template mode="pair-markers" match="processing-instruction('marker')">
<xsl:variable name="pos" as="xs:integer">
<xsl:number/>
</xsl:variable>
<xsl:choose>
<xsl:when test="$pos mod 2 = 1">
<xsl:processing-instruction name="open"/>
</xsl:when>
<xsl:otherwise>
<xsl:processing-instruction name="close"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>

<xsl:function name="mf:group">
<xsl:param name="nodes" as="node()*"/>
<xsl:for-each-group select="$nodes" group-starting-with="processing-instruction('open')">
<xsl:choose>
<xsl:when test="self::processing-instruction('open')">
<xsl:for-each-group select="tail(current-group())" group-ending-with="processing-instruction('close')">
<xsl:choose>
<xsl:when test="position() = 1">
<span class="{$wrap-class}">
<xsl:sequence select="mf:group(current-group()[position() ne last()])"/>
</span>
</xsl:when>
<xsl:otherwise>
<xsl:apply-templates select="current-group()"/>
</xsl:otherwise>
</xsl:choose>
</xsl:for-each-group>
</xsl:when>
<xsl:otherwise>
<xsl:apply-templates select="current-group()"/>
</xsl:otherwise>
</xsl:choose>
</xsl:for-each-group>
</xsl:function>

</xsl:stylesheet>
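To illustrate the three passes, here is a hand trace of the sample p (worked out from the code above, not captured from a run), shown as serialized fragments of the intermediate trees. Because `<xsl:number/>` counts only preceding sibling marker PIs, markers are paired within each parent element; that is enough here because both delimiters are children of the same p. The last fragment assumes wrap-class is my_span. The resulting span encloses the em, which the question says is acceptable.

```xml
<!-- after mode "analyze": each _ replaced by a marker PI -->
<p class="ko">blablabla <?marker?>blebleble bliblibli <em class="em">one em tag</em> blablabla blebleble.<?marker?></p>

<!-- after mode "pair-markers": odd-numbered markers become open, even-numbered ones close -->
<p class="ko">blablabla <?open?>blebleble bliblibli <em class="em">one em tag</em> blablabla blebleble.<?close?></p>

<!-- after mf:group (wrap-class = my_span): everything between open and close is wrapped -->
<p class="ko">blablabla <span class="my_span">blebleble bliblibli <em class="em">one em tag</em> blablabla blebleble.</span></p>
```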

https://xsltfiddle.liberty-development.net/nb9PtDX
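The stylesheet above transforms a single input document and no longer contains the collection() loop of the first stylesheet. A possible way to combine the two, sketched under the assumption that Saxon is used (the first stylesheet already relies on its `?select=` collection URI), is to add an XSLT 3.0 `xsl:initial-template` entry point and start the transformation without a source document (in recent Saxon versions, the `-it` command-line option without a template name should select it). The output naming follows the first stylesheet, with the dot in the regex escaped.

```xml
<!-- Hypothetical entry point: used when the transformation starts with the
     default named template xsl:initial-template (no source document needed). -->
<xsl:template name="xsl:initial-template">
  <!-- Same Saxon collection URI as in the first stylesheet -->
  <xsl:for-each select="collection('?select=test*.xhtml')">
    <!-- test1.xhtml becomes test1.span.xhtml; '\.' matches a literal dot -->
    <xsl:result-document method="xhtml" indent="yes"
        href="{replace(document-uri(.), '\.xhtml$', '.span.xhtml')}">
      <!-- Unnamed mode: shallow-copy plus the body and open-PI templates above -->
      <xsl:apply-templates select="."/>
    </xsl:result-document>
  </xsl:for-each>
</xsl:template>
```

One caveat: if the select pattern behaves like a plain glob, test*.xhtml also matches the generated test1.span.xhtml files, so a second run might pick them up; a stricter pattern or a separate output directory would avoid that.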
