Transform node content to remove whitespace

If the content of the citations node looks like this:

                <p>
            WAJWAJADS:
            </p>
<p>
            asdf
            </p>
<p>
            ALSOAS:
            </p>
<p>
            lorem ipsum...<br />
lorem<br />
blah blah <i>
            adfas &amp; dasdsaafs
            </i>, April 2011.<br />
lorem lorem dear lord the whitespace
            </p>

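Is there a way to transform it into properly formatted HTML with XSLT?

normalize-space() just joins everything together. The best I have managed is to call normalize-space() on every p descendant inside a for-each loop and wrap each result in a p element. However, any inner tags are still lost.

Is there a better way to parse this WYSIWYG-generated train wreck? Unfortunately, I have no control over the generated XML.

I made a small modification to Martin Honnen's answer: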

<xsl:template match="text()">
    <xsl:value-of select="normalize-space(.)"/>
    <xsl:if test="substring(., string-length(.)) = ' ' and substring(., string-length(.) - 1, string-length(.)) != '  '">
        <xsl:text> </xsl:text>
    </xsl:if>
</xsl:template>

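The xsl:if tests whether the last character of the text node is a space and the last two characters are not both spaces; if so, it inserts a single space after the normalized text.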
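
As a rough cross-check of that test, the template's logic can be modelled outside XSLT. The following is a minimal Python sketch, not part of the original answer: `normalize_space` and `text_template` are hypothetical helper names, and `normalize_space` assumes that only space, tab, CR and LF count as whitespace, as in XPath 1.0.

```python
import re


def normalize_space(s: str) -> str:
    """Model of XPath normalize-space(): collapse whitespace runs, then trim."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")


def text_template(s: str) -> str:
    """Model of the modified text() template (hypothetical helper)."""
    out = normalize_space(s)
    # Same condition as the xsl:if: the last character is a space,
    # but the last two characters are not both spaces.
    if s.endswith(" ") and not s.endswith("  "):
        out += " "
    return out


print(repr(text_template("\nblah blah ")))  # 'blah blah '
print(repr(text_template("lorem  ")))       # 'lorem'
```

In the question's sample, the text node before the i element ends with exactly one space, so under this rule the space between "blah blah" and the italic text would be kept.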

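First, you need well-formed XML with a single root element.

Assuming you have that, you can apply an identity transform to copy the source tree to the result, strip the whitespace between tags, optionally produce HTML output (no XML declaration) with indentation, and use normalize-space() only on text nodes.

Try this stylesheet: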

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:strip-space elements="*"/>
    <xsl:output indent="yes" method="html"/>
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="text()">
         <xsl:value-of select="normalize-space(.)"/>
    </xsl:template>
</xsl:stylesheet>

Applied to the data you provided, the result will be:

<p>WAJWAJADS:</p>
<p>asdf</p>
<p>ALSOAS:</p>
<p>lorem ipsum...<br>lorem<br>blah blah<i>adfas &amp; dasdsaafs</i>, April 2011.<br>lorem lorem dear lord the whitespace
</p>

You can see the result applied to the example in XSLT Fiddle.
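
For readers who want to reproduce the effect without an XSLT processor, here is a minimal Python sketch of the same idea using only the standard library's xml.etree.ElementTree. It is a stand-in under stated assumptions, not the stylesheet itself: `normalize_space` and `normalize_tree` are hypothetical helpers, the `<div>` wrapper is added only to give the fragment a root, and the serialization differs from method="html" (for example, `<br />` instead of `<br>`).

```python
import re
import xml.etree.ElementTree as ET


def normalize_space(s: str) -> str:
    """Model of XPath normalize-space(): collapse whitespace runs, then trim."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")


def normalize_tree(elem: ET.Element) -> None:
    """Normalize every text node in place; elements and attributes are untouched."""
    if elem.text is not None:
        elem.text = normalize_space(elem.text)
    for child in elem:
        normalize_tree(child)
        # ElementTree stores the text that follows a child element in child.tail.
        if child.tail is not None:
            child.tail = normalize_space(child.tail)


# Last paragraph of the question's sample, wrapped in an assumed <div> root.
SAMPLE = """<div>
<p>
            lorem ipsum...<br />
lorem<br />
blah blah <i>
            adfas &amp; dasdsaafs
            </i>, April 2011.<br />
lorem lorem dear lord the whitespace
            </p>
</div>"""

root = ET.fromstring(SAMPLE)
normalize_tree(root)
print(ET.tostring(root, encoding="unicode"))
```

As in the stylesheet's output, "blah blah" and the italic text end up directly adjacent, because the single space before the i element is trimmed along with the rest of the whitespace.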

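UPDATE 1: To add an extra space around each text node (and so keep adjacent text from running together when a node's string value is computed), you can replace the last template with: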

<xsl:template match="text()">
    <xsl:value-of select="concat(' ',normalize-space(.),' ')"/>
</xsl:template>

Result:

<html>
   <p> WAJWAJADS: </p>
   <p> asdf </p>
   <p> ALSOAS: </p>
   <p> lorem ipsum... <br> lorem <br> blah blah <i> adfas &amp; dasdsaafs </i> , April 2011. <br> lorem lorem dear lord the whitespace 
   </p>
</html>

See: http://xsltransform.net/3NzcBsE/1

UPDATE 2: To add a space or a newline after each copied node, place <xsl:text>&#xa;</xsl:text> (newline) or <xsl:text> </xsl:text> (space) after </xsl:copy> in the first template:

<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
    <xsl:text>&#xa;</xsl:text>
</xsl:template>

Result:

<html>
   <p>WAJWAJADS:</p>
   <p>asdf</p>
   <p>ALSOAS:</p>
   <p>lorem ipsum...<br>
      lorem<br>
      blah blah<i>adfas &amp; dasdsaafs</i>
      , April 2011.<br>
      lorem lorem dear lord the whitespace
   </p>
</html>

See: http://xsltransform.net/3NzcBsE/2

Use the identity transformation template together with a template for text nodes that applies normalize-space:

<xsl:template match="text()"><xsl:value-of select="normalize-space()"/></xsl:template>

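This question would have been a lot easier to understand if the example contained real text instead of gibberish. "No extra whitespace between the start/end of a node and the text" is not an accurate enough description of the expected result.

I am going to take a guess here and assume that what you actually want is to collapse every run of whitespace into a single space in all text nodes. That can be done as follows: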

XSLT 1.0

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<!-- identity transform -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>
<xsl:template match="text()" priority="1">
    <xsl:variable name="temp" select="normalize-space(concat('x', ., 'x'))" />
    <xsl:value-of select="substring($temp, 2, string-length($temp) - 2)"/>
</xsl:template>
</xsl:stylesheet>

When the stylesheet is applied to the following test input:

<chapter>

           <p>
    This         question          would         have
been       a     lot    <b>   easier      </b>      to understand 
        if     the      example   contained     
   <i>     real  </i>    text    instead   of 
   gibberish.
                     </p>

    <p>
    Here     is       an      example       of     preserving   zero     spaces 
    between    text   nodes:<br/>(continued)       on   a new   line. 


    </p>

        <p>
    Here  is       another      example       of     
    preserving   zero     spaces     within    a      text
    node:     <i>some     text  in      italic</i>       followed    
    by   normal      text. 

    </p>

</chapter>

the result will be:

<?xml version="1.0" encoding="UTF-8"?>
<chapter>
   <p> This question would have been a lot <b> easier </b> to understand if the example contained <i> real </i> text instead of gibberish. </p>
   <p> Here is an example of preserving zero spaces between text nodes:<br/>(continued) on a new line. </p>
   <p> Here is another example of preserving zero spaces within a text node: <i>some text in italic</i> followed by normal text. </p>
</chapter>

Note that when rendered as HTML, there is no visible difference between the input and the output.
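
The 'x' sentinel trick in the text() template can be easier to follow outside XSLT. Below is a minimal Python sketch of it, offered as an illustration rather than as part of the answer: `normalize_space` and `collapse_runs` are hypothetical helper names, and whitespace is assumed to mean space, tab, CR and LF, as in XPath 1.0.

```python
import re


def normalize_space(s: str) -> str:
    """Model of XPath normalize-space(): collapse whitespace runs, then trim."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")


def collapse_runs(s: str) -> str:
    """Collapse each whitespace run to one space without trimming the ends."""
    # The 'x' sentinels keep leading and trailing whitespace from being trimmed.
    temp = normalize_space("x" + s + "x")
    # Equivalent of substring($temp, 2, string-length($temp) - 2): drop both sentinels.
    return temp[1:-1]


print(repr(collapse_runs("   easier      ")))  # ' easier '
print(repr(collapse_runs("nodes:")))           # 'nodes:'
```

Leading and trailing runs shrink to a single space instead of disappearing, and a text node with no surrounding whitespace gains none, which matches the "zero spaces preserved" cases in the test input.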
