Making an XSLT transformation file recognize ASCII character codes



I have an XSLT stylesheet that converts an HTML table to CSV. It is defined as follows:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
                xmlns:fo="http://www.w3.org/1999/XSL/Format" >
    <xsl:output method="text" omit-xml-declaration="yes" indent="yes"/>
    <xsl:template match="/">
         <xsl:for-each select="//tr">
            <xsl:for-each select="td">
                <xsl:if test="position() > 1">,</xsl:if>
                <xsl:value-of select="."/>
            </xsl:for-each>
         <xsl:text>&#xA;</xsl:text>
    </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>

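However, the problem I am facing now is that the tags of these tables are written as ASCII codes: the angle brackets arrive escaped as &lt; and &gt;.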

Example input:

&lt;table&gt;&lt;tr&gt;
        &lt;th&gt;Order ID&lt;/th&gt;
        &lt;th&gt;Item ID&lt;/th&gt;
        &lt;th&gt;Participant ID&lt;/th&gt;
        &lt;th&gt;Status&lt;/th&gt;
        &lt;th&gt;Shipping Provider&lt;/th&gt;
        &lt;th&gt;Tracking Number&lt;/th&gt;
        &lt;th&gt;Shipped Date&lt;/th&gt;
        &lt;th&gt;Shipping Method&lt;/th&gt;&lt;/tr&gt;
            &lt;tr&gt;
            &lt;td align="center"&gt; Choice_DJ4&lt;/td&gt;
            &lt;td align="center"&gt; 4&lt;/td&gt;
            &lt;td align="center"&gt; DXM09902&lt;/td&gt;
            &lt;td align="center"&gt; Shipped&lt;/td&gt; 
            &lt;td align="center"&gt; USPS&lt;/td&gt; 
            &lt;td align="center"&gt; &lt;/td&gt; 
            &lt;td align="center"&gt; 04/13/2017&lt;/td&gt; 
            &lt;td align="center"&gt; Standard Ground&lt;/td&gt; 
            &lt;/tr&gt;
    &lt;/table&gt;

My question is: is there a way to make the XSL file recognize the ASCII codes as their intended characters?

Update: here is my Java code:

String data = readFile("config/email.xml");
    System.out.println("Data: \n" + data);
    InputSource is = new InputSource(new StringReader(data));
    String configFile = "config/email-xslt.xsl";
    File stylesheet = new File(configFile);
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    Document document = builder.parse(is);
    StreamSource stylesource = new StreamSource(stylesheet);
    Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(stylesource);
    Source source = new DOMSource(document);
    StringWriter sw = new StringWriter();
    Result outputTarget = new StreamResult(sw);
    transformer.transform(source, outputTarget);
    data = sw.toString();
    System.out.println("Output: " + data);
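
A minimal sketch of why the stylesheet finds no rows, assuming the escaped table sits inside some wrapper element. The <body> root and the class name EscapedMarkupCheck are hypothetical; the question does not show the real root of config/email.xml. The XML parser treats &lt;tr&gt; as plain text, so the DOM contains no tr elements for //tr to match:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class EscapedMarkupCheck {
    // Parse an XML string into a DOM, wrapping checked exceptions for brevity.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Number of <tr> elements the parser actually sees.
    static int countRows(String xml) {
        return parse(xml).getElementsByTagName("tr").getLength();
    }

    public static void main(String[] args) {
        // Hypothetical wrapper element; the real file layout is not shown in the question.
        String escaped = "<body>&lt;table&gt;&lt;tr&gt;&lt;td&gt;USPS&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;</body>";
        String real = "<body><table><tr><td>USPS</td></tr></table></body>";

        System.out.println(countRows(escaped)); // 0: the markup is only text
        System.out.println(countRows(real));    // 1: a real element
        // The text content of the wrapper is the unescaped markup string.
        System.out.println(parse(escaped).getDocumentElement().getTextContent());
    }
}
```

So the escaped markup has to be turned back into real markup and parsed a second time before any tr/td match can succeed.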

In XSLT 3.0, you can use unparsed-text() to load the text, parse-xml-fragment() to unescape the entities, and parse-xml() to parse the resulting XML string.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="3.0">
    <xsl:output method="text" omit-xml-declaration="yes" indent="yes"/>
    <xsl:template match="/">
        <!--first, load the contents of the document (adjust path to your document) -->
        <xsl:variable name="input" select="unparsed-text('table.txt')" as="item()"/>
        <!--second, unescape the angle bracket entities -->
        <xsl:variable name="table-text" select="parse-xml-fragment($input)" as="item()"/>
        <!--third, parse the serialized XML string -->
        <xsl:variable name="table" select="parse-xml($table-text)" as="item()"/>
        <xsl:for-each select="$table//tr">
            <!--a more simplified way of generating the CSV for each row -->
            <xsl:value-of select="td" separator=","/>
            <xsl:text>&#xA;</xsl:text>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
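
The XSLT processor bundled with the JDK only implements XSLT 1.0, so the stylesheet above needs an XSLT 3.0 processor. The same two-step idea (one parse to unescape the entities, a second parse to build the table) can be sketched with plain JDK classes. This is only a sketch under assumptions: the <body> wrapper and the names TwoPassParse and parseEscapedTable are hypothetical, and the escaped markup is assumed to have a single root element such as table.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TwoPassParse {
    // Parse an XML string into a DOM, wrapping checked exceptions for brevity.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // First pass: the parser resolves &lt; and &gt;, so the wrapper's text
    // content is the real markup. Second pass: parse that markup as XML.
    static Document parseEscapedTable(String wrapperXml) {
        String markup = parse(wrapperXml).getDocumentElement().getTextContent().trim();
        return parse(markup);
    }

    public static void main(String[] args) {
        // Hypothetical wrapper element around the escaped table.
        Document table = parseEscapedTable(
                "<body>&lt;table&gt;&lt;tr&gt;"
              + "&lt;td align=\"center\"&gt; USPS&lt;/td&gt;"
              + "&lt;/tr&gt;&lt;/table&gt;</body>");
        NodeList cells = table.getElementsByTagName("td");
        for (int i = 0; i < cells.getLength(); i++) {
            System.out.println(cells.item(i).getTextContent().trim()); // prints USPS
        }
    }
}
```

The resulting Document can be wrapped in a DOMSource and passed to an XSLT 1.0 stylesheet unchanged.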

Finally I was able to solve the problem, using org.apache.commons.lang3.StringEscapeUtils.unescapeXml(str);

My XSL file and data input (config/email.xml) remain the same as in the OP, but I had to modify the Java code to unescape these characters before passing the data to the XSL transformer.

String data = readFile("config/email.xml");
data = StringEscapeUtils.unescapeXml(data);
System.out.println("Data: \n" + data);
InputSource is = new InputSource(new StringReader(data));
String configFile = "config/email-xslt.xsl";
File stylesheet = new File(configFile);
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(is);
StreamSource stylesource = new StreamSource(stylesheet);
Transformer transformer = TransformerFactory.newInstance()
     .newTransformer(stylesource);
Source source = new DOMSource(document);
StringWriter sw = new StringWriter();
Result outputTarget = new StreamResult(sw);
transformer.transform(source, outputTarget);
data = sw.toString();
System.out.println("Output: " + data);
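
For a quick standalone check of this unescape-then-transform approach, here is a dependency-free sketch. The helper unescapeBasicEntities and the class name EscapedTableToCsv are hypothetical; the helper is a simplified stand-in for StringEscapeUtils.unescapeXml that handles only the five predefined XML entities, not numeric character references. The stylesheet is the XSLT 1.0 one from the question, inlined as a string.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class EscapedTableToCsv {
    // The question's XSLT 1.0 stylesheet, inlined without insignificant whitespace.
    static final String XSLT =
          "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"text\"/>"
        + "<xsl:template match=\"/\">"
        + "<xsl:for-each select=\"//tr\">"
        + "<xsl:for-each select=\"td\">"
        + "<xsl:if test=\"position() &gt; 1\">,</xsl:if>"
        + "<xsl:value-of select=\".\"/>"
        + "</xsl:for-each>"
        + "<xsl:text>&#xA;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Hypothetical stand-in for StringEscapeUtils.unescapeXml: only the five
    // predefined entities, with &amp; last so nothing is unescaped twice.
    static String unescapeBasicEntities(String s) {
        return s.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&apos;", "'")
                .replace("&amp;", "&");
    }

    // Unescape the markup, then run the stylesheet over it and return the CSV text.
    static String toCsv(String escapedMarkup) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(
                    new StreamSource(new StringReader(unescapeBasicEntities(escapedMarkup))),
                    new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String escaped = "&lt;table&gt;&lt;tr&gt;&lt;td&gt;Choice_DJ4&lt;/td&gt;"
                       + "&lt;td&gt;4&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;";
        System.out.print(toCsv(escaped)); // prints Choice_DJ4,4 followed by a line break
    }
}
```

Note that the stylesheet only selects td cells, so a header row made of th cells contributes just an empty line to the output.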
