I have an XSLT that converts an HTML table to CSV. It is defined as follows:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:fo="http://www.w3.org/1999/XSL/Format" >
<xsl:output method="text" omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<xsl:for-each select="//tr">
<xsl:for-each select="td">
<xsl:if test="position() > 1">,</xsl:if>
<xsl:value-of select="."/>
</xsl:for-each>
<xsl:text>
</xsl:text>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
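The problem I am running into now is that the tags in these tables are written as ASCII codes rather than as literal characters.
Sample input (in the actual file the angle brackets are escaped as entities such as &lt; and &gt;):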
<table><tr>
<th>Order ID</th>
<th>Item ID</th>
<th>Participant ID</th>
<th>Status</th>
<th>Shipping Provider</th>
<th>Tracking Number</th>
<th>Shipped Date</th>
<th>Shipping Method</th></tr>
<tr>
<td align="center"> Choice_DJ4</td>
<td align="center"> 4</td>
<td align="center"> DXM09902</td>
<td align="center"> Shipped</td>
<td align="center"> USPS</td>
<td align="center"> </td>
<td align="center"> 04/13/2017</td>
<td align="center"> Standard Ground</td>
</tr>
</table>
My question is: is there a way to make the XSL file recognize the ASCII codes as the characters they stand for?
Update: here is my Java code:
String data = readFile("config/email.xml");
System.out.println("Data: \n" + data);
InputSource is = new InputSource(new StringReader(data));
String configFile = "config/email-xslt.xsl";
File stylesheet = new File(configFile);
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(is);
StreamSource stylesource = new StreamSource(stylesheet);
Transformer transformer = TransformerFactory.newInstance()
.newTransformer(stylesource);
Source source = new DOMSource(document);
StringWriter sw = new StringWriter();
Result outputTarget = new StreamResult(sw);
transformer.transform(source, outputTarget);
data = sw.toString();
System.out.println("Output: " + data);
In XSLT 3.0, you can use unparsed-text() to load the text, parse-xml-fragment() to unescape the entities, and parse-xml() to parse the resulting XML string.
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="3.0">
<xsl:output method="text" omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<!--first, load the contents of the document (adjust path to your document) -->
<xsl:variable name="input" select="unparsed-text('table.txt')" as="item()"/>
<!--second, unescape the angle bracket entities -->
<xsl:variable name="table-text" select="parse-xml-fragment($input)" as="item()"/>
<!--third, parse the serialized XML string -->
<xsl:variable name="table" select="parse-xml($table-text)" as="item()"/>
<xsl:for-each select="$table//tr">
<!--a more simplified way of generating the CSV for each row -->
<xsl:value-of select="td" separator=","/>
<xsl:text>
</xsl:text>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
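Note that unparsed-text(), parse-xml-fragment() and parse-xml() need an XSLT 3.0 processor, while the transformer built into the JDK only implements XSLT 1.0. If you are unsure which processor your Java code picks up, a small probe along these lines may help. This is only a sketch: the class name XsltVersionCheck and the probe stylesheet are made up for illustration, and it relies on the standard system-property('xsl:version') function.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class, not part of the original post.
final class XsltVersionCheck {

    // Tiny stylesheet that prints the XSLT version the processor implements.
    static final String PROBE =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"text\"/>"
        + "<xsl:template match=\"/\">"
        + "<xsl:value-of select=\"system-property('xsl:version')\"/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Runs the probe against a dummy document and returns the reported version.
    static String supportedXsltVersion() {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(PROBE)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader("<probe/>")),
                    new StreamResult(out));
            return out.toString().trim();
        } catch (Exception e) {
            throw new IllegalStateException("Could not run the version probe", e);
        }
    }

    public static void main(String[] args) {
        // Shows which TransformerFactory implementation is on the classpath.
        System.out.println("Factory: " + TransformerFactory.newInstance().getClass().getName());
        System.out.println("XSLT version: " + supportedXsltVersion());
    }
}
```

If the reported version is below 3.0, the stylesheet above will not run as-is, and you would need to put an XSLT 3.0 processor on the classpath.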
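I was able to solve the problem using org.apache.commons.lang3.StringEscapeUtils.unescapeXml(str);
My XSL file and data input (config/email.xml) are still the same as in the original post, but I had to modify the Java code to unescape these characters before passing the data to the XSL transformer.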
String data = readFile("config/email.xml");
data = StringEscapeUtils.unescapeXml(data);
System.out.println("Data: \n" + data);
InputSource is = new InputSource(new StringReader(data));
String configFile = "config/email-xslt.xsl";
File stylesheet = new File(configFile);
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(is);
StreamSource stylesource = new StreamSource(stylesheet);
Transformer transformer = TransformerFactory.newInstance()
.newTransformer(stylesource);
Source source = new DOMSource(document);
StringWriter sw = new StringWriter();
Result outputTarget = new StreamResult(sw);
transformer.transform(source, outputTarget);
data = sw.toString();
System.out.println("Output: " + data);
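If pulling in Commons Lang is not an option, roughly the same effect can probably be had with the JDK alone: wrap the escaped text in a dummy root element, let the XML parser decode the entities, then parse the decoded string again and transform it. The sketch below assumes the input contains only well-formed escaped markup. The class EscapedTableToCsv, the helper unescapeWithParser, the inline sample and the trimmed-down stylesheet are all illustrative and not taken from the original files.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical class, shown only to illustrate the unescape-then-transform idea.
final class EscapedTableToCsv {

    // Illustrative input: a table whose angle brackets are written as entities.
    static final String SAMPLE =
        "&lt;table&gt;"
        + "&lt;tr&gt;&lt;td&gt;A1&lt;/td&gt;&lt;td&gt;B1&lt;/td&gt;&lt;/tr&gt;"
        + "&lt;tr&gt;&lt;td&gt;A2&lt;/td&gt;&lt;td&gt;B2&lt;/td&gt;&lt;/tr&gt;"
        + "&lt;/table&gt;";

    // Trimmed-down XSLT 1.0 stylesheet with the same row/cell logic as the question.
    static final String XSLT =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"text\"/>"
        + "<xsl:template match=\"/\">"
        + "<xsl:for-each select=\"//tr\">"
        + "<xsl:for-each select=\"td\">"
        + "<xsl:if test=\"position() &gt; 1\">,</xsl:if>"
        + "<xsl:value-of select=\".\"/>"
        + "</xsl:for-each>"
        + "<xsl:text>&#10;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    private static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Decodes entities by parsing the escaped text as the content of a dummy element.
    static String unescapeWithParser(String escaped) {
        try {
            Document wrapper = parse("<wrapper>" + escaped + "</wrapper>");
            return wrapper.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not unescape input", e);
        }
    }

    // Unescapes the table markup, parses it, and applies the stylesheet.
    static String toCsv(String escapedTable, String xslt) {
        try {
            Document document = parse(unescapeWithParser(escapedTable));
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter sw = new StringWriter();
            transformer.transform(new DOMSource(document), new StreamResult(sw));
            return sw.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toCsv(SAMPLE, XSLT));
    }
}
```

The wrapper trick fails if the escaped text contains a raw < or an entity the parser does not know (for example &nbsp;), so for messy real-world input unescapeXml is likely the more forgiving choice.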