# `&#xD;` and `&#13;` when reading and writing XML files



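I have an XML file that comes in from a web API. When I save it as an XML file from the browser, it contains some extra `&#xD;` character references. The problem is that when I parse this XML data with StAX, process it, and write it back in another XML format through DOM, the output contains `&#13;` instead.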

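All I want is to get rid of the extra `&#xD;` in the input and the `&#13;` in the output. I can't figure out the cause, and I can't find a solution.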

This is the input XML element value I get after saving it to a file:

<someNode>Today is a fine day.&#xD;
So does everyday.&#xD;
</someNode>

After writing, the output is:

<someNode>Today is a fine day.&#13;
So does everyday.&#13;
</someNode>

The output I actually expect and need:

<someNode>Today is a fine day.
So does everyday.
</someNode>

The line breaks in the node's text value are intentional and must be kept as they are.

Simplified code samples:

Reading the stream from the API:

// Get Input XML stream from API
URL apiURL = new URL(API_Url);
HttpsURLConnection httpsAPIURLConn;
httpsAPIURLConn = (HttpsURLConnection) apiURL.openConnection();
httpsAPIURLConn.setConnectTimeout(10000); // timeout
httpsAPIURLConn.setDoInput(true);
InputStream inStream = httpsAPIURLConn.getInputStream();
// Data stream okay, start the StAX XLIFF reader
XMLInputFactory xmlInputFactory = XMLInputFactory.newInstance();
// This is to read entity referenced strings
xmlInputFactory.setProperty(XMLInputFactory.IS_COALESCING, true);
// StAX stream reader
XMLStreamReader xmlStreamReader = xmlInputFactory.createXMLStreamReader(new BufferedInputStream(inStream), "UTF-8");
// Read and load XML data to in-memory database to filter and process
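One possible way to deal with the input side, which is not part of the original code, is to drop carriage returns while the text is being read. A character reference such as `&#xD;` reaches the application as a literal carriage return (U+000D) in the text returned by `getText()`, so it can be removed there. The sketch below assumes a hypothetical helper class `StaxTextCleaner` and reads from a string rather than the API stream to stay self-contained.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, for illustration only.
final class StaxTextCleaner {

    /** Returns all character data in the document, with U+000D removed. */
    static String readTextWithoutCr(String xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Same setting as in the question: merge adjacent text events.
        factory.setProperty(XMLInputFactory.IS_COALESCING, true);
        StringBuilder text = new StringBuilder();
        try {
            XMLStreamReader reader =
                    factory.createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.CHARACTERS) {
                    // &#xD; arrives here as '\r'; strip it, keep '\n'.
                    text.append(reader.getText().replace("\r", ""));
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String xml = "<someNode>Today is a fine day.&#xD;\n"
                + "So does everyday.&#xD;\n</someNode>";
        System.out.println(readTextWithoutCr(xml));
    }
}
```

If the text is cleaned at this point, the DOM that is built later never contains a carriage return, so the serializer has nothing to escape.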

Writing the new XML structure to a file after filtering and processing the original data:

// After processing and writing new Element structure to org.w3c.dom.Document
// write the content into xml file
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer tr = transformerFactory.newTransformer();
tr.setOutputProperty(OutputKeys.INDENT, "yes");
tr.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
tr.setOutputProperty(OutputKeys.METHOD, "xml");
tr.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
tr.setOutputProperty(OutputKeys.STANDALONE, "no");
DOMSource source = new DOMSource(doc);
File file = new File(xmlFilePath);
Writer outputStream = new OutputStreamWriter(new FileOutputStream(file), "UTF-8");
StreamResult result = new StreamResult(outputStream);
tr.transform(source, result);

I don't know what exactly I am missing. Any suggestions or help would be appreciated.

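The simplest solution (short of hooking into the SAX event stream) is to write an XSLT script that does what you need, and invoke that as your transformer instead of the default identity transformer.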

See http://en.wikipedia.org/wiki/Identity_transform#Using_XSLT.

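You then need to supply your own rule for transforming text nodes, in which the ASCII 13 (carriage return) characters are removed by translating them to the empty string.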
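A minimal sketch of that approach might look like the following. It is an illustration, not a drop-in replacement: the class name `CarriageReturnStripper` and the inline stylesheet string are assumptions, and the input is read from a string so the example is self-contained. The stylesheet is the usual identity transform plus one extra template for `text()` that uses the XPath `translate()` function to delete carriage returns. The explicit `priority` is there because `text()` and `node()` would otherwise have the same default priority.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper, for illustration only.
final class CarriageReturnStripper {

    // Identity transform plus a rule that removes U+000D from text nodes.
    static final String XSLT =
          "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:template match=\"@*|node()\">"
        + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
        + "</xsl:template>"
        + "<xsl:template match=\"text()\" priority=\"1\">"
        + "<xsl:value-of select=\"translate(., '&#13;', '')\"/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Serializes the given XML with carriage returns removed from text nodes. */
    static String strip(String xml) {
        try {
            // Pass the stylesheet instead of calling newTransformer() with no arguments.
            Transformer tr = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            tr.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            tr.transform(new StreamSource(new StringReader(xml)),
                         new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<someNode>Today is a fine day.&#xD;\n"
                + "So does everyday.&#xD;\n</someNode>";
        System.out.println(strip(xml));
    }
}
```

In the question's writer code, the corresponding change would be to pass a `StreamSource` for the stylesheet to `transformerFactory.newTransformer(...)`. The `DOMSource`, the output properties, and the `StreamResult` can presumably stay as they are.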
