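I have a
<BATCHNAME>&#4; Any</BATCHNAME>
tag in my XML request whose value contains "&#4;" character references. Without these characters my code works perfectly, but in some cases they are present, and then I get the following error: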
[Fatal Error] :144:28: Character reference "&#
org.xml.sax.SAXParseException; lineNumber: 144; columnNumber: 28; Character reference "&#
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:257)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:339)
    at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)
    at d.b(AllCommonTasks.java:277)
    at ...
I need to validate these characters.
This is the code I am trying:
try {
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    DocumentBuilder db = dbf.newDocumentBuilder();
    URLConnection urlConnection = new URL(urlString).openConnection();
    urlConnection.addRequestProperty("Accept", "application/xml");
    urlConnection.addRequestProperty("User-Agent", "Mozilla/5.0 ( compatible ) ");
    Document doc = db.parse(urlConnection.getInputStream());
    doc.getDocumentElement().normalize();
    str = convertDocumentToString(doc);
} catch (Exception e) {
    System.err.println("In exception 1");
    e.printStackTrace();
}
How can I solve this?
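According to the Wikipedia page on XML and HTML entity references, a reference of the form &#nnnn; is a numeric character reference that gives a Unicode code point in decimal, which means &#4; is equivalent to Unicode U+0004: END OF TRANSMISSION.
This is a non-printing control character, and XML 1.0 does not allow it anywhere in a document, not even when written as a character reference.
So I think the parser is right to fail in this case.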
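To see this behaviour in isolation, here is a minimal sketch that feeds two tiny documents to the JDK's default DOM parser. The class name `CharRefDemo` and the helper `parses` are made up for illustration, and the sample documents are assumptions, not the original request.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical demo class, not part of the original question.
final class CharRefDemo {

    /** Returns true if the default DOM parser accepts the given XML text. */
    static boolean parses(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            db.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXParseException e) {
            // The document is not well-formed, e.g. an illegal character reference.
            return false;
        } catch (Exception e) {
            // Parser configuration or I/O problems are not what this demo is about.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // &#4; refers to U+0004, a control character XML 1.0 forbids.
        System.out.println(parses("<BATCHNAME>&#4; Any</BATCHNAME>"));
        // &#65; refers to U+0041 ('A'), which is an ordinary legal character.
        System.out.println(parses("<BATCHNAME>&#65; Any</BATCHNAME>"));
    }
}
```

The parser may still print its own "[Fatal Error]" line to stderr for the first document, since the sketch leaves the default error handler in place.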
In fact, if you look at the source of com.sun.org.apache.xerces.internal.impl.XMLScanner#scanCharReferenceValue,
you can see that it refers to com.sun.org.apache.xerces.internal.util.XMLChar#isValid,
shown here:
/**
 * Returns true if the specified character is valid. This method
 * also checks the surrogate character range from 0x10000 to 0x10FFFF.
 * <p>
 * If the program chooses to apply the mask directly to the
 * <code>CHARS</code> array, then they are responsible for checking
 * the surrogate character range.
 *
 * @param c The character to check.
 */
public static boolean isValid(int c) {
    return (c < 0x10000 && (CHARS[c] & MASK_VALID) != 0) ||
           (0x10000 <= c && c <= 0x10FFFF);
} // isValid(int):boolean
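If the sender of the request cannot be fixed, one possible workaround is to strip the offending references before handing the text to the parser. The sketch below is only an illustration under stated assumptions: `XmlReferenceSanitizer`, `isXml10Char` and `stripInvalidReferences` are hypothetical names, the validity check is written directly from the XML 1.0 `Char` production rather than taken from Xerces, and the response is assumed to have been read into a `String` first. It also treats the text naively, so a reference-like sequence inside a CDATA section or comment would be removed as well.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper, not part of the original question or of Xerces.
final class XmlReferenceSanitizer {

    // Matches decimal (&#4;) and hexadecimal (&#x4;) numeric character references.
    private static final Pattern NUMERIC_REF =
            Pattern.compile("&#(?:x([0-9a-fA-F]+)|([0-9]+));");

    /** XML 1.0 Char: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]. */
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Removes numeric references to characters XML 1.0 forbids; keeps everything else. */
    static String stripInvalidReferences(String xml) {
        Matcher m = NUMERIC_REF.matcher(xml);
        StringBuilder out = new StringBuilder(xml.length());
        int last = 0;
        while (m.find()) {
            out.append(xml, last, m.start());
            if (refersToValidChar(m)) {
                out.append(m.group()); // legal reference: copy it through unchanged
            }
            last = m.end();
        }
        out.append(xml, last, xml.length());
        return out.toString();
    }

    private static boolean refersToValidChar(Matcher m) {
        try {
            int cp = m.group(1) != null
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2), 10);
            return isXml10Char(cp);
        } catch (NumberFormatException e) {
            return false; // too large to be any Unicode code point
        }
    }

    public static void main(String[] args) throws Exception {
        String raw = "<BATCHNAME>&#4; Any &#65;</BATCHNAME>"; // sample input, assumed
        String clean = stripInvalidReferences(raw);
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(clean.getBytes(StandardCharsets.UTF_8)));
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

Silently dropping characters changes the data, so whether this is acceptable depends on what the value is used for. Fixing the producer so that it never emits control characters is the cleaner solution.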