Fatal Error: Character reference "&# org.xml.sax.SAXParseException

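I have a

 <BATCHNAME>&#4; Any</BATCHNAME>

tag in my XML request whose value contains the characters &#4;. Without these characters my code works perfectly, but in some cases they are present, and then I get the following error: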

[Fatal Error] :144:28: Character reference "&#
org.xml.sax.SAXParseException; lineNumber: 144; columnNumber: 28; Character reference "&#
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:257)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:339)
    at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)
    at d.b(AllCommonTasks.java:277)
    at ...

I need to validate these characters.

This is the code I am trying:

try {
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    DocumentBuilder db = dbf.newDocumentBuilder();
    URLConnection urlConnection = new URL(urlString).openConnection();
    urlConnection.addRequestProperty("Accept", "application/xml");
    urlConnection.addRequestProperty("User-Agent", "Mozilla/5.0 ( compatible ) ");
    // parse() is the call that throws the SAXParseException shown above
    Document doc = db.parse(urlConnection.getInputStream());
    doc.getDocumentElement().normalize();
    str = convertDocumentToString(doc);
} catch (Exception e) {
    System.err.println("In exception 1");
    e.printStackTrace();
}

How can I solve this?

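Looking at the Wikipedia page on XML and HTML entity references, a reference of the form &#nnnn; denotes a Unicode code point in decimal, which means &#4; is equivalent to Unicode U+0004: END OF TRANSMISSION, a non-printing control character.

So I think the parser is right to fail in this case: XML 1.0 does not allow U+0004 in a document, not even as a character reference.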
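To make the rule concrete, here is a minimal sketch that checks a code point against the Char production of XML 1.0. The class name XmlCharCheck and the method isXml10Char are made up for illustration and are not part of any library.

```java
class XmlCharCheck {
    // Hypothetical helper mirroring the XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    public static void main(String[] args) {
        // The digits of &#4; are a decimal code point.
        int cp = Integer.parseInt("4");
        System.out.println(isXml10Char(cp));   // prints "false": U+0004 is not allowed
        System.out.println(isXml10Char(0x41)); // prints "true": U+0041 is 'A'
    }
}
```

Tab, line feed and carriage return are the only characters below U+0020 that pass this check, which is why &#9; parses while &#4; does not.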

In fact, if you look at the source of com.sun.org.apache.xerces.internal.impl.XMLScanner#scanCharReferenceValue, you can see that it calls com.sun.org.apache.xerces.internal.util.XMLChar#isValid, shown here:

/**
 * Returns true if the specified character is valid. This method
 * also checks the surrogate character range from 0x10000 to 0x10FFFF.
 * <p>
 * If the program chooses to apply the mask directly to the
 * <code>CHARS</code> array, then they are responsible for checking
 * the surrogate character range.
 *
 * @param c The character to check.
 */
public static boolean isValid(int c) {
    return (c < 0x10000 && (CHARS[c] & MASK_VALID) != 0) ||
           (0x10000 <= c && c <= 0x10FFFF);
} // isValid(int):boolean
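
If the producer of the XML cannot be fixed, one possible workaround is to drop the invalid character references before handing the text to the parser. The following is only a sketch under assumptions: the names CharRefSanitizer, stripInvalidCharRefs and isXml10Char are invented for this example, the whole response is assumed to fit in a String, and the regex also touches text inside CDATA sections and comments, where &#4; would be literal text rather than a reference.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class CharRefSanitizer {
    // Matches hexadecimal (&#x1F;) and decimal (&#4;) character references.
    private static final Pattern CHAR_REF =
            Pattern.compile("&#(?:x([0-9a-fA-F]+)|([0-9]+));");

    // Hypothetical helper mirroring the XML 1.0 Char production.
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Removes references to code points that XML 1.0 forbids; keeps the rest.
    static String stripInvalidCharRefs(String xml) {
        Matcher m = CHAR_REF.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int cp;
            try {
                cp = m.group(1) != null
                        ? Integer.parseInt(m.group(1), 16)
                        : Integer.parseInt(m.group(2));
            } catch (NumberFormatException e) {
                cp = -1; // too large to be a code point, treat as invalid
            }
            String replacement = isXml10Char(cp) ? m.group() : "";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String raw = "<BATCHNAME>&#4; Any</BATCHNAME>";
        String clean = stripInvalidCharRefs(raw);

        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = db.parse(
                new ByteArrayInputStream(clean.getBytes(StandardCharsets.UTF_8)));
        // prints "[ Any]": the reference is gone, the rest of the value is kept
        System.out.println("[" + doc.getDocumentElement().getTextContent() + "]");
    }
}
```

In the question's code this would mean reading urlConnection.getInputStream() into a String first, sanitizing it, and then parsing the cleaned text instead of the raw stream. Note that this silently discards data, so whether it is acceptable depends on what &#4; was meant to carry.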
