Hi, in my code I run the following lines, but the code crashes during execution.
ByteArrayInputStream input = new ByteArrayInputStream(fileContent);
final HtmlCleaner cleaner = new HtmlCleaner();
CleanerProperties props = cleaner.getProperties();
DomSerializer doms = new DomSerializer(props, true);
org.w3c.dom.Document xmlDoc = null;
try {
    TagNode node = cleaner.clean(input);
    xmlDoc = doms.createDOM(node);
} catch (Exception e) {
    System.out.println("Tiding error ");
    e.printStackTrace();
}
Here is the stack trace:
NAMESPACE_ERR: An attempt is made to create or change an object in a way which is incorrect with regard to namespaces.
at com.sun.org.apache.xerces.internal.dom.CoreDocumentImpl.checkDOMNSErr(CoreDocumentImpl.java:2535)
at com.sun.org.apache.xerces.internal.dom.AttrNSImpl.setName(AttrNSImpl.java:113)
at com.sun.org.apache.xerces.internal.dom.AttrNSImpl.<init>(AttrNSImpl.java:74)
at com.sun.org.apache.xerces.internal.dom.CoreDocumentImpl.createAttributeNS(CoreDocumentImpl.java:2138)
at com.sun.org.apache.xerces.internal.dom.ElementImpl.setAttributeNS(ElementImpl.java:656)
at org.htmlcleaner.DomSerializer.setAttributes(DomSerializer.java:97)
at org.htmlcleaner.DomSerializer.createDOM(DomSerializer.java:37)
Can anyone help me figure out why this happens?
Regards, Zoli
HtmlCleaner has trouble handling namespaces. Here is an example of an XML namespace declaration that trips it up:
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="de" lang="de"
xmlns:og="http://ogp.me/ns#" xmlns:fb="http://www.facebook.com/2008/fbml"
itemscope itemtype="http://schema.org/CreativeWork">
As you can see, the itemscope attribute is broken, so HtmlCleaner throws NAMESPACE_ERR.
One way to avoid the problem is to add this line after obtaining props:
props.setNamespacesAware(false);
This turns off namespace handling.
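To see the underlying DOM check in isolation, here is a minimal sketch that uses only the JDK's built-in DOM, without HtmlCleaner. It is not what DomSerializer does internally, just an illustration of the setAttributeNS call that appears in the stack trace. The class name NamespaceErrDemo, the helper trySetAttributeNS, and the attribute name og:title are made up for this example. With the JDK's default parser, a prefixed attribute name passed with a null namespace URI should be rejected with NAMESPACE_ERR, while the same name with a namespace URI should be accepted.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class; not part of HtmlCleaner.
class NamespaceErrDemo {

    // Tries to set an attribute with the given namespace URI and qualified name.
    // Returns the DOMException code if the DOM rejects it, or -1 if it is accepted.
    static int trySetAttributeNS(String namespaceUri, String qualifiedName) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("Could not create a DOM document", e);
        }
        Element html = doc.createElement("html");
        try {
            html.setAttributeNS(namespaceUri, qualifiedName, "x");
            return -1;
        } catch (DOMException e) {
            return e.code;
        }
    }

    public static void main(String[] args) {
        // Prefixed name with no namespace URI: expected to be rejected.
        int noNamespace = trySetAttributeNS(null, "og:title");
        System.out.println("og:title, null namespace -> NAMESPACE_ERR? "
                + (noNamespace == DOMException.NAMESPACE_ERR));

        // Same name with the namespace URI from the <html> tag above: expected to be accepted.
        int withNamespace = trySetAttributeNS("http://ogp.me/ns#", "og:title");
        System.out.println("og:title, with namespace -> accepted? " + (withNamespace == -1));
    }
}
```

If this matches what happens inside DomSerializer, it would explain why disabling namespace handling helps: with namespace awareness off, attributes would presumably no longer be set through the namespace-checked call.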