Can I use XML Schema to validate documents that have no xmlns attribute?



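I have a situation where I want to start using XML Schema to validate documents that have never had a schema definition until now. As a result, the existing documents I want to validate contain no xmlns declarations.

Validating documents that do contain an xmlns declaration works without problems, but I also want to be able to validate documents that lack one. I was hoping for something like this: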

DocumentBuilderFactory dbf = ...;
dbf.setSchema(... my schema for namespace "foo:bar"...);
dbf.setValidating(false);
dbf.setNamespaceAware(true);
DocumentBuilder db = dbf.newDocumentBuilder();
db.setDefaultNamespace("foo:bar");
Document doc = db.parse(input);

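No method DocumentBuilder.setDefaultNamespace exists, so schema validation is not performed when loading documents of this type.

Is there any way to force a namespace onto a document that does not declare one? Or does this essentially require parsing the XML without regard to the schema, checking the existing namespace, adjusting it, and then re-validating the document against the schema?

I would currently like the parser to perform validation while parsing, but I am fine with parsing first and validating afterwards.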

Update 2021-01-13

Here is a concrete example, as a JUnit test case, of what I am trying to do.

import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.junit.Assert;
import org.junit.Test;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
public class XMLSchemaTest
{
    private static final String XMLNS = "http://www.example.com/schema";

    // Schema whose target namespace is XMLNS; elementFormDefault="qualified"
    // means every element must be in that namespace.
    private static final String schemaDocument =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\" targetNamespace=\"" + XMLNS + "\" xmlns:e=\"" + XMLNS + "\" elementFormDefault=\"qualified\">"
        + "<xs:element name=\"example\" type=\"e:exampleType\" />"
        + "<xs:complexType name=\"exampleType\"><xs:sequence><xs:element name=\"test\" type=\"e:testType\" /></xs:sequence></xs:complexType>"
        + "<xs:complexType name=\"testType\" />"
        + "</xs:schema>";

    // Parses the document with schema validation enabled on the DocumentBuilderFactory
    private static Document parse(String document) throws SAXException, ParserConfigurationException, IOException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        SchemaFactory sf = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
        Source[] sources = new Source[] {
            new StreamSource(new StringReader(schemaDocument))
        };
        Schema schema = sf.newSchema(sources);
        dbf.setSchema(schema);
        dbf.setNamespaceAware(true);
        DocumentBuilder db = dbf.newDocumentBuilder();
        db.setErrorHandler(new MyErrorHandler());
        return db.parse(new InputSource(new StringReader(document)));
    }

    @Test
    public void testConformingDocumentWithSchema() throws Exception {
        String testDocument = "<example xmlns=\"" + XMLNS + "\"><test/></example>";
        Document doc = parse(testDocument);
        //Assert.assertEquals("Wrong document XML namespace", XMLNS, doc.getNamespaceURI());
        Element root = doc.getDocumentElement();
        Assert.assertEquals("Wrong root element XML namespace", XMLNS, root.getNamespaceURI());
        Assert.assertEquals("Wrong element name", "example", root.getLocalName());
        Assert.assertEquals("Wrong element name", "example", root.getTagName());
    }

    @Test
    public void testConformingDocumentWithoutSchema() throws Exception {
        // No xmlns declaration: this is the case that currently fails
        String testDocument = "<example><test/></example>";
        Document doc = parse(testDocument);
        //Assert.assertEquals("Wrong document XML namespace", XMLNS, doc.getNamespaceURI());
        Element root = doc.getDocumentElement();
        Assert.assertEquals("Wrong root element XML namespace", XMLNS, root.getNamespaceURI());
        Assert.assertEquals("Wrong element name", "example", root.getLocalName());
        Assert.assertEquals("Wrong element name", "example", root.getTagName());
    }

    @Test
    public void testNononformingDocumentWithSchema() throws Exception {
        String testDocument = "<example xmlns=\"" + XMLNS + "\"><random/></example>";
        try {
            parse(testDocument);
            Assert.fail("Document should not have parsed properly");
        } catch (Exception e) {
            System.out.println(e);
            // Expected
        }
    }

    @Test
    public void testNononformingDocumentWithoutSchema() throws Exception {
        String testDocument = "<example><random/></example>";
        try {
            parse(testDocument);
            Assert.fail("Document should not have parsed properly");
        } catch (Exception e) {
            System.out.println(e);
            // Expected
        }
    }

    public static class MyErrorHandler implements ErrorHandler {
        @Override
        public void warning(SAXParseException exception) throws SAXException {
            System.err.println("WARNING: " + exception);
        }

        @Override
        public void error(SAXParseException exception) throws SAXException {
            // Validation errors are reported here; rethrow so that parsing fails
            throw exception;
        }

        @Override
        public void fatalError(SAXParseException exception) throws SAXException {
            System.err.println("FATAL: " + exception);
        }
    }
}

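All tests pass except testConformingDocumentWithoutSchema. I think that is expected, since the document declares no namespace.

I am asking how to change the test (but not the document itself!) so that the document is validated against a schema that the document does not actually declare.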
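
A minimal sketch of why that one test fails, assuming a namespace-aware parser: an element with no xmlns declaration in scope has a null namespace URI, so it is not the same element as the "example" that the schema declares in its target namespace. The class name NamespaceLookup and the helper rootNamespace are made up for illustration only.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of the test case above
class NamespaceLookup {
    // Parses without any schema and returns the namespace URI of the root element
    static String rootNamespace(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getNamespaceURI();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints "null": no xmlns declaration means no namespace
        System.out.println(rootNamespace("<example><test/></example>"));
        // prints "http://www.example.com/schema"
        System.out.println(rootNamespace(
            "<example xmlns=\"http://www.example.com/schema\"><test/></example>"));
    }
}
```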

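It took me a while, but I finally came up with a hack that works. Perhaps this could be done more elegantly (which was my original question) and perhaps with less code, but this is what I was able to come up with.

If you look at the JUnit test case in the question, changing the "parse" method as follows (and adding XMLNS as a second argument to all calls to parse) allows all of the tests to pass: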

import java.io.StringWriter;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;
...
    private static Document parse(String document, String namespace) throws SAXException, ParserConfigurationException, IOException {
        SchemaFactory sf = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
        Source[] sources = new Source[] {
            new StreamSource(new StringReader(schemaDocument))
        };
        Schema schema = sf.newSchema(sources);
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setSchema(schema);
        dbf.setNamespaceAware(true);
        DocumentBuilder db = dbf.newDocumentBuilder();
        ErrorHandler errorHandler = new MyErrorHandler();
        db.setErrorHandler(errorHandler);
        try {
            return db.parse(new InputSource(new StringReader(document)));
        } catch (SAXParseException spe) {
            // Just in case this was a problem with a missing namespace
            // System.out.println("Possibly recovering from SPE " + spe);
            // New DocumentBuilder without the schema
            dbf.setSchema(null);
            db = dbf.newDocumentBuilder();
            db.setErrorHandler(errorHandler);
            Document doc = db.parse(new InputSource(new StringReader(document)));
            if (null != doc.getDocumentElement().getNamespaceURI()) {
                // Namespace URI was set; this is a fatal error
                throw spe;
            }
            // Override the namespace on the Document + root element
            doc.getDocumentElement().setAttribute("xmlns", namespace);
            // Serialize the document -> String to start over again
            DOMImplementationLS domImplementation = (DOMImplementationLS) doc.getImplementation();
            LSSerializer lsSerializer = domImplementation.createLSSerializer();
            LSOutput lsOutput = domImplementation.createLSOutput();
            lsOutput.setEncoding("UTF-8");
            StringWriter out = new StringWriter();
            lsOutput.setCharacterStream(out);
            lsSerializer.write(doc, lsOutput);
            String converted = out.toString();
            // Re-enable the schema
            dbf.setSchema(schema);
            db = dbf.newDocumentBuilder();
            db.setErrorHandler(errorHandler);
            return db.parse(new InputSource(new StringReader(converted)));
        }
    }

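This works by catching the SAXParseException and, because SAXParseException does not give away any details, assuming that the problem might have been caused by a missing XML namespace declaration. I then re-parse the document without schema validation, add a namespace declaration to the in-memory Document, serialize the Document to a String, and re-parse that String with schema validation re-enabled.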

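I tried to do this by simply setting the XML namespace and then using Schema.newValidator().validate(new DOMSource(doc)), but that failed validation every time. Running the document through the serializer fixed that.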
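
A likely reason the DOMSource attempt fails is that setAttribute("xmlns", ...) only adds an attribute. In the DOM, a node's namespace URI is fixed when the node is created and is not recomputed from the declarations in scope, so the elements still report a null namespace. The following is a sketch, not the approach used above: it moves every un-namespaced element into the target namespace with Document.renameNode and then validates the tree in memory, with no serialization step. The names DomNamespaceRewrite, applyNamespace, and isValid are hypothetical, and it assumes the DOM implementation supports renameNode for elements.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical class; the schema is the one from the question's test case
class DomNamespaceRewrite {
    static final String XMLNS = "http://www.example.com/schema";
    static final String SCHEMA =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\" targetNamespace=\"" + XMLNS
        + "\" xmlns:e=\"" + XMLNS + "\" elementFormDefault=\"qualified\">"
        + "<xs:element name=\"example\" type=\"e:exampleType\" />"
        + "<xs:complexType name=\"exampleType\"><xs:sequence>"
        + "<xs:element name=\"test\" type=\"e:testType\" /></xs:sequence></xs:complexType>"
        + "<xs:complexType name=\"testType\" /></xs:schema>";

    // Recursively moves every element that has no namespace into ns
    static void applyNamespace(Document doc, Node node, String ns) {
        if (node.getNodeType() == Node.ELEMENT_NODE && node.getNamespaceURI() == null) {
            // renameNode may return a replacement node, so keep the result
            node = doc.renameNode(node, ns, node.getNodeName());
        }
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // read before the child is possibly replaced
            applyNamespace(doc, child, ns);
            child = next;
        }
    }

    // Parses without a schema, rewrites the namespace, then validates the DOM tree
    static Document parse(String xml) throws SAXException, IOException, ParserConfigurationException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        applyNamespace(doc, doc.getDocumentElement(), XMLNS);
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
            .newSchema(new StreamSource(new StringReader(SCHEMA)));
        // With no ErrorHandler set, the Validator throws on the first validation error
        schema.newValidator().validate(new DOMSource(doc));
        return doc;
    }

    static boolean isValid(String xml) {
        try {
            parse(xml);
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<example><test/></example>"));
        System.out.println(isValid("<example><random/></example>"));
    }
}
```

Elements that already carry a namespace are left untouched, so a document declaring the wrong namespace still fails validation instead of being silently rewritten.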

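
Another possible route to validating during the parse itself, again only a sketch under assumptions: wrap the XMLReader in an org.xml.sax.helpers.XMLFilterImpl that reports un-namespaced elements as if they were in the target namespace, and pass that reader to a Validator through a SAXSource. DefaultNamespaceValidation, DefaultNamespaceFilter, and isValid are hypothetical names. The sketch only validates; building a Document from the filtered events would need an extra step.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical class; the schema is the one from the question's test case
class DefaultNamespaceValidation {
    static final String XMLNS = "http://www.example.com/schema";
    static final String SCHEMA =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\" targetNamespace=\"" + XMLNS
        + "\" xmlns:e=\"" + XMLNS + "\" elementFormDefault=\"qualified\">"
        + "<xs:element name=\"example\" type=\"e:exampleType\" />"
        + "<xs:complexType name=\"exampleType\"><xs:sequence>"
        + "<xs:element name=\"test\" type=\"e:testType\" /></xs:sequence></xs:complexType>"
        + "<xs:complexType name=\"testType\" /></xs:schema>";

    // Reports every element that has no namespace as belonging to defaultNs
    static class DefaultNamespaceFilter extends XMLFilterImpl {
        private final String defaultNs;

        DefaultNamespaceFilter(XMLReader parent, String defaultNs) {
            super(parent);
            // Interned as a precaution: some parsers compare names by identity
            this.defaultNs = defaultNs.intern();
        }

        private String fix(String uri) {
            return (uri == null || uri.isEmpty()) ? defaultNs : uri;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            super.startElement(fix(uri), localName, qName, atts);
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            super.endElement(fix(uri), localName, qName);
        }
    }

    // Validates while parsing; returns false on any parse or validation error
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(SCHEMA)));
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            XMLReader reader = new DefaultNamespaceFilter(spf.newSAXParser().getXMLReader(), XMLNS);
            schema.newValidator().validate(
                new SAXSource(reader, new InputSource(new StringReader(xml))));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<example><test/></example>"));
        System.out.println(isValid("<example><random/></example>"));
    }
}
```

Because the filter rewrites only empty namespace URIs, documents that already declare the namespace pass through unchanged.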