Does anyone know how to get the Document Object Model (DOM) path of a tag, such as a button, from HTML code? This is what I have so far:
JEditorPane p = new JEditorPane();
p.setContentType("text/html");
p.setText(" <!DOCTYPE html>\n" +
" <html dir=\"ltr\" lang=\"en\">\n" +
" <head>\n" +
" <meta http-equiv=\"Content-Type\" content=\"text/html; \" />\n" +
" <title>Alidoosti</title>\n" +
" </head>\n" +
" <body>\n" +
" <button id=\"miti\" type=\"submit\">Search</button>\n" +
" </body>\n" +
" </html>\n"); // Document text is provided below.
HTMLDocument d = (HTMLDocument) p.getDocument();
ScriptEngineManager manager = new ScriptEngineManager();
ScriptEngine engine = manager.getEngineByName("js");
try {
engine.eval("function getDomPath(el) {\n" +
" var stack = [];\n" +
" while ( el.parentNode != null ) {\n" +
" console.log(el.nodeName);\n" +
" var sibCount = 0;\n" +
" var sibIndex = 0;\n" +
" for ( var i = 0; i < el.parentNode.childNodes.length; i++ ) {\n" +
" var sib = el.parentNode.childNodes[i];\n" +
" if ( sib.nodeName == el.nodeName ) {\n" +
" if ( sib === el ) {\n" +
" sibIndex = sibCount;\n" +
" }\n" +
" sibCount++;\n" +
" }\n" +
" }\n" +
" if ( el.hasAttribute('id') && el.id != '' ) {\n" +
" stack.unshift(el.nodeName.toLowerCase() + '#' + el.id);\n" +
" } else if ( sibCount > 1 ) {\n" +
" stack.unshift(el.nodeName.toLowerCase() + ':eq(' + sibIndex + ')');\n" +
" } else {\n" +
" stack.unshift(el.nodeName.toLowerCase());\n" +
" }\n" +
" el = el.parentNode;\n" +
" }\n" +
" return stack.slice(1); // removes the html element\n" +
"}" +
"var path = getDomPath(" + d + ".getElementById('miti'));\n" +
"console.log(path.join(' > '));");
} catch (ScriptException e) {
e.printStackTrace();
}
But I get this error:
javax.script.ScriptException: <eval>:26:60 Missing space after numeric literal
}var path = getDomPath(javax.swing.text.html.HTMLDocument@75f32542.getElementById('miti'));
^ in <eval> at line number 26 at column number 60
What causes this error?
You can use Jsoup for this. Add it to your classpath and use it like this:
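import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;
String html = "[YOUR HTML IN HERE]";
Document doc = Jsoup.parse(html); // org.jsoup.nodes.Document, not the Swing Document
Elements buttons = doc.select("button");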
After that, for each element returned by the select("button") call, you can use the parents() method to get its stack of ancestors.
Read about Jsoup here: https://jsoup.org/
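Your variable d is of type HTMLDocument. When it is concatenated with a string, Java calls its toString() method, which (for the type you are using) returns a string like javax.swing.text.html.HTMLDocument@75f32542. That text is pasted into the script, and the JavaScript parser fails on it: 75 followed by f is what triggers "Missing space after numeric literal".
I guess you would rather use the JavaScript object document.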
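To make the difference concrete, here is a minimal sketch contrasting string concatenation with handing the object to the engine through ScriptEngine.put. The class and method names are made up for illustration. It assumes an engine named "js" may or may not be installed (Nashorn was removed from the JDK in version 15, so getEngineByName can return null). Note that binding the object does not turn the Swing HTMLDocument into a browser DOM: it has no getElementById method, so the getDomPath function from the question would still not work on it.

```java
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;
import javax.swing.text.html.HTMLDocument;

// Hypothetical helper class, for illustration only.
final class ConcatVsBinding {

    /** Builds the script text as in the question: "+ d +" calls d.toString(). */
    static String scriptByConcatenation(Object d) {
        return "var path = getDomPath(" + d + ".getElementById('miti'));";
    }

    /**
     * Hands the object itself to the engine under the script name "d"
     * and asks the engine what it sees.
     * Returns null when no engine named "js" is installed.
     */
    static Object typeOfBoundObject(Object d) throws ScriptException {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName("js");
        if (engine == null) {
            return null; // no JavaScript engine available on this JDK
        }
        engine.put("d", d);             // the script now has a variable named d
        return engine.eval("typeof d"); // evaluated inside the script
    }

    public static void main(String[] args) throws ScriptException {
        HTMLDocument d = new HTMLDocument();
        // Prints script text containing javax.swing.text.html.HTMLDocument@<hash>,
        // which is not valid JavaScript.
        System.out.println(scriptByConcatenation(d));
        System.out.println(typeOfBoundObject(d));
    }
}
```

With the binding approach the script refers to the variable d by name, so nothing from toString() ever ends up in the source text.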
What are you trying to do? This seems like a very complicated approach that does nothing more than parse some X(HT)ML.
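If the markup really is well-formed XHTML, as the sample in the question happens to be, the path can be computed in plain Java with the JDK's XML parser and no script engine at all. The following is only a sketch under that assumption. The class and method names are hypothetical, and real-world HTML that is not well-formed XML would make the parser throw.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
final class DomPathSketch {

    /** Parses well-formed XHTML text into a W3C DOM document. */
    static Document parse(String xhtml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xhtml)));
    }

    /** Returns the first element whose id attribute equals the given value, or null. */
    static Element findById(Document doc, String id) {
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            if (id.equals(e.getAttribute("id"))) {
                return e;
            }
        }
        return null;
    }

    /** Java port of getDomPath from the question; the root (html) element is dropped. */
    static String domPath(Element el) {
        Deque<String> stack = new ArrayDeque<>();
        for (Node n = el; n instanceof Element; n = n.getParentNode()) {
            Element e = (Element) n;
            String name = e.getTagName().toLowerCase();
            int sibCount = 0;
            int sibIndex = 0;
            Node parent = n.getParentNode();
            if (parent != null) {
                // Count siblings with the same tag name and find our position among them.
                NodeList sibs = parent.getChildNodes();
                for (int i = 0; i < sibs.getLength(); i++) {
                    Node sib = sibs.item(i);
                    if (sib.getNodeName().equals(e.getNodeName())) {
                        if (sib.isSameNode(e)) {
                            sibIndex = sibCount;
                        }
                        sibCount++;
                    }
                }
            }
            String id = e.getAttribute("id"); // empty string when the attribute is absent
            if (!id.isEmpty()) {
                stack.addFirst(name + "#" + id);
            } else if (sibCount > 1) {
                stack.addFirst(name + ":eq(" + sibIndex + ")");
            } else {
                stack.addFirst(name);
            }
        }
        stack.pollFirst(); // removes the html element
        return String.join(" > ", stack);
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<!DOCTYPE html>\n"
                + "<html dir=\"ltr\" lang=\"en\">\n"
                + "<head><title>Alidoosti</title></head>\n"
                + "<body>\n"
                + "<button id=\"miti\" type=\"submit\">Search</button>\n"
                + "</body>\n"
                + "</html>\n";
        Document doc = parse(xhtml);
        System.out.println(domPath(findById(doc, "miti"))); // prints "body > button#miti"
    }
}
```

The lookup scans all elements for a matching id attribute instead of calling Document.getElementById, because without a DTD that declares id as an ID-typed attribute the parser does not register it as one.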