My HTML document has a structure like this:
<p>
"<em>You</em> began the evening well, Charlotte," said Mrs. Bennet with civil self–command to Miss Lucas. "<em>You</em> were Mr. Bingley's first choice."
</p>
But I need the "plain text" wrapped in tags so that I can process it :)
<p>
<text>"</text>
<em>You</em>
<text> began the evening well, Charlotte," said Mrs. Bennet with civil self–command to Miss Lucas. "</text>
<em>You</em>
<text> were Mr. Bingley's first choice."</text>
</p>
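Any ideas on how to do this? I've looked at TagSoup and jsoup, but I don't see an easy way to solve this with either. Perhaps with some fancy regular expression.
Thanks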
Here is a suggestion:
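import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.parser.Tag;

// Builds a <text> element that contains the given string.
public static Node toTextElement(String str) {
    Element e = new Element(Tag.valueOf("text"), "");
    e.appendText(str);
    return e;
}

// Recursively replaces every TextNode at or below root with a <text> element.
public static void replaceTextNodes(Node root) {
    if (root instanceof TextNode)
        root.replaceWith(toTextElement(((TextNode) root).text()));
    else
        for (Node child : root.childNodes())
            replaceTextNodes(child);
}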
Test code:
String html = "<p>\"<em>You</em> began the evening well, Charlotte,\" " +
        "said Mrs. Bennet with civil self–command to Miss Lucas." +
        " \"<em>You</em> were Mr. Bingley's first choice.\"</p>";
Document doc = Jsoup.parse(html);
for (Node n : doc.body().children())
replaceTextNodes(n);
System.out.println(doc);
Output:
<html>
<head></head>
<body>
<p>
<text>
"
</text><em>
<text>
You
</text></em>
<text>
began the evening well, Charlotte," said Mrs. Bennet with civil self–command to Miss Lucas. "
</text><em>
<text>
You
</text></em>
<text>
were Mr. Bingley's first choice."
</text></p>
</body>
</html>
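Note that the output above also wraps the text inside each <em>, whereas the question leaves <em>You</em> untouched. If only the text that sits directly under <p> should be wrapped, one option is to skip the recursion and handle just the direct children. The following is a minimal sketch of that idea, not the code from the answer: it swaps jsoup for the JDK's built-in XML DOM so that it runs without extra dependencies, which means it assumes the fragment is well-formed XML (real-world HTML often is not). The class name TextWrapper and the methods wrapDirectTextNodes and wrap are hypothetical names chosen for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class; assumes the input is a well-formed XML fragment.
final class TextWrapper {

    // Wraps each non-blank text node that is a DIRECT child of parent in a
    // new <text> element. Nested elements such as <em> are left untouched.
    static void wrapDirectTextNodes(Element parent) {
        Document doc = parent.getOwnerDocument();
        // Snapshot the children first, because replaceChild changes the child list.
        List<Node> children = new ArrayList<>();
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            children.add(c);
        }
        for (Node child : children) {
            boolean isText = child.getNodeType() == Node.TEXT_NODE;
            if (isText && !child.getTextContent().trim().isEmpty()) {
                Element wrapper = doc.createElement("text");
                parent.replaceChild(wrapper, child); // (newChild, oldChild)
                wrapper.appendChild(child);          // move the text into the wrapper
            }
        }
    }

    // Parses the fragment, wraps the root element's direct text children,
    // and serializes the result without an XML declaration.
    static String wrap(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            wrapDirectTextNodes(doc.getDocumentElement());
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not process fragment", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<p>\"<em>You</em> began the evening well, Charlotte,\" "
                + "said Mrs. Bennet to Miss Lucas. "
                + "\"<em>You</em> were Mr. Bingley's first choice.\"</p>";
        System.out.println(wrap(xml));
    }
}
```

The same direct-children idea should carry over to jsoup by calling toTextElement only on the TextNode children of each <p> instead of recursing into nested elements. Whitespace-only text nodes are skipped here so that indentation between tags does not produce empty <text> elements; drop that check if such nodes need to be wrapped as well.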