Working with ContentNodes and modifying text content



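I am using HtmlCleaner from ColdFusion. In the code below I traverse the node tree and look for content nodes. What I would like to do is modify a node's text content.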

node.traverse(new TagNodeVisitor() {
    public boolean visit(TagNode tagNode, HtmlNode htmlNode) {
         if (htmlNode instanceof ContentNode) {
            ContentNode content = ((ContentNode) htmlNode); 
            String textContent = content.getContent();
        }
        // tells visitor to continue traversing the DOM tree
        return true;
    }
});

The example I am working from is:

// traverse whole DOM and update images to absolute URLs
node.traverse(new TagNodeVisitor() {
    public boolean visit(TagNode tagNode, HtmlNode htmlNode) {
        if (htmlNode instanceof TagNode) {
            TagNode tag = (TagNode) htmlNode;
            String tagName = tag.getName();
            if ("img".equals(tagName)) {
                String src = tag.getAttributeByName("src");
                if (src != null) {
                    tag.setAttribute("src", Utils.fullUrl(siteUrl, src));
                }
            }
        } else if (htmlNode instanceof CommentNode) {
            CommentNode comment = ((CommentNode) htmlNode); 
            comment.getContent().append(" -- By HtmlCleaner");
        }
        // tells visitor to continue traversing the DOM tree
        return true;
    }
});

I am not familiar with HtmlCleaner; does it only do "cleaning"? I could not find any method for setting a text value. http://htmlcleaner.sourceforge.net/doc/index.html

jsoup is a full HTML parser (written in Java) that lets you work with DOM elements much as you would with jQuery. I use the text() setter method to update text. http://jsoup.org/cookbook/modifying-data/set-text

// initial: <div></div>
div = doc.select("div").first();
div.text("five > four");
div.prepend("First ");
div.append(" Last");
// now: <div>First five &gt; four Last</div>

More on jsoup (and ColdFusion):

  • http://jsoup.org/
  • http://www.bennadel.com/blog/2358-parsing-traversing-and-mutating-html-with-coldfusion-and-jsoup.htm
  • http://www.raymondcamden.com/index.cfm/2012/4/6/jsoup-adds-jQuerylike-parsing-in-Java

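What I am trying to do is grab the content between the HTML tags so that I can translate it into another language without disturbing the HTML tags, images, etc.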

// needs java.io.*, java.net.*, com.google.gson.* and org.htmlcleaner.*
node.traverse(new TagNodeVisitor() {
    public boolean visit(TagNode tagNode, HtmlNode htmlNode) {
        if (htmlNode instanceof ContentNode) {
            ContentNode content = (ContentNode) htmlNode;
            String USER_AGENT = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)";
            String text = content.getContent();
            try {
                // #arguments.FromLanguage# / #arguments.ToLanguage# are the ColdFusion variables from the original
                String strUrl = "http://translate.google.com/translate_a/t?client=t&sl=#arguments.FromLanguage#&tl=#arguments.ToLanguage#&hl=#arguments.ToLanguage#&sc=2&ie=UTF-8&oe=UTF-8&oc=1&otf=1&ssel=0&tsel=0&q=" + URLEncoder.encode(text, "UTF-8");
                URLConnection urlConn = new URL(strUrl).openConnection();
                urlConn.addRequestProperty("User-Agent", USER_AGENT);
                Reader reader = new InputStreamReader(urlConn.getInputStream(), "utf-8");
                JsonArray gRet = new Gson().fromJson(reader, JsonArray.class);
                // the first array element holds the translated sentence fragments
                StringBuilder newContent = new StringBuilder(1000);
                for (JsonElement el : gRet.get(0).getAsJsonArray()) {
                    newContent.append(el.getAsJsonArray().get(0).getAsString());
                }
                // swap the original content node for a new one holding the translation
                tagNode.insertChildAfter(htmlNode, new ContentNode(newContent.toString()));
                tagNode.removeChild(htmlNode);
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        }
        // tells visitor to continue traversing the DOM tree
        return true;
    }
});
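
For trying out the "translate only the text between tags" idea without any parser library or network call, here is a minimal sketch using only the Java standard library. It is not the HtmlCleaner or jsoup approach: it swaps in a simple regular expression to separate markup from text, which is an assumption that only holds for simple HTML (it will misfire on script/style bodies and on attribute values containing '>'). The names TextBetweenTags and mapText, and the translate function passed in, are hypothetical and stand in for the real translation call.

```java
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TextBetweenTags {
    // Matches one comment or one tag, e.g. <!-- note -->, <p>, </div>, <img src="a.png">.
    // Assumption: good enough for a sketch, not a real HTML parser.
    private static final Pattern TAG =
            Pattern.compile("<!--.*?-->|<[^>]+>", Pattern.DOTALL);

    // Applies translate to every run of text between tags; markup is copied verbatim.
    static String mapText(String html, UnaryOperator<String> translate) {
        StringBuilder out = new StringBuilder();
        Matcher m = TAG.matcher(html);
        int pos = 0;
        while (m.find()) {
            appendText(out, html.substring(pos, m.start()), translate);
            out.append(m.group());
            pos = m.end();
        }
        appendText(out, html.substring(pos), translate);
        return out.toString();
    }

    // Whitespace-only runs are passed through untouched.
    private static void appendText(StringBuilder out, String text,
                                   UnaryOperator<String> translate) {
        out.append(text.trim().isEmpty() ? text : translate.apply(text));
    }

    public static void main(String[] args) {
        // Upper-casing stands in for a real translation function.
        String html = "<p>Hello <img src=\"a.png\"> world</p>";
        System.out.println(mapText(html, String::toUpperCase));
    }
}
```

Passing the translation step in as a function keeps the tag handling separate from whichever translation service is used, so the same loop can be exercised with a stand-in such as upper-casing before any real service is wired in.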
