Prerendering a JavaScript website with HtmlUnit (HTML snapshot)



I am trying to build a prerenderer powered by HtmlUnit, and I am testing it against this URL: https://demo.tutorialzine.com/2009/09/simple-ajax-website-jquery/demo.html#page3

Here is my code:

final WebClient webClient = new WebClient(BrowserVersion.BEST_SUPPORTED);
WebClientOptions options = webClient.getOptions();
options.setCssEnabled(true);
webClient.setCssErrorHandler(new SilentCssErrorHandler());
webClient.setAjaxController(new NicelyResynchronizingAjaxController());
//    webClient.setAjaxController(new AjaxController(){
//        @Override
//        public boolean processSynchron(HtmlPage page, WebRequest request, boolean async) {
//            return true;
//        }
//    });
options.setThrowExceptionOnScriptError(false);
options.setThrowExceptionOnFailingStatusCode(false);
options.setRedirectEnabled(false);
options.setAppletEnabled(false);
options.setJavaScriptEnabled(true);
//options.setUseInsecureSSL(true);
options.setTimeout(50000);
webClient.addRequestHeader("Access-Control-Allow-Origin", "*");
HtmlPage page = webClient.getPage(path);
// important!  Give the headless browser enough time to execute JavaScript
// The exact time to wait may depend on your application.
webClient.setJavaScriptTimeout(10000);
webClient.waitForBackgroundJavaScript(10000);
//just wait
for (int i = 0; i < 20; i++) {
    synchronized (page) {
        page.wait(500);
    }
}
String xml = page.asXml();

The problem is that the output HTML does not contain the content that is supposed to be fetched by JavaScript.

What could be going wrong here?

Well, with the 2.28 snapshot, the code below retrieves:

Donec in massa vel lectus aliquam laoreet nec et turpis.

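import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.NicelyResynchronizingAjaxController;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.WebClientOptions;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

// try-with-resources closes the WebClient when done
try (final WebClient webClient = new WebClient(BrowserVersion.BEST_SUPPORTED)) {
    WebClientOptions options = webClient.getOptions();
    options.setCssEnabled(true);
    webClient.setAjaxController(new NicelyResynchronizingAjaxController());
    options.setTimeout(50000);
    // Note: Access-Control-Allow-Origin is normally a response header; kept here as in the question
    webClient.addRequestHeader("Access-Control-Allow-Origin", "*");
    HtmlPage page = webClient.getPage("https://demo.tutorialzine.com/2009/09/simple-ajax-website-jquery/demo.html#page3");
    // Important: give the headless browser enough time to execute JavaScript.
    // The exact time to wait may depend on your application.
    webClient.setJavaScriptTimeout(10000);
    webClient.waitForBackgroundJavaScript(10000);
    // Just wait
    Thread.sleep(10000);
    String xml = page.asXml();
    System.out.println(xml);
}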
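
As a side note, the fixed Thread.sleep(10000) always costs the full ten seconds, even if the content arrives sooner. One possible alternative is to poll until the expected text shows up. The following is only a minimal sketch: PollUntil and waitUntil are hypothetical names (not part of HtmlUnit), and the stand-in condition in main would have to be replaced by something like () -> page.asXml().contains("Donec in massa") in the snippet above.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of HtmlUnit: polls a condition until it holds or a timeout expires.
final class PollUntil {
    private PollUntil() {}

    /**
     * Checks the condition immediately, then every pollMillis, for at most about timeoutMillis.
     * Returns true as soon as the condition holds, false on timeout or interruption.
     */
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        final long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        final long start = System.currentTimeMillis();
        // Stand-in condition that becomes true after roughly 200 ms.
        // With HtmlUnit this could be: () -> page.asXml().contains("Donec in massa")
        boolean ready = waitUntil(() -> System.currentTimeMillis() - start >= 200, 2000, 50);
        System.out.println("ready = " + ready);
    }
}
```

With this approach the snapshot can be taken as soon as the condition holds, and a false return value signals that the content never appeared within the timeout.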

What else are you missing?
