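I am trying to download all the reCAPTCHA images, but for some reason I cannot click the checkbox inside the reCAPTCHA iframe. When I click it, HtmlUnit throws a WrappedException. I am not sure why this happens. How can I click the link and download the images? My guess is that this is a GWT issue. I can click any other normal button.
Any help would be appreciated.
The main site is: https://www.google.com/recaptcha/api2/demo
This is what I have so far: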
    private static final Logger LOG = LoggerFactory.getLogger(Main.class);

    public static void main(String[] args) throws IOException {
        try (WebClient webClient = new WebClient(BrowserVersion.FIREFOX_38)) {
            webClient.getCache().clear();
            final WebClientOptions webClientOptions = webClient.getOptions();
            webClientOptions.setTimeout(40000);
            webClientOptions.setRedirectEnabled(false);
            // webClientOptions.setUseInsecureSSL(true);
            webClient.setAlertHandler(new AlertHandler() {
                public void handleAlert(Page page, String string) {
                    System.out.printf("alert: %s%n", string);
                    LOG.info("javascript alert: {}", string);
                }
            });
            webClientOptions.setJavaScriptEnabled(true);
            webClient.setCssErrorHandler(new SilentCssErrorHandler());
            // webClient.setAjaxController(new NicelyResynchronizingAjaxController());
            webClientOptions.setThrowExceptionOnScriptError(false);
            webClientOptions.setThrowExceptionOnFailingStatusCode(false);
            HtmlPage reCaptchaFrame;
            final HtmlPage page = webClient.getPage("https://www.google.com/recaptcha/api2/demo");
            webClient.getJavaScriptEngine().pumpEventLoop(1000);
            webClient.waitForBackgroundJavaScript(200);
            int waitForBackgroundJavaScript = webClient.waitForBackgroundJavaScript(200);
            int loopCount = 0;
            while (waitForBackgroundJavaScript > 0 && loopCount < 2) {
                ++loopCount;
                waitForBackgroundJavaScript = webClient.waitForBackgroundJavaScript(200);
                if (waitForBackgroundJavaScript == 0) {
                    if (LOG.isTraceEnabled())
                        LOG.trace("HtmlUnit exits background javascript at loop counter " + loopCount);
                    break;
                }
            }
            JavaScriptEngine engine = webClient.getJavaScriptEngine();
            engine.holdPosponedActions();
            final List<FrameWindow> frames = page.getFrames();
            reCaptchaFrame = (HtmlPage) frames.get(0).getEnclosedPage();
            // initiating to enter the reCaptcha
            final HtmlSpan reCaptchaAnchor = reCaptchaFrame.getFirstByXPath(".//span[@id='recaptcha-anchor']");
            if (reCaptchaAnchor == null) {
                throw new NullPointerException("Captcha not found");
            }
            try {
                HtmlPage page1 = reCaptchaAnchor.click(); // here I get the exception
            } catch (WrappedException e) {
                LOG.info("Found some stupid exception {}", e.details());
            }
        } catch (Exception e) {
            LOG.info("Found exception {}", e.getMessage());
        }
    }
Stack trace:
net.sourceforge.htmlunit.corejs.javascript.WrappedException: Wrapped java.lang.NullPointerException
at net.sourceforge.htmlunit.corejs.javascript.Context.throwAsScriptRuntimeEx(Context.java:2053)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.doProcessPostponedActions(JavaScriptEngine.java:1007)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.processPostponedActions(JavaScriptEngine.java:1072)
at com.gargoylesoftware.htmlunit.html.DomElement.click(DomElement.java:789)
at com.gargoylesoftware.htmlunit.html.DomElement.click(DomElement.java:732)
at com.gargoylesoftware.htmlunit.html.DomElement.click(DomElement.java:679)
at recaptchatest.Main.main(Main.java:77)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
Caused by: java.lang.NullPointerException
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.hasTopCall(ScriptRuntime.java:3263)
at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:102)
at com.gargoylesoftware.htmlunit.javascript.host.Promise$1.execute(Promise.java:136)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.doProcessPostponedActions(JavaScriptEngine.java:1002)
... 10 more
After upgrading HtmlUnit to 2.22 and the HtmlUnit core-js library to 2.22 as well, everything worked as expected.
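If you want to confirm that the version on your classpath is at least 2.22, a small comparison helper can do it. This is only a sketch using the plain JDK: the class name `VersionCheck` and the sample value "2.18" are made up for illustration, and it assumes plain dotted numeric versions (no "-SNAPSHOT" suffix). You would pass in the version string from your build file or jar manifest.

```java
import java.util.Arrays;

final class VersionCheck {
    /** Parses a dotted numeric version such as "2.22" into its integer parts. */
    static int[] parse(String version) {
        return Arrays.stream(version.trim().split("\\."))
                .mapToInt(Integer::parseInt)
                .toArray();
    }

    /** Returns true when actual is the same as or newer than required. */
    static boolean isAtLeast(String actual, String required) {
        int[] a = parse(actual);
        int[] r = parse(required);
        for (int i = 0; i < Math.max(a.length, r.length); i++) {
            int x = i < a.length ? a[i] : 0; // missing parts count as 0
            int y = i < r.length ? r[i] : 0;
            if (x != y) {
                return x > y;
            }
        }
        return true; // all parts equal
    }

    public static void main(String[] args) {
        // "2.18" is only a sample value; pass the real version as an argument.
        String actual = args.length > 0 ? args[0] : "2.18";
        System.out.println(actual + " >= 2.22 ? " + isAtLeast(actual, "2.22"));
    }
}
```

The parts are compared as numbers rather than as strings, so "2.9" is correctly treated as older than "2.22".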
I suggest you check what the element's actual XPath is.
The box has a div with class='recaptcha-checkbox-checkmark'.
You can select it with XPath:
reCaptchaFrame.getFirstByXPath("//div[@class='recaptcha-checkbox-checkmark']");
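One thing to watch for with the XPath above: `@class='...'` is an exact string comparison, so it stops matching as soon as the element carries a second class. The sketch below shows the difference using only the JDK's own XPath engine. The markup in `SAMPLE` is made up for the demonstration and is not the real reCAPTCHA frame, and `XPathClassMatch` and `byClassToken` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

final class XPathClassMatch {
    // Hypothetical markup: one div has a single class, the other has the
    // same class plus an extra one.
    static final String SAMPLE =
        "<root>"
      + "<div class='recaptcha-checkbox-checkmark'/>"
      + "<div class='recaptcha-checkbox-checkmark extra'/>"
      + "</root>";

    /** Counts the nodes in SAMPLE selected by the given XPath expression. */
    static int count(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(SAMPLE.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    /** Builds an XPath matching any div whose class list contains the token. */
    static String byClassToken(String token) {
        return "//div[contains(concat(' ', normalize-space(@class), ' '), ' "
            + token + " ')]";
    }

    public static void main(String[] args) {
        // Exact attribute comparison: only the single-class div matches.
        System.out.println(count("//div[@class='recaptcha-checkbox-checkmark']"));
        // Token comparison: both divs match.
        System.out.println(count(byClassToken("recaptcha-checkbox-checkmark")));
    }
}
```

On this sample the exact comparison selects one div and the token form selects both, so the token form is the safer one to try if the first lookup returns null.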
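If that fails, try a selection method other than XPath, for example a CSS selector via the querySelector method. Note that a CSS class selector starts with a dot.
That is:
reCaptchaFrame.querySelector(".recaptcha-checkbox-checkmark");
To assign the result without casting errors, declare the variable as HtmlElement, the common base type of the HTML elements on a page.
HtmlElement a = reCaptchaFrame.querySelector(".recaptcha-checkbox-checkmark");
a.click();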
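The reason the base type is the safe choice: as far as I understand it, lookup methods like these return a generic type that the compiler infers from the variable you assign to, so a wrong guess compiles fine and only fails at runtime with a ClassCastException. Here is a minimal sketch of that pattern using the plain JDK. `Element`, `Div`, `Span` and `lookup` are hypothetical stand-ins, not HtmlUnit types.

```java
import java.util.HashMap;
import java.util.Map;

final class GenericLookupDemo {
    // Hypothetical stand-ins for a DOM class hierarchy; not HtmlUnit types.
    static class Element {}
    static class Div extends Element {}
    static class Span extends Element {}

    static final Map<String, Element> NODES = new HashMap<>();
    static {
        NODES.put("box", new Div()); // the stored node is a Div
    }

    /** Returns the node as whatever type the caller asks for (unchecked cast). */
    @SuppressWarnings("unchecked")
    static <N extends Element> N lookup(String id) {
        return (N) NODES.get(id);
    }

    /** Reports whether assigning the "box" node to a Span variable fails. */
    static boolean spanAssignmentFails() {
        try {
            Span s = lookup("box"); // compiles, but the node is really a Div
            return false;           // only reached if the cast succeeded
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Element safe = lookup("box"); // base type: works for any element
        System.out.println("base type ok: " + (safe != null));
        System.out.println("wrong subtype fails: " + spanAssignmentFails());
    }
}
```

Since the lookup can also return null when nothing matches, it is worth checking the result before calling click() on it.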