Blacklisting URLs in PhantomJS and GhostDriver is straightforward. First, initialize the driver with a handler script:
PhantomJSDriver driver = new PhantomJSDriver();
driver.executePhantomJS(loadFile("/phantomjs/handlers.js"));
Then configure the handler:
this.onResourceRequested = function (requestData, networkRequest) {
    // A request must match at least one allowed pattern...
    var allowedUrls = [
        /https?:\/\/localhost.*/,
        /https?:\/\/.*\.example\.com\/?.*/
    ];
    // ...and none of the disallowed patterns.
    var disallowedUrls = [
        /https?:\/\/nonono\.com.*/
    ];

    function isUrlAllowed(url) {
        // Returns a predicate that tests a regex against the given URL.
        function matches(url) {
            return function (re) {
                return re.test(url);
            };
        }
        return allowedUrls.some(matches(url)) && !disallowedUrls.some(matches(url));
    }

    if (!isUrlAllowed(requestData.url)) {
        console.log("Aborting disallowed request (# " + requestData.id + ") to url: '" + requestData.url + "'");
        networkRequest.abort();
    }
};
I have not found a good way to do this with HtmlUnitDriver. The question "How to filter javascript from specific urls in HtmlUnit" mentions ScriptPreProcessor, but it uses WebClient, not HtmlUnitDriver. Any ideas?
Extend HtmlUnitDriver and implement a ScriptPreProcessor (to edit content) and an HttpWebConnection (to allow or block URLs):
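// Imports assume HtmlUnit 2.x package names (com.gargoylesoftware.htmlunit).
// SC_OK (HTTP 200) is assumed to come from Apache HttpClient's HttpStatus.
import static org.apache.http.HttpStatus.SC_OK;

import java.io.IOException;
import java.util.ArrayList;

import com.gargoylesoftware.htmlunit.HttpWebConnection;
import com.gargoylesoftware.htmlunit.ScriptPreProcessor;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.WebConnection;
import com.gargoylesoftware.htmlunit.WebRequest;
import com.gargoylesoftware.htmlunit.WebResponse;
import com.gargoylesoftware.htmlunit.WebResponseData;
import org.openqa.selenium.htmlunit.HtmlUnitDriver;
import org.openqa.selenium.remote.DesiredCapabilities;

public class FilteringHtmlUnitDriver extends HtmlUnitDriver {

    // String.matches() requires the whole URL to match, hence the trailing ".*".
    private static final String[] ALLOWED_URLS = {
        "https?://localhost.*",
        "https?://.*\\.yes\\.yes/?.*",
    };
    private static final String[] DISALLOWED_URLS = {
        "https?://spam\\.nono.*"
    };

    public FilteringHtmlUnitDriver(DesiredCapabilities capabilities) {
        super(capabilities);
    }

    // Called by HtmlUnitDriver while it sets up its WebClient.
    @Override
    protected WebClient modifyWebClient(WebClient client) {
        WebConnection connection = filteringWebConnection(client);
        ScriptPreProcessor preProcessor = filteringPreProcessor();
        client.setWebConnection(connection);
        client.setScriptPreProcessor(preProcessor);
        return client;
    }

    private ScriptPreProcessor filteringPreProcessor() {
        return (htmlPage, sourceCode, sourceName, lineNumber, htmlElement) -> editContent(sourceCode);
    }

    // Placeholder edit: rewrite the script source before it is executed.
    private String editContent(String sourceCode) {
        return sourceCode.replaceAll("foo", "bar");
    }

    private WebConnection filteringWebConnection(WebClient client) {
        return new HttpWebConnection(client) {
            @Override
            public WebResponse getResponse(WebRequest request) throws IOException {
                String url = request.getUrl().toString();
                // Blocked requests get an empty 200 response instead of hitting the network.
                WebResponse emptyResponse = new WebResponse(
                    new WebResponseData("".getBytes(), SC_OK, "", new ArrayList<>()), request, 0);
                // Disallowed patterns are checked first, so they take precedence.
                for (String disallowed : DISALLOWED_URLS) {
                    if (url.matches(disallowed)) {
                        return emptyResponse;
                    }
                }
                for (String allowed : ALLOWED_URLS) {
                    if (url.matches(allowed)) {
                        return super.getResponse(request);
                    }
                }
                // Anything not explicitly allowed is blocked as well.
                return emptyResponse;
            }
        };
    }
}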
This lets you both edit content and block URLs.
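If you want to check the allow/block rules without starting a browser, the decision can be pulled out into a plain class. The following is only a sketch: `UrlFilter` and `isAllowed` are hypothetical names that are not part of HtmlUnit or Selenium, and the patterns are the placeholder ones from the driver above. It precompiles the regexes once and keeps the same rule: a URL passes only if it fully matches an allowed pattern and no disallowed pattern.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, not part of any library: isolates the allow/block decision.
final class UrlFilter {
    private final List<Pattern> allowed;
    private final List<Pattern> disallowed;

    UrlFilter(List<String> allowedRegexes, List<String> disallowedRegexes) {
        this.allowed = compileAll(allowedRegexes);
        this.disallowed = compileAll(disallowedRegexes);
    }

    // Compile each regex once instead of on every request.
    private static List<Pattern> compileAll(List<String> regexes) {
        return regexes.stream().map(Pattern::compile).collect(Collectors.toList());
    }

    // Full-match semantics, like String.matches(); disallowed patterns win.
    boolean isAllowed(String url) {
        boolean blocked = disallowed.stream().anyMatch(p -> p.matcher(url).matches());
        boolean permitted = allowed.stream().anyMatch(p -> p.matcher(url).matches());
        return permitted && !blocked;
    }

    public static void main(String[] args) {
        UrlFilter filter = new UrlFilter(
            Arrays.asList("https?://localhost.*", "https?://.*\\.yes\\.yes/?.*"),
            Arrays.asList("https?://spam\\.nono.*"));
        String[] urls = {
            "http://localhost:8080/index.html",
            "https://cdn.yes.yes/app.js",
            "http://spam.nono/ad.js"
        };
        for (String url : urls) {
            System.out.println(url + " -> " + filter.isAllowed(url));
        }
    }
}
```

Inside `getResponse`, the two loops could then be replaced by a single `filter.isAllowed(url)` check, returning `super.getResponse(request)` when it is true and the empty response otherwise.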