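I'm trying to parse the HTML of this page: https://opskins.com/?loc=shop_search&app=730_2&search_item=SSG+08+%7C+DARK+WATER+%28Field-Tested%29&sort=lh
But Opskins.com is protected by "bot detection". The first time you visit the site you have to wait about 5 seconds, and then you are redirected to (or the page reloads into) the page I actually need.
How can I wait for those 5 seconds, or for some specific HTML to appear on that page?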
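Taken literally, the question is how to wait and then try again. Below is a minimal JDK-only sketch of polling with a fixed delay; RetryWithDelay and fetchUntil are hypothetical names, and the fake responses in main stand in for real requests. In practice the supplier would wrap the Jsoup.connect(...).get() call and the predicate would test whether the bot-detection page came back. Keep in mind that Jsoup does not execute JavaScript, so if the redirect is triggered by a script on the challenge page, re-requesting the same URL after a delay may keep returning that page.

```java
import java.util.Optional;
import java.util.function.Predicate;
import java.util.function.Supplier;

public class RetryWithDelay {

    /**
     * Calls fetch up to maxAttempts times, sleeping delayMillis between attempts,
     * and returns the first result accepted by isReady (empty if none is).
     */
    static <T> Optional<T> fetchUntil(Supplier<T> fetch, Predicate<T> isReady,
                                      int maxAttempts, long delayMillis)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            T result = fetch.get();
            if (isReady.test(result)) {
                return Optional.of(result);
            }
            if (attempt < maxAttempts) {
                // e.g. 5000 ms for the ~5 s wait described in the question
                Thread.sleep(delayMillis);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws InterruptedException {
        // Fake "server": the first two responses are the challenge page, then the real page
        String[] responses = {"challenge", "challenge", "real page"};
        int[] calls = {0};
        Optional<String> page = fetchUntil(
                () -> responses[Math.min(calls[0]++, responses.length - 1)],
                html -> !html.contains("challenge"),
                5, 10);
        System.out.println(page.orElse("gave up") + " after " + calls[0] + " calls");
        // prints "real page after 3 calls"
    }
}
```

Because Supplier cannot throw checked exceptions, an IOException from Jsoup would need to be wrapped, for example in an UncheckedIOException.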
Document doc = Jsoup.connect("https://opskins.com" + url)
.header("authority", "opskins.com")
.header("method", "GET")
.header("path", url)
.header("scheme", "https")
//up to here: the headers that the browser shows with a leading colon (:authority, :method, :path, :scheme)
.header("accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8")
.header("accept-encoding", "gzip, deflate, sdch, br")
.header("accept-language", "ru,en-US;q=0.8,en;q=0.6")
.header("cache-control", "max-age=0")
//.header("cookie", "__cfduid=d76231c8cccdbd5303a7d4feeb3f3a11f1466541718; _gat=1; _ga=GA1.2.1292204706.1466541721; request_method=POST; _session_id=5dc49c7814d5087ac51f9d9da20b2680")
.cookie("steamLogin", "76561198065140894%7C%7C0C35CE73983BCA63E456B6A4831DD772D095AE77")
.cookie("steamLoginSecure", "76561198065140894%7C%7CCC21BEC8A5E8AD53E9C7086E51BDB8CE407C100A")
.cookie("steamMachineAuth76561198065140894", "8857F82DB9960F7B66F7842B5F880229A9AF63AB")
.header("dnt", "1")
.header("upgrade-insecure-requests", "1")
.userAgent("Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36")
//.header("user-agent", "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36")
.followRedirects(true)
.ignoreHttpErrors(true)
//.timeout(5000)
.get();
With the code above, I only get the HTML of the bot-detection page, not the page I need.
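One way to tell whether Jsoup received the bot-detection page rather than the shop page is to look for the challenge form. This sketch assumes that the form id challenge-form, quoted in the answer below, reliably marks that page; ChallengeCheck and isChallengePage are hypothetical names. In the question's code it would be applied to the doc returned by .get().

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class ChallengeCheck {

    /** Returns true if the parsed page still contains the bot-check form. */
    static boolean isChallengePage(Document doc) {
        // Assumption: "challenge-form" is only present on the bot-detection page
        return doc.getElementById("challenge-form") != null;
    }

    public static void main(String[] args) {
        String challengeHtml = "<html><body><form id=\"challenge-form\" action=\"/cdn-cgi/l/chk_jschl\" method=\"get\">"
                + "<input type=\"hidden\" name=\"jschl_vc\" value=\"abc\"></form></body></html>";
        String normalHtml = "<html><body><div id=\"shop\">items</div></body></html>";

        System.out.println(isChallengePage(Jsoup.parse(challengeHtml))); // true
        System.out.println(isChallengePage(Jsoup.parse(normalHtml)));    // false
    }
}
```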
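I did some homework on your problem. I can't give you a simple solution, but a careful look at the page led me to a smarter workaround. Below is code that gets past the bot check.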
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class BotDetection {
    private static final String USER_AGENT =
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:49.0) Gecko/20100101 Firefox/49.0";

    public static void main(String[] args) throws IOException {
        // First request: this returns the bot-detection page
        Document document = Jsoup.connect("https://opskins.com/?loc=shop_search&app=730_2&search_item=SSG%2008%20%7C%20DARK%20WATER%20%28Field-Tested%29&sort=lh")
                .userAgent(USER_AGENT).ignoreHttpErrors(true).followRedirects(true)
                .timeout(100000).ignoreContentType(true).get();
        /*
         * I'm interested in these three elements:
         * <form id="challenge-form" action="/cdn-cgi/l/chk_jschl" method="get">
         * <input type="hidden" name="jschl_vc" value="53ebdc738d543e1f1fd40f8d4abec414">
         * <input type="hidden" name="pass" value="1467568987.973-p8bu/jSSDf">
         * <input type="hidden" id="jschl-answer" name="jschl_answer">
         * </form>
         */
        Element form = document.getElementById("challenge-form");
        Element jschlChild = form.child(0);
        Element passChild = form.child(1);
        // Rebuild the form's GET submission by hand; 65 is a placeholder, not a computed answer
        String url = "https://opskins.com" + form.attr("action") + "?"
                + jschlChild.attr("name") + "=" + jschlChild.attr("value") + "&"
                + passChild.attr("name") + "=" + passChild.attr("value") + "&jschl-answer=65";
        // Second request: submit the challenge form
        document = Jsoup.connect(url)
                .userAgent(USER_AGENT).ignoreHttpErrors(true).followRedirects(true)
                .timeout(100000).ignoreContentType(true).get();
        // Bingo, you are done.
        System.out.println(document.body());
    }
}
It worked for me even without passing jschl-answer=65.
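The answer builds the query string by plain concatenation. A slightly more careful sketch URL-encodes every name and value, so any reserved characters in the hidden fields are escaped, and uses the form field's name attribute jschl_answer rather than the element id jschl-answer. ChallengeUrl and buildUrl are hypothetical names, the field values are the sample ones from the answer's comment, and 65 is still only a placeholder. URLEncoder.encode(String, Charset) needs Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class ChallengeUrl {

    /** Returns base + action + "?" + a URL-encoded query built from fields, in insertion order. */
    static String buildUrl(String base, String action, Map<String, String> fields) {
        String query = fields.entrySet().stream()
                .map(e -> encode(e.getKey()) + "=" + encode(e.getValue()))
                .collect(Collectors.joining("&"));
        return base + action + "?" + query;
    }

    private static String encode(String s) {
        // application/x-www-form-urlencoded: "/" becomes %2F, space becomes "+"
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("jschl_vc", "53ebdc738d543e1f1fd40f8d4abec414");
        fields.put("pass", "1467568987.973-p8bu/jSSDf");
        fields.put("jschl_answer", "65"); // placeholder value, as in the answer
        System.out.println(buildUrl("https://opskins.com", "/cdn-cgi/l/chk_jschl", fields));
        // prints https://opskins.com/cdn-cgi/l/chk_jschl?jschl_vc=53ebdc738d543e1f1fd40f8d4abec414&pass=1467568987.973-p8bu%2FjSSDf&jschl_answer=65
    }
}
```

Within Jsoup itself, calling Connection.data(name, value) for each field on a GET request should have a similar effect, since Jsoup encodes that data and appends it to the query string.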