How do I wait for a page to finish loading with JSOUP? I get a 503 error because of bot detection



I'm trying to parse the HTML of this page: https://opskins.com/?loc=shop_search&app=730_2&search_item=SSG+08+%7C+DARK+WATER+%28Field-Tested%29&sort=lh

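But Opskins.com is protected by "bot detection": on your first visit you have to wait about 5 seconds, and then you are redirected to (or the browser reloads) the page I actually need.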

How can I wait those 5 seconds, or wait until certain HTML appears on the page?

Document doc = Jsoup.connect("https://opskins.com" + url)
            .header("authority", "opskins.com")
            .header("method", "GET")
            .header("path", url)
            .header("scheme", "https")
            // headers up to here are the colon-prefixed ones shown by the browser (:authority, :method, :path, :scheme)
            .header("accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8")
            .header("accept-encoding", "gzip, deflate, sdch, br")
            .header("accept-language", "ru,en-US;q=0.8,en;q=0.6")
            .header("cache-control", "max-age=0")
            //.header("cookie", "__cfduid=d76231c8cccdbd5303a7d4feeb3f3a11f1466541718; _gat=1; _ga=GA1.2.1292204706.1466541721; request_method=POST; _session_id=5dc49c7814d5087ac51f9d9da20b2680")
            .cookie("steamLogin", "76561198065140894%7C%7C0C35CE73983BCA63E456B6A4831DD772D095AE77")
            .cookie("steamLoginSecure", "76561198065140894%7C%7CCC21BEC8A5E8AD53E9C7086E51BDB8CE407C100A")
            .cookie("steamMachineAuth76561198065140894", "8857F82DB9960F7B66F7842B5F880229A9AF63AB")
            .header("dnt", "1")
            .header("upgrade-insecure-requests", "1")
            .userAgent("Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36")
            //.header("user-agent", "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36")
            .followRedirects(true)
            .ignoreHttpErrors(true)
            //.timeout(5000)
            .get();

With the code above, all I get is the HTML of the bot-detection page.
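Before trying to get past the check, it helps to recognize the bot-check page reliably instead of the shop page. A minimal sketch, assuming (from the title and the answer below) that the check page is returned with HTTP 503 and contains a form with id="challenge-form"; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: recognizes the bot-check page by status code and markup. */
public final class ChallengeDetector {
    // Matches id="challenge-form" or id='challenge-form', allowing spaces around '='.
    private static final Pattern CHALLENGE_FORM =
            Pattern.compile("id\\s*=\\s*[\"']challenge-form[\"']");

    private ChallengeDetector() {}

    /** True when the response looks like the bot-check page rather than the real page. */
    public static boolean isChallengePage(int statusCode, String html) {
        return statusCode == 503 && html != null && CHALLENGE_FORM.matcher(html).find();
    }
}
```

With jsoup you would call it on the result of execute(), e.g. ChallengeDetector.isChallengePage(res.statusCode(), res.body()). Keep ignoreHttpErrors(true), otherwise jsoup throws an HttpStatusException on the 503 before you can inspect the body.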

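I did some homework on your problem. I can't give you a simple, general solution, but looking closely at the challenge page led me to a smarter workaround. Below is code that gets you past the bot check.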

import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class BotDetection {
    private static final String USER_AGENT = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:49.0) Gecko/20100101 Firefox/49.0";

    public static void main(String[] args) throws IOException {
        // ignoreHttpErrors(true) is required: the bot-check page is served with HTTP 503
        Document document = Jsoup.connect("https://opskins.com/?loc=shop_search&app=730_2&search_item=SSG%2008%20%7C%20DARK%20WATER%20%28Field-Tested%29&sort=lh")
                .userAgent(USER_AGENT).ignoreHttpErrors(true).followRedirects(true).timeout(100000).ignoreContentType(true).get();
        /**
         * I'm interested in these three elements
         *     <form id="challenge-form" action="/cdn-cgi/l/chk_jschl" method="get">
         *       <input type="hidden" name="jschl_vc" value="53ebdc738d543e1f1fd40f8d4abec414">
         *       <input type="hidden" name="pass" value="1467568987.973-p8bu/jSSDf">
         *       <input type="hidden" id="jschl-answer" name="jschl_answer">
         *      </form>
         */
        Element form = document.getElementById("challenge-form");
        if (form == null) { // no bot check this time: this is already the real page
            System.out.println(document.body());
            return;
        }
        Element jschlVc = form.select("input[name=jschl_vc]").first(), pass = form.select("input[name=pass]").first();
        // absUrl() resolves the relative action against the page URL; data() URL-encodes names and values
        document = Jsoup.connect(form.absUrl("action"))
                .data(jschlVc.attr("name"), jschlVc.attr("value"))
                .data(pass.attr("name"), pass.attr("value"))
                .data("jschl-answer", "65") // hard-coded value; the form field itself is named jschl_answer
                .userAgent(USER_AGENT).ignoreHttpErrors(true).followRedirects(true).timeout(100000).ignoreContentType(true).get();
        //Bingo You are done.
        System.out.println(document.body());
    }
}
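If you build the challenge URL by hand instead of letting jsoup's data() add the parameters, the names and values must be URL-encoded: the pass value in the form above contains a "/". A minimal JDK-only sketch (Java 10+ for URLEncoder.encode with a Charset); the class name ChallengeUrl is hypothetical, and the field values in the usage come from the form shown in the comment above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper: builds the GET URL that the challenge form would submit. */
public final class ChallengeUrl {
    private ChallengeUrl() {}

    /**
     * @param origin scheme and host, e.g. "https://opskins.com"
     * @param action the form's action attribute, e.g. "/cdn-cgi/l/chk_jschl"
     * @param params form fields; pass a LinkedHashMap to keep submission order
     */
    public static String build(String origin, String action, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            // Encode both sides so characters such as '/', '+' or '&' survive intact.
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return origin + action + "?" + query;
    }
}
```

With jschl_vc, pass and jschl-answer inserted in that order, the "/" in the pass value comes out as %2F, which unencoded string concatenation would leave as a raw path separator.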

It worked for me even when I did not pass jschl-answer=65.
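If the second request alone stops being enough, the two things the browser does that the code above skips are waiting (the roughly 5 seconds from the question) and sending back the cookies set by the first response. A sketch of just those two steps, using the JDK's java.net.http.HttpClient (Java 11+) in place of jsoup; the class and method names are hypothetical, and nextUri stands for whatever code extracts the challenge URL from the first page (for example the jsoup form parsing above).

```java
import java.io.IOException;
import java.net.CookieManager;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.function.Function;

/** Hypothetical sketch: fetch a page, wait, then fetch a follow-up URL with the same cookies. */
public final class DelayedFollowUp {
    // Firefox user agent taken from the answer above.
    static final String USER_AGENT =
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:49.0) Gecko/20100101 Firefox/49.0";

    private DelayedFollowUp() {}

    /** A client that stores cookies from every response and follows redirects. */
    public static HttpClient newClient() {
        return HttpClient.newBuilder()
                .cookieHandler(new CookieManager())
                .followRedirects(HttpClient.Redirect.NORMAL)
                .connectTimeout(Duration.ofSeconds(30))
                .build();
    }

    static HttpResponse<String> get(HttpClient client, URI uri)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(uri)
                .header("User-Agent", USER_AGENT)
                .GET()
                .build();
        // HttpClient does not throw on 503; the challenge page comes back as a normal response.
        return client.send(request, HttpResponse.BodyHandlers.ofString());
    }

    /**
     * Loads firstUri, sleeps for delay, then loads the URI computed from the first page's HTML.
     * Both requests share one client, so cookies from the first response are sent automatically.
     */
    public static HttpResponse<String> fetchAfterDelay(HttpClient client, URI firstUri,
            Function<String, URI> nextUri, Duration delay)
            throws IOException, InterruptedException {
        HttpResponse<String> first = get(client, firstUri);
        Thread.sleep(delay.toMillis());
        return get(client, nextUri.apply(first.body()));
    }
}
```

This only reproduces the delay and the cookie handling. Neither jsoup nor HttpClient executes JavaScript, so any value the page's script would compute (such as jschl_answer) still has to be supplied some other way.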
