Why can't some websites be automated with Selenium?



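I tried to automate the page https://www.westernunion.com/global-service/track-transfer, but I don't understand why the site does not navigate to the next page.

My script opens the page -> enters the MTCN 2587051083 -> clicks the "Continue" button, but nothing is displayed after the click. Performing the same steps manually works fine. Is there a browser setting that this kind of website requires? I am clueless.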

public static void main(String ar[]) {
    // Backslashes inside a Java string literal must be escaped.
    System.setProperty("webdriver.chrome.driver", "D:\\Study\\selenium-java-2.48.2\\selenium-2.48.2\\chromedriver.exe");
    WebDriver driver = new ChromeDriver();
    driver.manage().timeouts().implicitlyWait(10, TimeUnit.SECONDS);
    driver.manage().window().maximize();
    driver.get("https://www.westernunion.com/global-service/track-transfer");
    // Type the MTCN into the tracking field, then click the Continue button.
    driver.findElement(By.xpath("//input[@id='trackingNumber']")).sendKeys("2587051083");
    driver.findElement(By.xpath("//button[@id='button-track-transfer']")).click();
}

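I took your code and added a few tweaks to send a character sequence into the tracking field of the page https://www.westernunion.com/global-service/track-transfer, inducing WebDriverWait for the desired element to be clickable, and then invoked click() on the element with the text Continue, as follows:

  • Code Block: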

    import org.openqa.selenium.By;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.chrome.ChromeDriver;
    import org.openqa.selenium.chrome.ChromeOptions;
    import org.openqa.selenium.support.ui.ExpectedConditions;
    import org.openqa.selenium.support.ui.WebDriverWait;

    public class westernunion {
        public static void main(String[] args) {
            // Backslashes inside a Java string literal must be escaped.
            System.setProperty("webdriver.chrome.driver", "C:\\Utility\\BrowserDrivers\\chromedriver.exe");
            ChromeOptions opt = new ChromeOptions();
            opt.addArguments("start-maximized");
            opt.addArguments("disable-infobars");
            opt.addArguments("--disable-extensions");
            WebDriver driver = new ChromeDriver(opt);
            driver.get("https://www.westernunion.com/global-service/track-transfer");
            // Wait up to 10 seconds for the tracking field to become clickable, then type the MTCN.
            // (Selenium 3 constructor; newer Selenium versions take a java.time.Duration instead.)
            new WebDriverWait(driver, 10).until(ExpectedConditions.elementToBeClickable(By.cssSelector("input.new-field.form-control.tt-mtcn.ng-pristine.ng-valid-mask"))).sendKeys("2587051083");
            // Click the Continue button.
            driver.findElement(By.cssSelector("button.btn.btn-primary.btn-lg.btn-block.background-color-teal.remove-margin#button-track-transfer")).click();
        }
    }
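
The code above relies on WebDriverWait to hold back the interaction until the field is clickable. The general idea behind such an explicit wait is to poll a condition repeatedly until it yields a result or a deadline passes. The following plain-Java sketch illustrates only that idea; `PollingWait` and `until` are hypothetical names chosen for this example, and this is not Selenium's actual implementation.

```java
import java.time.Duration;
import java.util.function.Supplier;

/** Hypothetical illustration of the polling idea behind an explicit wait. */
final class PollingWait {

    /**
     * Calls the condition repeatedly until it returns a non-null value.
     * Throws IllegalStateException if the timeout elapses first.
     */
    static <T> T until(Supplier<T> condition, Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            T value = condition.get();
            if (value != null) {
                return value; // condition satisfied
            }
            if (System.nanoTime() >= deadline) {
                throw new IllegalStateException("Condition not met within " + timeout);
            }
            try {
                Thread.sleep(interval.toMillis()); // pause before the next poll
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting", e);
            }
        }
    }

    public static void main(String[] args) {
        // Simulated element that becomes "clickable" after roughly 300 ms.
        long readyAt = System.currentTimeMillis() + 300;
        String state = until(
                () -> System.currentTimeMillis() >= readyAt ? "clickable" : null,
                Duration.ofSeconds(2), Duration.ofMillis(50));
        System.out.println(state);
    }
}
```

A wait of this kind only addresses timing problems. It does not help when the site itself rejects the automated session, which is the situation discussed next.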
    

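It seems the click() does happen and the spinner becomes visible for a while, but the search is interrupted. On inspecting the webpage you will find that some of the <script> and <link> tags refer to JavaScript and CSS files whose paths contain the keyword dist. For example: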

  • <link rel="stylesheet" type="text/css" href="/content/wucom/dist/20181210075630/css/responsive_css.min.css">
  • <script src="/content/wucom/dist/20181210075630/js/js-bumblebee.js"></script>
  • <link ng-if="trackTransferVm.trackTransferData.newTrackTransfer || trackTransferVm.trackTransferData.isRetail" rel="stylesheet" type="text/css" href="/content/wucom/dist/20181210075630/css/main.min.css" class="ng-scope" style="">

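This is a clear indication that the website is protected by the Bot Management service provider Distil Networks, and that the navigation by ChromeDriver gets detected and subsequently blocked.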
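
The manual inspection described above can be approximated in code. The sketch below scans a page's markup for `src` and `href` values that contain a `dist` path segment. `DistAssetScanner` and `findDistAssets` are hypothetical names, and the regular expression is a simplification that only handles double-quoted attributes. A `dist` path segment is just the heuristic used in this answer, so a match should be read as a hint rather than proof of any particular bot-protection vendor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: lists src/href values whose path contains a "dist" segment. */
final class DistAssetScanner {

    // Matches src="..." or href="..." and captures the attribute value.
    private static final Pattern ASSET_URL =
            Pattern.compile("(?:src|href)\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    static List<String> findDistAssets(String html) {
        List<String> hits = new ArrayList<>();
        Matcher m = ASSET_URL.matcher(html);
        while (m.find()) {
            String url = m.group(1);
            // Keep only URLs that have "dist" as a whole path segment.
            if (url.contains("/dist/")) {
                hits.add(url);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Sample markup modelled on the tags quoted above, plus one unrelated script.
        String html = "<link rel=\"stylesheet\" href=\"/content/wucom/dist/20181210075630/css/main.min.css\">"
                + "<script src=\"/content/wucom/dist/20181210075630/js/js-bumblebee.js\"></script>"
                + "<script src=\"/static/app.js\"></script>";
        findDistAssets(html).forEach(System.out::println);
    }
}
```

In a Selenium session the markup could be taken from `driver.getPageSource()` and passed to `findDistAssets`.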


Distil

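As per the article There Really Is Something About Distil.it…:

Distil protects sites against automatic content scraping bots by observing site behavior and identifying patterns peculiar to scrapers. When Distil identifies a malicious bot on one site, it creates a blacklisted behavioral profile that is deployed to all its customers. Something like a bot firewall, Distil detects patterns and reacts.

Further,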

Distil CEO Rami Essaid said in an interview last week:

"One pattern with Selenium was automating the theft of Web content." "Even though they can create new bots, we figured out a way to identify Selenium the a tool they're using, so we're blocking Selenium no matter how many times they iterate on that bot. We're doing that now with Python and a lot of different technologies. Once we see a pattern emerge from one type of bot, then we work to reverse engineer the technology they use and identify it as malicious".


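Reference

You can find a couple of detailed discussions in:

  • Distil detects WebDriver driven Chrome Browsing Context
  • Selenium webdriver: Modifying navigator.webdriver flag to prevent selenium detection
  • Akamai Bot Manager detects WebDriver driven Chrome Browsing Context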
