Python Selenium web scraping - freezing the page



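I'm scraping several pages with Selenium, and because the pages rely heavily on AJAX I'm not using another framework (such as Scrapy). My problem is that the content refreshes automatically almost every second (financial data, for example), but I want to scrape all elements while the page is in a static state. I've searched the internet extensively, especially Stack Overflow. What is the simplest way to freeze a website with Selenium? I even tried turning off the wireless adapter, but that causes problems of its own... This is the only relevant command I found in the Selenium documentation: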

driver.set_network_conditions(offline=True, latency=5, throughput=500 * 1024)

I tested this code, and it has no effect when I run the script. The site still "auto-refreshes"...

For example, this one: https://gatehub.net/markets/XRP/USD+rhub8VRN55s94qWKDv6jmDy1pUykJzF3wq (there is no API for this site).


In fact, an API does exist, but it is not fully public.

To get the chart values as a JSON object, you need to construct a custom URL like this:

https://api.gatehub.net/rippledata/v2/exchanges/USD+rhub8VRN55s94qWKDv6jmDy1pUykJzF3wq/XRP?descending=true&end=2019-02-06T21:20:00.000Z&limit=400&reduce=false&result=tesSUCCESS&start=2009-02-06T21:20:00.000Z

Output:

{"result":"success","count":400,"marker":"USD|rhub8VRN55s94qWKDv6jmDy1pUykJzF3wq|XRP||20190206014150|000044926668|00006|00003","exchanges":[{"base_amount":"0.12180204","counter_amount":"0.42056","node_index":6,"rate":"3.4528157","tx_index":18,"autobridged_currency":"ETH","autobridged_issuer":"rcA8X3TVMST1n3CJeAdGk1RdRCHii7N2h","buyer":"rGmGFAEx1hYEJuSAfrjEBdA48AXWJBMp1D","executed_time":"2019-02-06T21:14:00Z","ledger_index":44945715,"offer_sequence":39832,"provider":"rGmGFAEx1hYEJuSAfrjEBdA48AXWJBMp1D","seller":"rUmnnszuTRfhKnULCjcKzV7mJeazCF7Gik","taker":"rUmnnszuTRfhKnULCjcKzV7mJeazCF7Gik","tx_hash":"4E39DB1CB68B4635E773082042B47168094852ED4A11C93AED7F85A67F1F7EDD","tx_type":"OfferCreate","base_currency":"USD","base_issuer":"rhub8VRN55s94qWKDv6jmDy1pUykJzF3wq","counter_currency":"XRP"},{"base_amount":"322.8872040048709","counter_amount":"1109.37944","node_index":2,"rate":"3.4358111","tx_index":18,"autobridged_currency":"ETH","autobridged_issuer":"rcA8X3TVMST1n3CJeAdGk1RdRCHii7N2h","buyer":"rETx8GBiH6fxhTcfHM9fGeyShqxozyD3xe","executed_time":"2019-02-06T21:14:00Z","ledger_index":44945715,"offer_sequence":26918939,"provider":"rETx8GBiH6fxhTcfHM9fGeyShqxozyD3xe","seller":"rUmnnszuTRfhKnULCjcKzV7mJeazCF7Gik","taker":"rUmnnszuTRfhKnULCjcKzV7mJeazCF7Gik","tx_hash":"4E39DB1CB68B4635E773082042B47168094852ED4A11C93AED7F85A67F1F7EDD","tx_type":"OfferCreate","base_currency":"USD","base_issuer":"rhub8VRN55s94qWKDv6jmDy1pUykJzF3wq","counter_currency":"XRP"}
...

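Notes:

  • You can change the limit parameter to return a different number of records if needed (the largest value I tested was 400).
  • The start and end dates should also be updated automatically so that you get the latest values.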
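As a rough illustration of the two notes above, the URL can be rebuilt on each run with the current time as the end date. This is only a sketch: the query parameters are copied from the URL shown earlier, the helper names (`iso_millis`, `build_exchanges_url`, `fetch_exchanges`) are made up for this example, and since the API is not fully public it may change or reject requests.

```python
import json
from datetime import datetime, timezone
from urllib.parse import urlencode
from urllib.request import urlopen

# Values copied from the URL in the answer above.
API_ROOT = "https://api.gatehub.net/rippledata/v2/exchanges"
BASE = "USD+rhub8VRN55s94qWKDv6jmDy1pUykJzF3wq"
COUNTER = "XRP"
DEFAULT_START = "2009-02-06T21:20:00.000Z"


def iso_millis(moment):
    """Format an aware datetime as UTC, e.g. 2019-02-06T21:20:00.000Z."""
    utc = moment.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 1000:03d}Z"


def build_exchanges_url(end=None, limit=400, start=DEFAULT_START):
    """Build the exchanges URL; `end` defaults to the current time."""
    end = end or datetime.now(timezone.utc)
    params = {
        "descending": "true",
        "end": iso_millis(end),
        "limit": limit,  # 400 was the largest value tested in the answer
        "reduce": "false",
        "result": "tesSUCCESS",
        "start": start,
    }
    # safe=":" keeps the timestamps readable instead of encoding ":" as %3A
    return f"{API_ROOT}/{BASE}/{COUNTER}?{urlencode(params, safe=':')}"


def fetch_exchanges(url, timeout=10):
    """Download the JSON document and return its list of exchanges."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)["exchanges"]
```

Calling `fetch_exchanges(build_exchanges_url())` would then return a plain list of dictionaries with keys such as `rate` and `executed_time`, which does not change while you process it, unlike the live page.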

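One solution might be to look into setting configuration preferences for whichever browser your driver uses. For example, with Firefox you can set accessibility.blockautorefresh to True and then call driver.refresh() when you are ready.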
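A minimal sketch of how that preference could be set from Python, assuming Firefox with geckodriver and the Selenium 4 bindings. The preference has to be true to block refreshes. The names `BLOCK_REFRESH_PREFS` and `make_firefox_driver` are invented for this example. As far as I know, this preference targets page-level reloads and redirects, so it may not stop content that is updated through AJAX.

```python
# Firefox about:config preferences to apply before the browser starts.
BLOCK_REFRESH_PREFS = {"accessibility.blockautorefresh": True}


def make_firefox_driver(prefs=BLOCK_REFRESH_PREFS):
    """Start Firefox with the given about:config preferences applied."""
    # Imported lazily so this module loads even where selenium is missing.
    from selenium import webdriver

    options = webdriver.FirefoxOptions()
    for name, value in prefs.items():
        options.set_preference(name, value)
    return webdriver.Firefox(options=options)


# Usage (assumes geckodriver is available on this machine):
#   driver = make_firefox_driver()
#   driver.get("https://gatehub.net/markets/XRP/USD+rhub8VRN55s94qWKDv6jmDy1pUykJzF3wq")
#   ...scrape the page...
#   driver.refresh()  # reload manually when you want fresh data
#   driver.quit()
```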

https://lifehacker.com/disable-automatic-web-page-refreshing-5321420

PHPUnit + Selenium: How to set Firefox about:config options?
