X509 certificate exception when crawling some URLs with StormCrawler



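I have been using StormCrawler to crawl websites. Because the sites use HTTPS, I configured StormCrawler's default HTTPS protocol implementation. However, when I crawl some websites, I get the following exception: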

Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at sun.security.provider.certpath.SunCertPathBuilder.build(SunCertPathBuilder.java:141) ~[?:1.8.0_131]
at sun.security.provider.certpath.SunCertPathBuilder.engineBuild(SunCertPathBuilder.java:126) ~[?:1.8.0_131]
at java.security.cert.CertPathBuilder.build(CertPathBuilder.java:280) ~[?:1.8.0_131]
at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:382) ~[?:1.8.0_131]
at sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:292) ~[?:1.8.0_131]
at sun.security.validator.Validator.validate(Validator.java:260) ~[?:1.8.0_131]
at sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:324) ~[?:1.8.0_131]
at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:229) ~[?:1.8.0_131]
at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:124) ~[?:1.8.0_131]
at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1496) ~[?:1.8.0_131]
... 20 more

Is there a mechanism that downloads the certificates automatically? How should I configure the crawler?

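This problem is not specific to StormCrawler. This answer explains that you can import the certificate manually, but that is not really an option unless you are crawling that one site specifically. Another option is to disable certificate validation. That would require modifying the protocol implementation, but it should be doable.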
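To illustrate what "disable certificate validation" means in plain JSSE terms, here is a minimal sketch of a trust-all `X509TrustManager` wrapped in an `SSLContext`. The class name `TrustAllSsl` and its members are hypothetical, not part of StormCrawler. A trust manager like this skips the PKIX path building that throws `SunCertPathBuilderException` in the stack trace above, but it also removes HTTPS's protection against man-in-the-middle attacks, so it is only reasonable when that risk is acceptable for the crawl.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.security.cert.X509Certificate;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

// Hypothetical helper: builds an SSLContext that accepts any server certificate.
// WARNING: this disables certificate validation entirely.
public class TrustAllSsl {

    static final X509TrustManager TRUST_ALL = new X509TrustManager() {
        @Override
        public void checkClientTrusted(X509Certificate[] chain, String authType) {
            // No checks on client certificates.
        }

        @Override
        public void checkServerTrusted(X509Certificate[] chain, String authType) {
            // No checks on server certificates: this is what avoids the
            // "unable to find valid certification path" failure.
        }

        @Override
        public X509Certificate[] getAcceptedIssuers() {
            return new X509Certificate[0];
        }
    };

    static SSLContext trustAllContext() {
        try {
            SSLContext ctx = SSLContext.getInstance("TLS");
            // null key managers: no client certificate; our trust manager replaces the default one.
            ctx.init(null, new TrustManager[] {TRUST_ALL}, new SecureRandom());
            return ctx;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Could not create trust-all SSLContext", e);
        }
    }

    public static void main(String[] args) {
        SSLContext ctx = trustAllContext();
        System.out.println("Created SSLContext for protocol " + ctx.getProtocol());
    }
}
```

Where this context gets plugged in depends on the protocol implementation and the StormCrawler version, so treat the following as assumptions to check against the source you are modifying. With Apache HttpClient 4.5 it would typically go through `HttpClientBuilder.setSSLContext(...)`. With OkHttp it would go through `OkHttpClient.Builder.sslSocketFactory(ctx.getSocketFactory(), TRUST_ALL)`. Hostname verification is a separate check, so sites whose certificate does not match the host name would additionally need a permissive hostname verifier (for example HttpClient's `NoopHostnameVerifier.INSTANCE`).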

Have you tried the OkHttp implementation? It may behave differently from the Apache HttpClient one. See the okhttp wiki.
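Switching implementations is a configuration change. It is normally made in the crawler's YAML configuration file, but the same keys can be set on the `Map` passed to the topology, since Storm's `Config` is a `HashMap<String, Object>`. The sketch below uses a plain `HashMap` to stay self-contained. The key names `http.protocol.implementation` / `https.protocol.implementation` and the class name `com.digitalpebble.stormcrawler.protocol.okhttp.HttpProtocol` are assumed from StormCrawler 1.x. Newer releases may use a different package, so check them against your version. The class `OkHttpProtocolConfig` is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper showing the configuration keys that select the protocol implementation.
public class OkHttpProtocolConfig {

    // Assumed StormCrawler 1.x class name; verify the package for your version.
    static final String OKHTTP_PROTOCOL =
            "com.digitalpebble.stormcrawler.protocol.okhttp.HttpProtocol";

    static Map<String, Object> crawlerConfig() {
        Map<String, Object> conf = new HashMap<>();
        // Use the OkHttp-based protocol for both schemes instead of the Apache HttpClient one.
        conf.put("http.protocol.implementation", OKHTTP_PROTOCOL);
        conf.put("https.protocol.implementation", OKHTTP_PROTOCOL);
        return conf;
    }

    public static void main(String[] args) {
        // Print the settings as "key: value" lines, the same shape as the YAML entries.
        crawlerConfig().forEach((key, value) -> System.out.println(key + ": " + value));
    }
}
```

The OkHttp module is a separate dependency from the core, so it has to be on the topology's classpath for this class name to resolve.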
