I am using the crawler4j lib and its dependencies to crawl pages. What is the difference between
controller.start(BasicCrawler.class, numberOfCrawlers);
and
controller.startNonBlocking(BasicCrawler.class, numberOfCrawlers);
?
Take a look at
https://github.com/yasserg/crawler4j/blob/master/src/main/java/edu/uci/ics/crawler4j/crawler/CrawlController.java. The second option lets the calling thread continue executing after the crawl has been triggered, instead of blocking until crawling finishes.
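To make the difference concrete, here is a minimal sketch of the standard crawler4j setup (the seed URL and the temp folder path are placeholders you would replace). With start() the main thread blocks inside the call; with startNonBlocking() it returns immediately, you can do other work, and then call waitUntilFinish() when you need to join the crawl:

```java
import edu.uci.ics.crawler4j.crawler.CrawlConfig;
import edu.uci.ics.crawler4j.crawler.CrawlController;
import edu.uci.ics.crawler4j.fetcher.PageFetcher;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtConfig;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtServer;

public class CrawlerLauncher {
    public static void main(String[] args) throws Exception {
        CrawlConfig config = new CrawlConfig();
        config.setCrawlStorageFolder("/tmp/crawler4j"); // placeholder path

        PageFetcher pageFetcher = new PageFetcher(config);
        RobotstxtServer robotstxtServer =
                new RobotstxtServer(new RobotstxtConfig(), pageFetcher);
        CrawlController controller =
                new CrawlController(config, pageFetcher, robotstxtServer);

        controller.addSeed("https://example.com/"); // placeholder seed

        int numberOfCrawlers = 4;

        // Blocking variant: this call does not return until all
        // crawler threads have finished.
        // controller.start(BasicCrawler.class, numberOfCrawlers);

        // Non-blocking variant: returns immediately, crawling runs
        // in background threads.
        controller.startNonBlocking(BasicCrawler.class, numberOfCrawlers);

        // The main thread is free to do other work here, e.g.
        // monitoring progress or handling user input.

        // Join the crawl when you are ready to wait for completion.
        controller.waitUntilFinish();
    }
}
```

Internally, start() is essentially startNonBlocking() followed by waitUntilFinish(), so choose the non-blocking variant whenever the launching thread needs to stay responsive during the crawl.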