How to add multiprocessing in Scrapy

I am trying to scrape: https://www.jny.com/collections/bottoms

To crawl and scrape multiple pages at once, I am using multiprocessing:

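from multiprocessing import Pool

def parse(self, response):
    p = Pool(10)  # number of worker processes running at a time
    print("in herre")
    # Collect every product link from the collection grid
    self.product_url = response.xpath('//div[@class = "collection-grid js-filter-grid"]//a/@href').getall()
    print(self.product_url)
    # This call raises the error shown below: the bound method self.load_url
    # must be pickled to reach the worker processes
    records = p.map(self.load_url, self.product_url)
    p.terminate()
    p.join()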

It gives the following error:

AttributeError: Can't pickle local object 'Crawler.__init__.<locals>.<lambda>'

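Other answers to this question say that the Pool should be created at the top of the module. However, that is not possible in this case, because startRequest is the first method called.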

Scrapy already uses asynchronous programming to fetch multiple pages at once, so multiprocessing is not needed for this.

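You can tune the CONCURRENT_* settings, such as CONCURRENT_REQUESTS and CONCURRENT_REQUESTS_PER_DOMAIN, to meet your concurrency requirements.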
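As an illustration, the fragment below shows how such settings might look in a project's settings.py. CONCURRENT_REQUESTS, CONCURRENT_REQUESTS_PER_DOMAIN, and DOWNLOAD_DELAY are real Scrapy settings, but the values are assumptions chosen for the example, not recommendations for this site.

```python
# settings.py (illustrative values, not tuned for any particular site)

# Upper bound on requests Scrapy performs in parallel across all domains.
CONCURRENT_REQUESTS = 32

# Upper bound on parallel requests to any single domain.
CONCURRENT_REQUESTS_PER_DOMAIN = 10

# Seconds to wait between consecutive requests to the same site; 0 means no delay.
DOWNLOAD_DELAY = 0
```

The same keys can also be set for a single spider through its custom_settings class attribute.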
