I'm having trouble getting any information from a website. I set ROBOTSTXT_OBEY = False, but I still can't retrieve anything. How can I fix this?
start_urls = ['https://tienda.mercadona.es/search-results?query=leche%20entera']

def parse(self, response):
    sample = response.css("div").get()
    yield {'name': sample}
Thanks a lot. As far as I can tell, they may be blocking me when I make the request.
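The site you are trying to scrape loads its content dynamically with JavaScript. Vanilla Scrapy does not execute JavaScript by default, but there are plugins that can help. One simple option that comes to mind is scrapy-playwright. Once it is installed and set up correctly, you usually only need to add DOWNLOAD_HANDLERS to your settings.py file, like this: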
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}
Then you need to pass meta={"playwright": True} as an argument to the Scrapy Request.