Scrapy crawler not scraping simple Instagram tags on Instagram

Keywords: Instagram, scraping tags, simple crawler, python, scrapy, web-crawler
Updated: 2023-09-17
Original English title: scrapy crawler not scraping simple instagram tags on instagram

I am building a very simple web crawler that extracts and counts some simple tags in the bottom navigation bar of www.instagram.com.

The following code works on every other website I have tried, but not on Instagram:

import scrapy


class InstaSpider(scrapy.Spider):
    name = "insta_spider"
    start_urls = ["https://www.instagram.com/"]
    count = 1  # starts at 1, so a run with no matches still prints 1

    def parse(self, response):
        # CSS class of the tags in the bottom navigation bar
        SET_SELECTOR = ".K5OFK"
        for tag in response.css(SET_SELECTOR):
            self.count += 1
        print("My count is " + str(self.count))

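The crawler produces the output below. The page is crawled, but for some reason the selector does not find that class (or any other class on the Instagram page), so the count comes out as 1 instead of 10.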

INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
DEBUG: Crawled (200) <GET https://www.instagram.com/> (referer: None)
My count is 1

Why is that?

It seems to be reading the page correctly, but it is not picking up the li items.
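One way to narrow this down is to check whether the class name appears at all in the raw HTML that was downloaded. The sketch below uses only the Python standard library and two made-up HTML strings (they are hypothetical samples, not real Instagram markup). It rests on an assumption the question does not confirm: that the navigation items may be added by JavaScript after the page loads, in which case they would be missing from the HTML Scrapy receives. The helper names ClassCounter and count_class are invented for illustration.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Counts start tags whose class attribute contains a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.matches = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None.
        for name, value in attrs:
            if name == "class" and value and self.class_name in value.split():
                self.matches += 1


def count_class(html, class_name):
    """Return how many elements in the raw HTML carry class_name."""
    counter = ClassCounter(class_name)
    counter.feed(html)
    counter.close()
    return counter.matches


# Hypothetical samples: one page with the items in the static HTML,
# and one that is only an empty shell filled in later by a script.
static_html = '<ul><li class="K5OFK">About</li><li class="K5OFK">Help</li></ul>'
js_shell_html = '<div id="react-root"></div><script src="app.js"></script>'

print(count_class(static_html, "K5OFK"))    # 2
print(count_class(js_shell_html, "K5OFK"))  # 0
```

Inside the spider's parse method, the same helper could be called as count_class(response.text, "K5OFK"). A result of 0 would suggest that the class is simply not present in the downloaded HTML, rather than that the selector is wrong.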
