Python "unexpected indent" with Scrapy



I have been practicing with Scrapy. I ran this code successfully and it scraped all the elements of the page:

import scrapy
from scrapy.crawler import CrawlerProcess
from scrapy.selector import Selector

class ecomspider(scrapy.Spider):
    name = 'ecom'
    headers = {
        'user_agents': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) Applewebkit/537.36 (KHTML, like Gecko)'
    }

    def start_requests(self):
        url = 'https://maroof.sa/'
        yield scrapy.Request(url=url,
                             headers=self.headers,
                             callback=self.parse)

    def parse(self, response):
        with open('res.html', 'w') as html_file:
            html_file.write(response.text)

process = CrawlerProcess()
process.crawl(ecomspider)
process.start()

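Now I want to be more selective about which elements I extract, but every time I run this code I get "headers = { ^ IndentationError: unexpected indent":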

class ecomspider(scrapy.Spider):
    name = 'ecom'
    custom_settings = {
        'FEED_FORMAT': 'csv',
        'FEED_URI': 'ecom.csv',
        'LOG_FILE': 'ecom.log'
    }
        headers = {
            'user_agents': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36 Edg/105.0.1343.33'
        }

    def start_requests(self):
        url = 'https://maroof.sa/'
        yield scrapy.Request(url=url,
                             headers=self.headers,
                             callback=self.parse)

    def parse(self, response):
        # extract data
        for card in response.css('a.tab-pro-container'):
            items = {
                'title': card.css('a.media-heading').css('a::text').get(),
            }

            yield items

process = CrawlerProcess()
process.crawl(ecomspider)
process.start()

So what is wrong?

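Your code is not indented correctly. headers must start in the same column as custom_settings, directly below it, because both are attributes of the class body.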

custom_settings
|
|
|
headers

Try this and let me know:

    custom_settings = {
        'FEED_FORMAT': 'csv',
        'FEED_URI': 'ecom.csv',
        'LOG_FILE': 'ecom.log'
    }
    headers = {
        'user_agents': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36 Edg/105.0.1343.33'
    }

Hope you get my point.
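
If it helps, here is a minimal sketch of how you could check for this kind of error without starting the crawler. It relies only on Python's built-in compile(). The two class bodies are trimmed-down, hypothetical stand-ins for your spider (the name EcomSpider, the shortened dictionaries, and the helper check_indentation are my own for illustration, not part of Scrapy), so treat it as a demonstration of the rule rather than a drop-in tool.

```python
# Hypothetical, trimmed-down class bodies used only to illustrate the error.
# BROKEN indents `headers` one level deeper than `custom_settings`.
BROKEN = """\
class EcomSpider:
    name = 'ecom'
    custom_settings = {
        'LOG_FILE': 'ecom.log',
    }
        headers = {
            'user_agents': 'Mozilla/5.0',
        }
"""

# FIXED aligns `headers` with `custom_settings` inside the class body.
FIXED = """\
class EcomSpider:
    name = 'ecom'
    custom_settings = {
        'LOG_FILE': 'ecom.log',
    }
    headers = {
        'user_agents': 'Mozilla/5.0',
    }
"""


def check_indentation(source):
    """Return None if the source compiles, else the IndentationError message."""
    try:
        # compile() only parses the source; it does not execute it.
        compile(source, "<spider>", "exec")
    except IndentationError as err:
        return f"line {err.lineno}: {err.msg}"
    return None


print(check_indentation(BROKEN))  # reports the unexpected indent
print(check_indentation(FIXED))   # None, the source parses cleanly
```

The point is that a new statement in a class body may only be indented further if the previous line opened a block (for example with a trailing colon). The closing brace of custom_settings ends that statement, so the next attribute has to return to the class-body column.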
