Logging in to a website with Scrapy when there is no login file



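I am trying to scrape the website that appears in the code below. My main problem is logging in successfully. From what I have read online, the usual technique in Google Chrome is to open the Network tab, find the login request, and look at its "formdata". Unfortunately, there is no such file. What can I do without it?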

import scrapy

class QuotesSpider(scrapy.Spider):
    name = "quotes"
    urls = [
        'https://app.nominations.hospimedia.fr'
    ]

    def parse(self, response):
        # the function "callback" is used after you have logging in
        return scrapy.FormRequest.from_response(
            response,
            formdata={'email': 'XXX', 'pwd': 'XXXX'},
            callback=self.starts_scraping
        )

    def start_scraping(self, response):
        name = response.xpath('//span[@class"name-first-name]/text()"').extract()
        yield {'user_name': name}

Alternatively, I also tried a plain Request, but that does not work either:

import scrapy
import json

class QuotesSpider(scrapy.Spider):
    name = "quotes"
    urls = [
        'https://app.nominations.hospimedia.fr'
    ]

    def parse(self, response):
        payload = {
            'payload': {
                'email': 'XXX',
                'pwd': 'XX',
            }
        }

        # the function "callback" is used after you have logging in
        yield scrapy.Request(
            url='https://app.nominations.hospimedia.fr',
            body=json.dumps(payload),
            method='POST',
            callback=self.starts_scraping
        )

    def start_scraping(self, response):
        name = response.xpath('//span[@class"name-first-name]/text()"').extract()
        yield {'user_name': name}

  1. urls should be start_urls.

  2. You have a typo in the callback: self.starts_scraping instead of self.start_scraping.

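import scrapy

class QuotesSpider(scrapy.Spider):
    name = "quotes"
    # Scrapy only reads start_urls; a plain "urls" attribute is ignored
    start_urls = ['https://app.nominations.hospimedia.fr']

    def parse(self, response):
        # The keys are the name attributes of the login form's inputs
        # (user[email] and user[password] here, not email and pwd).
        # The callback runs on the response to the login request.
        return scrapy.FormRequest.from_response(
            response,
            formdata={'user[email]': 'XXX', 'user[password]': 'XXXX'},
            callback=self.start_scraping
        )

    def start_scraping(self, response):
        # The attribute test needs "=" and balanced quotes to be valid XPath
        name = response.xpath('//span[@class="name-first-name"]/text()').extract()
        yield {'user_name': name}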
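
If the browser's network panel does not show the form data, one possible way to find the field names is to read them from the login page's HTML itself. The following is only a minimal sketch using the standard library: the helper names (FormFieldCollector, list_form_fields) and the sample markup, including the authenticity_token field and the /users/sign_in action, are hypothetical and not taken from the real site. Inside a spider, something like response.xpath('//form//input/@name').getall() should give a similar list.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collect name/value pairs of <input> elements that sit inside a <form>."""

    def __init__(self):
        super().__init__()
        self.in_form = False
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.in_form = True
        elif tag == "input" and self.in_form and attrs.get("name"):
            # Inputs without a value attribute are stored as empty strings
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def list_form_fields(html):
    """Return a dict mapping each named form input to its default value."""
    parser = FormFieldCollector()
    parser.feed(html)
    return parser.fields


# Hypothetical login page markup, for illustration only
SAMPLE_HTML = """
<form action="/users/sign_in" method="post">
  <input type="hidden" name="authenticity_token" value="abc123">
  <input type="email" name="user[email]">
  <input type="password" name="user[password]">
  <input type="submit" value="Log in">
</form>
"""

fields = list_form_fields(SAMPLE_HTML)
print(fields)
```

The names printed here are the keys to pass in formdata. FormRequest.from_response pre-populates the fields it finds in the form, hidden ones included, so normally only the credential fields need to be overridden.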
