scrapy.http.Request does not execute its Scrapy callback



I want to send an HTTP request to a website (e.g. Twitter), get the title tag, and run the spider from a program via scrapy runspider:

class firstSpider(scrapy.Spider):
    name = "first"
    start_urls = ['http://twitter.com/begin_password_reset']

    def get_token(self):
        print('aa')
        try:
            url = "https://www.twitter.com/"
            # Set the headers here.
            headers = {
                'Accept': '*/*',
                'Accept-Encoding': 'gzip, deflate, br',
                'Accept-Language': 'en-GB,en-US;q=0.9,en;q=0.8',
                'Referer': 'https://www.twitter.com',
                'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36',
            }
            # Send the request
            scrapy.http.Request(url, callback=self.parse_twitter, method='GET', headers=headers, dont_filter=False)
        except ValueError:
            print("Oops!  That was no valid number.  Try again...")

    def parse_twitter(self, response):
        try:
            filename = 'aaa.txt'
            print('bb')
            with open(filename, 'wb') as f:
                f.write(Selector(response=response).xpath('//title/text()').get().encode())
        except ValueError:
            print("Oops!  That was no valid number.  Try again...")

f = firstSpider()
f.get_token()

The first function, get_token, runs, but the callback parse_twitter is never called.

I have also tried both yield and return, but the second method never executes and nothing is returned.

You should implement a parse method in your spider that calls get_token, and run the spider through CrawlerProcess:

There is still a problem with your Selector, but in this example you can see from the print statements that the callback now runs:

import scrapy
from scrapy import Request, Selector
from scrapy.crawler import CrawlerProcess

class firstSpider(scrapy.Spider):
    name = "first"
    start_urls = ['https://twitter.com/begin_password_reset']

    def parse(self, response):
        return self.get_token()

    def get_token(self):
        print('aa')
        try:
            url = "https://www.twitter.com/"
            # Set the headers here.
            headers = {
                'Accept': '*/*',
                'Accept-Encoding': 'gzip, deflate, br',
                'Accept-Language': 'en-GB,en-US;q=0.9,en;q=0.8',
                'Referer': 'https://www.twitter.com',
                'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36',
            }
            # Send the request
            yield Request(url, callback=self.parse_twitter, method='GET', headers=headers, dont_filter=False)
        except ValueError:
            print("Oops!  That was no valid number.  Try again...")

    def parse_twitter(self, response):
        try:
            filename = 'aaa.txt'
            print('bb')
            with open(filename, 'wb') as f:
                f.write(Selector(response=response).xpath('//title/text()').get().encode())
        except ValueError:
            print("Oops!  That was no valid number.  Try again...")

if __name__ == "__main__":
    c = CrawlerProcess({
        'USER_AGENT': 'Mozilla/5.0 (Windows NT 6.1; ) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36',
    })
    c.crawl(firstSpider)
    c.start()

LATEST UPDATE