I want to send an HTTP request to a website (e.g. Twitter), grab the title tag with scrapy runspider, and run it from a program:
class firstSpider(scrapy.Spider):
    name = "first"
    start_urls = ['http://twitter.com/begin_password_reset']

    def get_token(self):
        print('aa')
        try:
            url = "https://www.twitter.com/"
            # Set the headers here.
            headers = {
                'Accept': '*/*',
                'Accept-Encoding': 'gzip, deflate, br',
                'Accept-Language': 'en-GB,en-US;q=0.9,en;q=0.8',
                'Referer': 'https://www.twitter.com',
                'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36',
            }
            # Send the request
            scrapy.http.Request(url, callback=self.parse_twitter, method='GET', headers=headers, dont_filter=False)
        except ValueError:
            print("Oops! That was no valid number. Try again...")

    def parse_twitter(self, response):
        try:
            filename = 'aaa.txt'
            print('bb')
            with open(filename, 'wb') as f:
                f.write(Selector(response=response).xpath('//title/text()').get().encode())
        except ValueError:
            print("Oops! That was no valid number. Try again...")

f = firstSpider()
f.get_token()
The first function, get_token, works, but the callback parse_twitter does not. I also tried yield and return, but the second method never executes and returns nothing.
You should implement a parse method in your spider that calls get_token, and run the spider through CrawlerProcess. There is still an issue with the Selector part, but you can see the print statements working in this example:
import scrapy
from scrapy import Request, Selector
from scrapy.crawler import CrawlerProcess


class firstSpider(scrapy.Spider):
    name = "first"
    start_urls = ['https://twitter.com/begin_password_reset']

    def parse(self, response):
        return self.get_token()

    def get_token(self):
        print('aa')
        try:
            url = "https://www.twitter.com/"
            # Set the headers here.
            headers = {
                'Accept': '*/*',
                'Accept-Encoding': 'gzip, deflate, br',
                'Accept-Language': 'en-GB,en-US;q=0.9,en;q=0.8',
                'Referer': 'https://www.twitter.com',
                'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36',
            }
            # Send the request
            yield Request(url, callback=self.parse_twitter, method='GET', headers=headers, dont_filter=False)
        except ValueError:
            print("Oops! That was no valid number. Try again...")

    def parse_twitter(self, response):
        try:
            filename = 'aaa.txt'
            print('bb')
            with open(filename, 'wb') as f:
                f.write(Selector(response=response).xpath('//title/text()').get().encode())
        except ValueError:
            print("Oops! That was no valid number. Try again...")


if __name__ == "__main__":
    c = CrawlerProcess({
        'USER_AGENT': 'Mozilla/5.0 (Windows NT 6.1; ) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36',
    })
    c.crawl(firstSpider)
    c.start()