Vague error reported when trying to scrape a website



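I am building a spider to scrape Yahoo Finance. I want it to follow the market index link on the home page and get the last closing price from the table on the corresponding market index page.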

2021-05-29 11:39:21 [scrapy.utils.log] INFO: Scrapy 2.3.0 started (bot: scrapybot)
2021-05-29 11:39:21 [scrapy.utils.log] INFO: Versions: lxml 4.6.2.0, libxml2 2.9.10, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 20.3.0, Python 3.8.5 (v3.8.5:580fbb018f, Jul 20 2020, 12:11:27) - [Clang 6.0 (clang-600.0.57)], pyOpenSSL 19.1.0 (OpenSSL 1.1.1g  21 Apr 2020), cryptography 3.0, Platform macOS-10.16-x86_64-i386-64bit
2021-05-29 11:39:21 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.selectreactor.SelectReactor
2021-05-29 11:39:21 [scrapy.crawler] INFO: Overridden settings:
{}
2021-05-29 11:39:21 [scrapy.extensions.telnet] INFO: Telnet Password: 8306af0a852a89a8
2021-05-29 11:39:21 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats']

Here is the code:

import scrapy
from scrapy.crawler import CrawlerProcess

class YahooFinanceSpider(scrapy.Spider):
    name = "Yahoo Stock Scraper"
    button_loc = '//*[@id="marketsummary-itm-0"]/h3/a[1]'
    close_loc = '//*[@id="quote-summary"]/div[1]/table/tbody/tr[1]/td[2]/span/text()'

    def __init__(self, urls):
        self.urls=urls

    def start_requests(self):
        for url in self.urls:
            scrapy.Request(url=url, callback=self.parse_front)

    def parse_front(self, response):
        button = response.xpath(YahooFinanceSpider.button_loc)
        button_link = button.css('a.Fz(s).Ell.Fw(600).C($linkColor ::attr(href)')
        links_to_follow = button_link.extract()
        for url in links_to_follow:
            yield response.follow(url = url, callback = self.parse_pages)

    def parse_pages(self, response):
        closing_value = response.xpath(YahooFinanceSpider.close_loc).extract()
        for value in closing_value:
            print(value)

prices = []
urls=['https://finance.yahoo.com/']
yscraper=YahooFinanceSpider(urls)
process = CrawlerProcess()
process.crawl(YahooFinanceSpider)
process.start()

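You instantiate the object yscraper but never use it: process.crawl(YahooFinanceSpider) builds its own spider from the class, without your urls.
Pass the constructor arguments through crawl instead, i.e. use process.crawl(YahooFinanceSpider, urls=urls) rather than process.crawl(YahooFinanceSpider). Scrapy 2.0 and later accept only a spider class (or a Crawler) there, so process.crawl(yscraper) is rejected.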
