Raise ValueError('请求网址中缺少方案: %s' % self._url) ValueError: 请求网址中缺少方案: javascript:void(0);



这是我的蜘蛛代码

spider.py

import scrapy
class ExampleSpider(scrapy.Spider):
name = 'moneycontrol'
# allowed_domains = ['moneycontrol.com']
start_urls = ['https://www.moneycontrol.com/india/stockpricequote/']
def parse(self, response):
stoke_link_list = response.css("table a::attr(href)").getall()
if response.css("span.span_price_wrap::text").getall(): # value of this variable only present in first run
stock_name =  response.css("h1.pcstname::text").get()
bse_price, nse_price = response.css("span.span_price_wrap::text").getall()
print(stock_name + ' ' + bse_price + ' ' + nse_price)
else:
print('stock_name bse_price nse_price')
for link in stoke_link_list:
if link is not None:
next_page = response.urljoin(link)
# yield scrapy.Request(next_page, callback=self.parse)
yield response.follow(next_page, callback=self.parse)

运行此程序时,我遇到了一个奇怪的错误。它在运行时抓取某些网站时出错,而在抓取不同网站时再次出错(我的意思是可能为以前的网站运行(。

错误:

2020-08-20 19:52:49 [scrapy.core.scraper] ERROR: Spider error processing <GET https://www.moneycontrol.com/mutual-funds/nav/motilal-oswal-midcap-30-fund-regular-plan/MMO025> (referer: https://www.moneycontrol.com/india/stockpricequote/pharmaceuticals/abbottindia/AI51)
Traceback (most recent call last):

File "/home/vishvajeet/Desktop/Programming/python/scrapy/env/lib/python3.6/site-packages/scrapy/utils/defer.py", line 120, in iter_errback
yield next(it)
File "/home/vishvajeet/Desktop/Programming/python/scrapy/env/lib/python3.6/site-packages/scrapy/utils/python.py", line 346, in __next__
return next(self.data)
File "/home/vishvajeet/Desktop/Programming/python/scrapy/env/lib/python3.6/site-packages/scrapy/utils/python.py", line 346, in __next__
return next(self.data)
File "/home/vishvajeet/Desktop/Programming/python/scrapy/env/lib/python3.6/site-packages/scrapy/core/spidermw.py", line 64, in _evaluate_iterable
for r in iterable:
File "/home/vishvajeet/Desktop/Programming/python/scrapy/env/lib/python3.6/site-packages/scrapy/spidermiddlewares/offsite.py", line 29, in process_spider_output
for x in result:
File "/home/vishvajeet/Desktop/Programming/python/scrapy/env/lib/python3.6/site-packages/scrapy/core/spidermw.py", line 64, in _evaluate_iterable
for r in iterable:
File "/home/vishvajeet/Desktop/Programming/python/scrapy/env/lib/python3.6/site-packages/scrapy/spidermiddlewares/referer.py", line 340, in <genexpr>
return (_set_referer(r) for r in result or ())
File "/home/vishvajeet/Desktop/Programming/python/scrapy/env/lib/python3.6/site-packages/scrapy/core/spidermw.py", line 64, in _evaluate_iterable
for r in iterable:
File "/home/vishvajeet/Desktop/Programming/python/scrapy/env/lib/python3.6/site-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
return (r for r in result or () if _filter(r))
File "/home/vishvajeet/Desktop/Programming/python/scrapy/env/lib/python3.6/site-packages/scrapy/core/spidermw.py", line 64, in _evaluate_iterable
for r in iterable:
File "/home/vishvajeet/Desktop/Programming/python/scrapy/env/lib/python3.6/site-packages/scrapy/spidermiddlewares/depth.py", line 58, in <genexpr>
return (r for r in result or () if _filter(r))
File "/home/vishvajeet/Desktop/Programming/python/scrapy/env/lib/python3.6/site-packages/scrapy/core/spidermw.py", line 64, in _evaluate_iterable
for r in iterable:
File "/home/vishvajeet/Desktop/Programming/python/scrapy/moneycontrol/moneycontrol/spiders/my_spider.py", line 24, in parse
yield scrapy.Request(next_page, callback=self.parse)
File "/home/vishvajeet/Desktop/Programming/python/scrapy/env/lib/python3.6/site-packages/scrapy/http/request/__init__.py", line 25, in __init__
self._set_url(url)
File "/home/vishvajeet/Desktop/Programming/python/scrapy/env/lib/python3.6/site-packages/scrapy/http/request/__init__.py", line 69, in _set_url
raise ValueError('Missing scheme in request url: %s' % self._url)
ValueError: Missing scheme in request url: javascript:void(0);

Run2

2020-08-20 19:55:15 [scrapy.core.scraper] ERROR: Spider error processing <GET https://www.moneycontrol.com/mutual-funds/nav/dsp-equity-opportunities-fund-regular-plan/MDS011> (referer: https://www.moneycontrol.com/india/stockpricequote/pharmaceuticals/alkemlaboratories/AL05)
Traceback (most recent call last):
File "/home/vishvajeet/Desktop/Programming/python/scrapy/env/lib/python3.6/site-packages/scrapy/utils/defer.py", line 120, in iter_errback
yield next(it)
File "/home/vishvajeet/Desktop/Programming/python/scrapy/env/lib/python3.6/site-packages/scrapy/utils/python.py", line 346, in __next__
return next(self.data)
File "/home/vishvajeet/Desktop/Programming/python/scrapy/env/lib/python3.6/site-packages/scrapy/utils/python.py", line 346, in __next__
return next(self.data)
File "/home/vishvajeet/Desktop/Programming/python/scrapy/env/lib/python3.6/site-packages/scrapy/core/spidermw.py", line 64, in _evaluate_iterable
for r in iterable:
File "/home/vishvajeet/Desktop/Programming/python/scrapy/env/lib/python3.6/site-packages/scrapy/spidermiddlewares/offsite.py", line 29, in process_spider_output
for x in result:
File "/home/vishvajeet/Desktop/Programming/python/scrapy/env/lib/python3.6/site-packages/scrapy/core/spidermw.py", line 64, in _evaluate_iterable
for r in iterable:
File "/home/vishvajeet/Desktop/Programming/python/scrapy/env/lib/python3.6/site-packages/scrapy/spidermiddlewares/referer.py", line 340, in <genexpr>
return (_set_referer(r) for r in result or ())
File "/home/vishvajeet/Desktop/Programming/python/scrapy/env/lib/python3.6/site-packages/scrapy/core/spidermw.py", line 64, in _evaluate_iterable
for r in iterable:
File "/home/vishvajeet/Desktop/Programming/python/scrapy/env/lib/python3.6/site-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
return (r for r in result or () if _filter(r))
File "/home/vishvajeet/Desktop/Programming/python/scrapy/env/lib/python3.6/site-packages/scrapy/core/spidermw.py", line 64, in _evaluate_iterable
for r in iterable:
File "/home/vishvajeet/Desktop/Programming/python/scrapy/env/lib/python3.6/site-packages/scrapy/spidermiddlewares/depth.py", line 58, in <genexpr>
return (r for r in result or () if _filter(r))
File "/home/vishvajeet/Desktop/Programming/python/scrapy/env/lib/python3.6/site-packages/scrapy/core/spidermw.py", line 64, in _evaluate_iterable
for r in iterable:
File "/home/vishvajeet/Desktop/Programming/python/scrapy/moneycontrol/moneycontrol/spiders/my_spider.py", line 24, in parse
yield scrapy.Request(next_page, callback=self.parse)
File "/home/vishvajeet/Desktop/Programming/python/scrapy/env/lib/python3.6/site-packages/scrapy/http/request/__init__.py", line 25, in __init__
self._set_url(url)
File "/home/vishvajeet/Desktop/Programming/python/scrapy/env/lib/python3.6/site-packages/scrapy/http/request/__init__.py", line 69, in _set_url
raise ValueError('Missing scheme in request url: %s' % self._url)
ValueError: Missing scheme in request url: javascript:void(0);

我看了其他stackoverflow的答案,但没有一个能解决我的问题。比如尝试start_urls列表、使用follow等。请求URL 中缺少方案

错误">请求url中缺少scheme"表示url没有http://或https://前缀。

问题的出现是由于测试网页中存在具有相对URL的链接。

例如,moneycontrol.com网站上名为">Zee entertain"的链接具有

href值为"india/stockpricequote/mediaentertainment/zeeentertainmententerprises/ZEE";

因此,当Python程序试图打开此链接时,会抛出"Missing scheme"错误。


如何解决问题

">缺少方案"的问题可以通过预处理来解决https://hostname到所有相关URL链接(即,不以http://或https://开头的链接(

要前置的代码段https://hostname到相对URL:

for link in stoke_link_list:
if link is not None:
if not link.startswith("https://moneycontrol.com/")
page_url = ("https://moneycontrol.com/" + link)

最新更新