InvalidSchema("No connection adapters were found for {!r}".format(url))



What does this error mean, and how do I fix it? I'm getting this error:

Traceback (most recent call last):
  File "load-more.py", line 146, in <module>
    response = session.get(link)
  File "C:\Users\Xone\.virtualenvs\Web_Scrapers-A6P4QRzc\lib\site-packages\requests\sessions.py", line 555, in get
    return self.request('GET', url, **kwargs)
  File "C:\Users\Xone\.virtualenvs\Web_Scrapers-A6P4QRzc\lib\site-packages\requests\sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Users\Xone\.virtualenvs\Web_Scrapers-A6P4QRzc\lib\site-packages\requests\sessions.py", line 649, in send
    adapter = self.get_adapter(url=request.url)
  File "C:\Users\Xone\.virtualenvs\Web_Scrapers-A6P4QRzc\lib\site-packages\requests\sessions.py", line 742, in get_adapter
    raise InvalidSchema("No connection adapters were found for {!r}".format(url))
requests.exceptions.InvalidSchema: No connection adapters were found for '\"https:\/\/lifebridgecapital.com\/2021\/06\/11\/ws964-multifamily-investing-is-a-team-sport-with-cameron-roy\/\"'

It happens when I try to parse the title from each link. I'm scraping with the requests POST method. Here is the code:

import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:89.0) Gecko/20100101 Firefox/89.0',
    'Accept': 'application/json, text/javascript, */*; q=0.01',
    'Accept-Language': 'en-US,en;q=0.5',
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'X-Requested-With': 'XMLHttpRequest',
    'Origin': 'https://lifebridgecapital.com',
    'Connection': 'keep-alive',
    'Referer': 'https://lifebridgecapital.com/podcast/',
    'Sec-GPC': '1',
    'TE': 'Trailers',
}
data = {
    'action': 'gdlr_core_post_ajax',
    'settings[category][]': 'podcast',
    'settings[tag]': '',
    'settings[num-fetch]': '9',
    'settings[paged]': '1',
    'option[name]': 'paged',
}
session = requests.Session()
for page in range(0, 55):
    data['option[value]'] = str(page + 1)
    response = session.post('https://lifebridgecapital.com/wp-admin/admin-ajax.php', headers=headers, data=data)
    links = [a['href'] for a in BeautifulSoup(response.text, 'lxml').select('h3 > a')]
    for link in links:
        response = session.get(link)
        page = BeautifulSoup(response.text, 'lxml')
        title = page.find('h3').text
        print(f'Title: {title}, Link: {link}')

        #print(f'title: {title}, links: {links}')

I get all the links, but when I try to parse the title from each link, this InvalidSchema error appears. I googled a lot before asking on SO, but found no solution or any explanation of why this error occurs.

You are getting a JSON response, with the HTML inside its content field. So you can't feed response.text directly to Beautiful Soup: the href values you pull out of the raw JSON still carry the JSON escape characters (\" and \/), and requests rejects such a URL because it doesn't start with a scheme like https://.

Replace response.text with response.json()['content']:

links = [a['href'] for a in BeautifulSoup(response.json()['content'], 'lxml').select('h3 > a')]
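A minimal offline sketch of why the raw body misleads the HTML parser (the raw_body below is a hypothetical stand-in for the admin-ajax.php response, and example.com replaces the real post URL):

```python
import json
from bs4 import BeautifulSoup

# Hypothetical JSON body of the kind admin-ajax.php returns: the HTML
# fragment is a JSON string value, so quotes and slashes inside it are escaped.
raw_body = '{"content": "<h3><a href=\\"https:\\/\\/example.com\\/post\\/\\">Title</a></h3>"}'

# Parsing the raw body directly keeps the JSON escapes inside href,
# which is exactly what requests later rejects with InvalidSchema.
bad_href = BeautifulSoup(raw_body, 'html.parser').select_one('h3 > a')['href']
print(bad_href)   # \"https:\/\/example.com\/post\/\"

# Decoding the JSON first yields clean HTML and a usable URL.
clean_html = json.loads(raw_body)['content']
good_href = BeautifulSoup(clean_html, 'html.parser').select_one('h3 > a')['href']
print(good_href)  # https://example.com/post/
```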
