How do I fix Python code so it extracts full links from a web page? My current code only extracts partial links



I am a Python beginner using BeautifulSoup to extract links from the following web page: https://mhealthfairview.org/locations/m-health-fairview-st-johns-hospital. My current code is as follows:

import urllib.request
from bs4 import BeautifulSoup

html_page = urllib.request.urlopen("https://mhealthfairview.org/locations/m-health-fairview-st-johns-hospital")
soup = BeautifulSoup(html_page, "html.parser")
for link in soup.find_all('a'):
    print(link.get('href'))

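The output includes partial links such as "/providers". It should be "https://mhealthfairview.org/providers". Is there a way to extract the full links instead of the partial ones? Thanks a lot.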

Use urllib.parse.urljoin. It resolves each relative href against the URL of the page it came from:

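import urllib.request
from urllib.parse import urljoin
from bs4 import BeautifulSoup

url = "https://mhealthfairview.org/locations/m-health-fairview-st-johns-hospital"
html_page = urllib.request.urlopen(url)
soup = BeautifulSoup(html_page, "html.parser")
# href=True skips <a> tags that have no href attribute
for link in soup.find_all('a', href=True):
    # urljoin resolves relative hrefs against url and leaves absolute ones unchanged
    print(urljoin(url, link.get('href')))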
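
To see what urljoin does without fetching the page, here is a small sketch that needs only the standard library. The href values in `samples` are made up for illustration; they are not taken from the actual page.

```python
from urllib.parse import urljoin

base = "https://mhealthfairview.org/locations/m-health-fairview-st-johns-hospital"

# Hypothetical href values of the kinds commonly found in <a> tags
samples = ["/providers", "services", "#top", "https://example.com/page"]

# Map each raw href to the full URL that urljoin produces for it
resolved = {href: urljoin(base, href) for href in samples}

for href, full in resolved.items():
    print(f"{href!r} -> {full}")
```

A root-relative href like "/providers" becomes "https://mhealthfairview.org/providers", a path without a leading slash replaces the last segment of the base path, and an href that is already absolute comes back unchanged.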

You can simply use an if statement to prepend the site root to any href that starts with a slash.

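webroot = 'https://mhealthfairview.org'
for link in soup.find_all('a'):
    href = link.get('href')
    # href can be None or empty, so check it before looking at the first character
    if href and href.startswith("/"):
        print(webroot + href)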
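
If you go the if route, it may help to wrap the check in a small function so the edge cases are handled in one place. The following is only a sketch: `to_absolute` is a hypothetical helper name, and treating "//host/path" links as sharing the scheme of `webroot` is an assumption about what you want.

```python
WEBROOT = "https://mhealthfairview.org"

def to_absolute(href, webroot=WEBROOT):
    """Return a full URL for href, or None when the href is missing or empty."""
    if not href:
        return None
    if href.startswith("//"):
        # Protocol-relative link: reuse the scheme of webroot (assumption)
        scheme = webroot.split(":", 1)[0]
        return scheme + ":" + href
    if href.startswith("/"):
        # Root-relative link: prepend the site root
        return webroot + href
    # Anything else (already absolute, "#fragment", "mailto:...") is returned as-is
    return href

print(to_absolute("/providers"))
```

Note that this does not resolve paths without a leading slash, such as "services"; urljoin covers that case as well.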
