I'm a Python beginner, using BeautifulSoup to extract links from the following web page: https://mhealthfairview.org/locations/m-health-fairview-st-johns-hospital. The code I have is below:
import urllib.request
from bs4 import BeautifulSoup

html_page = urllib.request.urlopen("https://mhealthfairview.org/locations/m-health-fairview-st-johns-hospital")
soup = BeautifulSoup(html_page, "html.parser")
for link in soup.find_all('a'):
    print(link.get('href'))
The output includes partial links such as "/providers". It should be "https://mhealthfairview.org/providers". Is there a way to extract the full link instead of the partial one? Many thanks.
Use urllib.parse.urljoin, which resolves each href against the URL of the page it came from:
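import urllib.request
from urllib.parse import urljoin
from bs4 import BeautifulSoup

url = "https://mhealthfairview.org/locations/m-health-fairview-st-johns-hospital"
html_page = urllib.request.urlopen(url)
soup = BeautifulSoup(html_page, "html.parser")
for link in soup.find_all('a'):
    # urljoin leaves absolute hrefs alone and resolves relative ones against url
    print(urljoin(url, link.get('href')))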
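To see what urljoin does without fetching the page, here is a small standard-library sketch. The href values in `samples` are hypothetical examples of the kinds of links an anchor tag can contain; they are not taken from the actual page.

```python
from urllib.parse import urljoin

base = "https://mhealthfairview.org/locations/m-health-fairview-st-johns-hospital"

# Hypothetical href values, chosen to cover the common cases
samples = [
    "/providers",                # root-relative path
    "https://example.com/page",  # already absolute
    "//example.com/page",        # protocol-relative
    "#main",                     # fragment only
    "visit",                     # path relative to the current page
]

# Resolve every sample against the page URL
resolved = {href: urljoin(base, href) for href in samples}
for href, full in resolved.items():
    print(f"{href!r} -> {full}")
```

Absolute links pass through unchanged, so the same call can be applied to every href on the page without checking its form first.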
You can simply use an if check and prepend the site root to any href that starts with "/":
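webroot = 'https://mhealthfairview.org'
for link in soup.find_all('a'):  # soup is built as in the question
    href = link.get('href')
    # href is None for <a> tags without the attribute, and may be empty
    if href and href.startswith('/'):
        print(webroot + href)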
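If you go the manual route, a few edge cases are worth handling. The helper below is only a sketch: `to_absolute` is a made-up name, and treating protocol-relative links as https is an assumption based on the site's own scheme.

```python
def to_absolute(href, webroot="https://mhealthfairview.org"):
    """Return an absolute URL for root-relative hrefs; pass other values through.

    Hypothetical helper, not part of BeautifulSoup or urllib.
    """
    if not href:
        return None  # <a> tag with a missing or empty href
    if href.startswith("//"):
        return "https:" + href  # protocol-relative link; https is assumed
    if href.startswith("/"):
        return webroot + href  # root-relative path
    return href  # already absolute, or a form this sketch does not resolve


print(to_absolute("/providers"))
```

This does not resolve page-relative paths such as "visit" or fragment-only links such as "#main"; urljoin handles those, which is why it is usually the safer choice.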