
How do I fix Python code so it extracts full links from a webpage? My current code extracts only partial links

Updated: 2023-09-20

I am a Python beginner using BeautifulSoup to extract links from the following page: https://mhealthfairview.org/locations/m-health-fairview-st-johns-hospital. The code I have is below:

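import urllib.request
from bs4 import BeautifulSoup

html_page = urllib.request.urlopen("https://mhealthfairview.org/locations/m-health-fairview-st-johns-hospital")
soup = BeautifulSoup(html_page, "html.parser")
for link in soup.find_all('a'):
    # Prints each href exactly as written in the page, so relative links stay relative
    print(link.get('href'))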

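The output includes partial links such as "/providers". It should be "https://mhealthfairview.org/providers". Is there a way to extract the full links instead of the partial ones? Thank you very much.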

Use urllib.parse.urljoin, which resolves each href against the URL of the page it came from:

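import urllib.request
from urllib.parse import urljoin
from bs4 import BeautifulSoup

url = "https://mhealthfairview.org/locations/m-health-fairview-st-johns-hospital"
html_page = urllib.request.urlopen(url)
soup = BeautifulSoup(html_page, "html.parser")
# href=True skips <a> tags that have no href attribute
for link in soup.find_all('a', href=True):
    # Relative hrefs are resolved against url; absolute hrefs are returned unchanged
    print(urljoin(url, link.get('href')))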
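To see what urljoin does with different kinds of href without fetching the page, here is a small offline sketch. The helper name `resolve` and the sample hrefs other than "/providers" are made up for illustration; only the base URL comes from the question.

```python
from urllib.parse import urljoin

# Page URL from the question; hrefs are resolved relative to it
base = "https://mhealthfairview.org/locations/m-health-fairview-st-johns-hospital"

def resolve(href):
    """Hypothetical helper: turn an href found on the page into an absolute URL."""
    return urljoin(base, href)

# Root-relative link: joined to the scheme and host of the page
print(resolve("/providers"))
# Already absolute link (illustrative URL): returned unchanged
print(resolve("https://example.com/page"))
# Page-relative link (illustrative): replaces the last path segment of the page URL
print(resolve("contact"))
```

Because urljoin follows the standard URL resolution rules, the same call covers root-relative, page-relative, and absolute links without any special cases.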

Alternatively, you can simply use an if statement to prepend the site root to links that start with "/":

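webroot = 'https://mhealthfairview.org'
for link in soup.find_all('a'):  # soup as built in the question's code
    href = link.get('href')
    # href can be None or empty, so check it before testing the first character
    if href and href.startswith("/"):
        print(webroot + href)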
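The if-based approach can be wrapped in a small function so its edge cases are explicit. This is only a sketch: the name `to_absolute` and the decision to pass other links through unchanged are assumptions, not part of the original answer. Note that it handles only root-relative links; page-relative links such as "contact" are left as they are, which is one reason urljoin is usually the more robust choice.

```python
from typing import Optional

# Site root from the answer above
WEBROOT = "https://mhealthfairview.org"

def to_absolute(href: Optional[str], webroot: str = WEBROOT) -> Optional[str]:
    """Hypothetical helper: prefix root-relative hrefs with the site root."""
    if not href:
        # <a> tags without an href give None; there is nothing to resolve
        return None
    if href.startswith("//"):
        # Protocol-relative URL: it already names a host, so leave it alone
        return href
    if href.startswith("/"):
        # Root-relative path such as "/providers"
        return webroot + href
    # Absolute URLs, fragments, mailto: links, and page-relative paths pass through
    return href

print(to_absolute("/providers"))
```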
