Python: get the header status of every a href link found in a class and print each link with its status code

I am trying to extract all the href links found in the HTML of a certain class and print them along with their server header status.

To find each a href link, I have the following:

for href in soup.find_all('section', class_='holder'):
    for a in href.find_all('a'):
        if a.get('href') == '/':
            continue
        else:
            print(a.get('href'))

This prints all the URLs, but I also want to print the server header status next to each URL.

I tried something like this, but it does not work:

for href in soup.find_all('section', class_='holder'):
    for a in href.find_all('a'):
        headers = requests.head('a')
        if a.get('href') == '/':
            continue
        else:
            print(a.get('href'), (headers))

The output I want is:

https://www.exampleurlone.com/urlone 200
https://www.exampleurlone.com/urltwo 200
https://www.exampleurlone.com/urlthree 404

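Can this be done?

You probably want status_code. Note also that requests.head needs the link's URL, a.get('href'), rather than the literal string 'a'.

For example: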

for href in soup.find_all('section', class_='holder'):
    for a in href.find_all('a'):
        if a.get('href') == '/':
            continue
        else:
            # A HEAD request fetches only the headers; status_code holds the HTTP status.
            response = requests.head(a.get('href'))
            print(a.get('href'), response.status_code)
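
If some of the links are relative, or a host does not respond, the loop above will raise an exception partway through. Below is a minimal sketch of a more defensive variant, not a definitive implementation. The function name link_statuses, the base_url parameter, the injectable head argument, and the 10 second timeout are all assumptions made for illustration; only the section/holder selector and the skipping of '/' come from the question.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def link_statuses(html, base_url, head=requests.head):
    """Yield (absolute_url, status) for links inside <section class="holder">.

    link_statuses, base_url and head are hypothetical names for this sketch.
    """
    soup = BeautifulSoup(html, "html.parser")
    for section in soup.find_all("section", class_="holder"):
        # href=True skips <a> tags that have no href attribute at all.
        for a in section.find_all("a", href=True):
            href = a["href"]
            if href == "/":
                continue
            # Resolve relative links such as "/urlone" against the page URL.
            url = urljoin(base_url, href)
            try:
                # requests.head does not follow redirects unless asked to.
                response = head(url, allow_redirects=True, timeout=10)
                yield url, response.status_code
            except requests.RequestException as exc:
                # Report the failure type instead of aborting the whole loop.
                yield url, type(exc).__name__
```

To use it, pass the page's HTML text and the URL it was fetched from, then print each pair, for example `for url, status in link_statuses(page.text, page.url): print(url, status)`, where page is assumed to be the response from an earlier requests.get call. Some servers answer HEAD differently from GET, so a surprising status may be worth rechecking with requests.get.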
