BeautifulSoup broken link checker / web crawler



I'm trying to build a broken link checker based on this how-to: https://dev.to/arvindmehairjan/build-a-web-crawler-to-check-for-broken-links-with-python-beautifulsoup-39mg

However, one line of my code is causing a problem. When I run the program, I get the following error message:

  File "/Users/Documents/brokenlinkchecker.py", line 26
    print(f"Url: {link.get('href')} " + f"| Status Code: {response_code}")
SyntaxError: invalid syntax

I've been trying to work out what is causing this syntax error. Does anyone have suggestions for what I can do to get this program working?

Thanks very much!

Here is the code:

# Import libraries
from bs4 import BeautifulSoup, SoupStrainer
import requests
# Prompt user to enter the URL
url = input("Enter your url: ")
# Make a request to get the URL
page = requests.get(url)
# Get the response code of given URL
response_code = str(page.status_code)
# Get the page's HTML as a string
data = page.text
# Use BeautifulSoup to use the built-in methods
soup = BeautifulSoup(data)
# Iterate over all links on the given URL with the response code next to it
for link in soup.find_all('a'):
    print(f"Url: {link.get('href')} " + f"| Status Code: {response_code}")

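You should pass the extra argument features="lxml" or features="html.parser" to the BeautifulSoup constructor so that it does not have to guess which parser to use. html.parser ships with Python; lxml has to be installed separately.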

soup = BeautifulSoup(data, features="html.parser")
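
Two other things may be worth checking. First, f-strings only exist in Python 3.6 and later, so if the script is being run by an older interpreter (an assumption, since the question does not say which version is used), the print line itself would raise "SyntaxError: invalid syntax". Second, the loop in the question prints the status code of the starting page next to every link rather than requesting each link. The following is a minimal sketch of one way to handle both, not the how-to's own code: the function names extract_links, check_link, and run are hypothetical, the 10-second timeout is an arbitrary choice, and str.format is used instead of f-strings so the file also parses on older interpreters.

```python
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup


def extract_links(html, base_url):
    """Return absolute http(s) URLs for every <a href=...> in the HTML."""
    soup = BeautifulSoup(html, features="html.parser")
    links = []
    # href=True skips <a> tags that have no href attribute
    for anchor in soup.find_all("a", href=True):
        # Resolve relative links such as "/about" against the page URL
        absolute = urljoin(base_url, anchor["href"])
        # Ignore mailto:, javascript:, and other non-web schemes
        if urlparse(absolute).scheme in ("http", "https"):
            links.append(absolute)
    return links


def check_link(url, timeout=10):
    """Return the HTTP status code for url, or None if the request fails."""
    try:
        return requests.get(url, timeout=timeout).status_code
    except requests.RequestException:
        # Connection errors, timeouts, invalid URLs, and so on
        return None


def run(start_url):
    """Fetch start_url and print the status code of each link found on it."""
    page = requests.get(start_url, timeout=10)
    for link in extract_links(page.text, start_url):
        status = check_link(link)
        print("Url: {} | Status Code: {}".format(link, status))
```

To try it, call run(input("Enter your url: ")) at the bottom of the script. A status of None means the request itself failed, which for a link checker is usually worth treating the same as a broken link.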
