"You don't have permission to access this resource" when web scraping with Python



I am trying to scrape a website, and when I do, I get the output shown below. Is there a way I can scrape this site?

import requests
from bs4 import BeautifulSoup

url = "https://www.mustang6g.com/forums/threads/pre-collision-alert-system.132807/"
page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser')
print(soup)

The output of the above code is:

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>403 Forbidden</title>
</head><body>
<h1>Forbidden</h1>
<p>You don't have permission to access this resource.</p>
</body></html>

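The web server expects a User-Agent header to be passed. Without one, requests identifies itself as python-requests, which this server apparently rejects: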

import requests

# Browser-like User-Agent, sent instead of the default python-requests one
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) '
                         'AppleWebKit/537.36 (KHTML, like Gecko) '
                         'Chrome/75.0.3770.80 Safari/537.36'}
URL = 'https://www.mustang6g.com/forums/threads/pre-collision-alert-system.132807/'

response = requests.get(URL, headers=headers)
print(response.text)

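By passing this header, we tell the server that the request comes from a Mozilla-compatible browser rather than from a script.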
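
If you make several requests, it may be more convenient to set the header once on a Session and to fail loudly on a 403 instead of parsing the error page. The following is only a minimal sketch: the helper names make_session, fetch_html and sent_user_agent are made up for illustration, and it assumes the User-Agent string from the answer above is still accepted by the server.

```python
import requests

# Assumption: any current desktop-browser User-Agent string should do;
# this one is copied from the answer above.
BROWSER_UA = ('Mozilla/5.0 (X11; Linux x86_64) '
              'AppleWebKit/537.36 (KHTML, like Gecko) '
              'Chrome/75.0.3770.80 Safari/537.36')


def make_session(user_agent=BROWSER_UA):
    """Return a Session that sends the given User-Agent on every request."""
    session = requests.Session()
    session.headers.update({'User-Agent': user_agent})
    return session


def fetch_html(session, url, timeout=10):
    """Fetch url and return its HTML; raises requests.HTTPError on 4xx/5xx."""
    response = session.get(url, timeout=timeout)
    response.raise_for_status()  # a 403 raises here instead of being parsed
    return response.text


def sent_user_agent(session, url):
    """Return the User-Agent the session would send, without network access."""
    prepared = session.prepare_request(requests.Request('GET', url))
    return prepared.headers['User-Agent']
```

To see the difference, compare requests.utils.default_user_agent() (what requests sends by default) with sent_user_agent(make_session(), URL), then call fetch_html(make_session(), URL) and pass the result to BeautifulSoup as in the question.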