HTTP Error 403 in Python 3 web scraping of a publication



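Below is the traceback of the error I get when I pass in the URL of a publication. My code works for regular websites such as Stack Overflow or Wikipedia, but when I use it on a page such as https://www.sciencedirect.com/science/article/pii/S1388248120302113?via%3Dihub, it raises an error.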

Here is my code:

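from urllib.request import Request, urlopen

# Send a browser-like User-Agent header with the request
req = Request('https://www.sciencedirect.com/science/article/pii/S1388248120302113?via%3Dihub', headers={'User-Agent': 'Mozilla/5.0'})
html_plain = urlopen(req).read()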

Here is the traceback of the error:

  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1776.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 214, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1776.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 523, in open
    response = meth(req, response)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1776.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 632, in http_response
    response = self.parent.error(
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1776.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 561, in error
    return self._call_chain(*args)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1776.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 494, in _call_chain
    result = func(*args)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1776.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 641, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

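The HTTP 403 Forbidden client error status code means that the server understood the request but refuses to authorize it. This is not a bug in your code: the website is refusing to serve the page to your scraper.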

import requests
from requests.exceptions import HTTPError

url = 'https://www.sciencedirect.com/science/article/pii/S1388248120302113?via%3Dihub'

try:
    # If you're getting a 403 response, send a browser-like User-Agent header.
    # (The error reads "HTTP error occurred: 403 Client Error: Forbidden".)
    user_agent = 'Mozilla/5.0'
    response = requests.get(url, headers={'User-Agent': user_agent})
    # If the response was successful, no exception will be raised
    response.raise_for_status()
except HTTPError as http_err:
    print(f'HTTP error occurred: {http_err}')  # f-strings need Python 3.6+
except Exception as err:
    print(f'Other error occurred: {err}')
else:
    print('Success!')
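
If you would rather stay with urllib, as in the question, the same check can be sketched roughly as follows. This is only a minimal sketch: `build_request` and `fetch_html` are hypothetical helper names, and the `opener` parameter is an assumption added so the function can be tried without network access. A browser-like User-Agent is not guaranteed to avoid the 403, since some sites refuse automated clients regardless of headers.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def build_request(url, user_agent='Mozilla/5.0'):
    """Build a Request carrying a browser-like User-Agent header."""
    # strip() guards against stray whitespace around a pasted URL
    return Request(url.strip(), headers={'User-Agent': user_agent})


def fetch_html(url, opener=urlopen):
    """Return (status_code, body_bytes); body is None if the server refuses.

    `opener` is injectable (hypothetical parameter) so the function can be
    exercised without touching the network.
    """
    try:
        with opener(build_request(url)) as response:
            return response.status, response.read()
    except HTTPError as err:
        # err.code is the numeric status (e.g. 403), err.reason the phrase
        print(f'HTTP error occurred: {err.code} {err.reason}')
        return err.code, None
```

Calling `fetch_html(url)` with the ScienceDirect URL from the question returns the status code together with either the page bytes or `None`, so the caller can branch on a 403 instead of crashing with a traceback.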
