Web scraping with Python 3: HTTP error



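I'm new to coding and am just learning web scraping.

I don't know what to do here. It's probably something very basic, but I'm not sure what I'm doing wrong, and for some of you this may be a very simple fix. Any help is much appreciated.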

from urllib.request import urlopen as uReq
dcgp_url = "http://news.formulad.com/"
uClient = uReq(dcgp_url)
page_html = uClient.read()
uClient.close()

Then it shows me this error:

C:\Users\mateu\AppData\Local\Programs\Python\Python38-32\python.exe "E:/Discord Bot/Web scraping.py"
Traceback (most recent call last):
  File "E:/Discord Bot/Web scraping.py", line 7, in <module>
    uClient = uReq(dcgp_url)
  File "C:\Users\mateu\AppData\Local\Programs\Python\Python38-32\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\mateu\AppData\Local\Programs\Python\Python38-32\lib\urllib\request.py", line 531, in open
    response = meth(req, response)
  File "C:\Users\mateu\AppData\Local\Programs\Python\Python38-32\lib\urllib\request.py", line 640, in http_response
    response = self.parent.error(
  File "C:\Users\mateu\AppData\Local\Programs\Python\Python38-32\lib\urllib\request.py", line 569, in error
    return self._call_chain(*args)
  File "C:\Users\mateu\AppData\Local\Programs\Python\Python38-32\lib\urllib\request.py", line 502, in _call_chain
    result = func(*args)
  File "C:\Users\mateu\AppData\Local\Programs\Python\Python38-32\lib\urllib\request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

Process finished with exit code 1

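The website you are trying to access rejects GET requests that do not identify themselves with a browser-like user agent. By default, urllib announces itself as Python-urllib/3.x, and this server apparently answers that with 403 Forbidden. You can find your own browser's user agent by searching Google for "my user agent".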
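To see what the original request was announcing itself as, you can inspect the default header urllib attaches. This is only a small sketch; the helper name `default_user_agent` is made up for illustration and is not part of urllib.

```python
from urllib.request import build_opener


def default_user_agent():
    """Return the User-Agent value urllib sends when none is set explicitly."""
    opener = build_opener()
    # addheaders is a list of (name, value) tuples applied to every request
    return dict(opener.addheaders).get("User-agent")


print(default_user_agent())
```

On Python 3.8 this should print something starting with `Python-urllib/`, which is easy for a server to tell apart from a browser.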

The following code should work:

from urllib.request import urlopen, Request
dcgp_url = "http://news.formulad.com/"
# Send a browser-like User-Agent instead of urllib's default one
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36'}
# Wrap the URL and headers in a Request object and open that
req = Request(url=dcgp_url, headers=headers)
uClient = urlopen(req)
page_html = uClient.read()
uClient.close()
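If you want the script to report a failure instead of crashing with a traceback, one possible variation is to wrap the request in a small function and catch the error. This is a rough sketch, not the only way to do it: the names `BROWSER_UA`, `build_request` and `fetch_html` and the 10 second timeout are my own choices, and the user agent string is simply the one from the code above.

```python
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen

# Assumption: any browser-like string; this one is copied from the code above.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36"
)


def build_request(url, user_agent=BROWSER_UA):
    """Build a GET request that carries an explicit User-Agent header."""
    return Request(url=url, headers={"User-Agent": user_agent})


def fetch_html(url, timeout=10):
    """Return the page body as bytes, or None if the request fails."""
    req = build_request(url)
    try:
        # The with-block closes the connection even if read() raises
        with urlopen(req, timeout=timeout) as response:
            return response.read()
    except HTTPError as err:
        # Server answered, but with an error status such as 403
        print(f"Server returned HTTP {err.code}: {err.reason}")
    except URLError as err:
        # No usable answer at all (DNS failure, refused connection, ...)
        print(f"Could not reach server: {err.reason}")
    return None
```

You would then call it as `page_html = fetch_html("http://news.formulad.com/")` and check for `None` before parsing. `HTTPError` is caught first because it is a subclass of `URLError`.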
