http.client.RemoteDisconnected error immediately after trying to scrape a web page



I tried this code:

import gspread
import requests
import datetime 
from bs4 import BeautifulSoup
from oauth2client.service_account import ServiceAccountCredentials
from pprint import pprint
from datetime import timedelta
datetime.datetime.now()
scope = [
    'https://www.googleapis.com/auth/spreadsheets',
    'https://www.googleapis.com/auth/drive'
]
URL = 'https://colnect.com/cs/coins/list/country/57-%C4%8Cesk%C3%A1_republika/series/76375-1993~sou%C4%8Dasnost_-_ob%C4%9B%C5%BEn%C3%A9'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
#Google sheet
data = ServiceAccountCredentials.from_json_keyfile_name("data.json", scope)
client = gspread.authorize(data)
sheet = client.open("skript").worksheet('ColnectTest')
data = sheet.get_all_records()
#Scraping
results = soup.find_all('div', attrs={'class':'pl-it'})
for job_data in results:
    mince = job_data.find('h2', attrs={"class":"item_header"})
    mince_final = mince.text.strip()
    # add a row to the sheet
    insertRow = ["colnect.cz", mince_final]
    sheet.insert_row(insertRow,2)

But I immediately get this error message:

Traceback (most recent call last):
File "C:ProgramsPythonPython38-32libsite-packagesurllib3connectionpool.py", line 670, in urlopen      
httplib_response = self._make_request(
File "C:ProgramsPythonPython38-32libsite-packagesurllib3connectionpool.py", line 426, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "C:ProgramsPythonPython38-32libsite-packagesurllib3connectionpool.py", line 421, in _make_request
httplib_response = conn.getresponse()
File "C:PythonPython38-32libhttpclient.py", line 1347, in getresponse
response.begin()
File "C:ProgramsPythonPython38-32libhttpclient.py", line 307, in begin
version, status, reason = self._read_status()
File "C:ProgramsPythonPython38-32libhttpclient.py", line 276, in _read_status
raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response
During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:ProgramsPythonPython38-32libsite-packagesrequestsadapters.py", line 439, in send
resp = conn.urlopen(
File "C:ProgramsPythonPython38-32libsite-packagesurllib3connectionpool.py", line 726, in urlopen
retries = retries.increment(
File "C:ProgramsPythonPython38-32libsite-packagesurllib3utilretry.py", line 403, in increment
raise six.reraise(type(error), error, _stacktrace)
File "C:ProgramsPythonPython38-32libsite-packagesurllib3packagessix.py", line 734, in reraise
raise value.with_traceback(tb)
File "C:ProgramsPythonPython38-32libsite-packagesurllib3connectionpool.py", line 670, in urlopen
httplib_response = self._make_request(
File "C:ProgramsPythonPython38-32libsite-packagesurllib3connectionpool.py", line 426, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "C:ProgramsPythonPython38-32libsite-packagesurllib3connectionpool.py", line 421, in _make_request
httplib_response = conn.getresponse()
File "C:ProgramsPythonPython38-32libhttpclient.py", line 1347, in getresponse
response.begin()
File "C:ProgramsPythonPython38-32libhttpclient.py", line 307, in begin
version, status, reason = self._read_status()
File "C:ProgramsPythonPython38-32libhttpclient.py", line 276, in _read_status
raise RemoteDisconnected("Remote end closed connection without"
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "c:/Projekty/python/skript/t.py", line 17, in <module>
page = requests.get(URL)
File "C:ProgramsPythonPython38-32libsite-packagesrequestsapi.py", line 76, in get
return request('get', url, params=params, **kwargs)
File "C:ProgramsPythonPython38-32libsite-packagesrequestsapi.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "C:ProgramsPythonPython38-32libsite-packagesrequestssessions.py", line 530, in request
resp = self.send(prep, **send_kwargs)
File "C:ProgramsPythonPython38-32libsite-packagesrequestssessions.py", line 643, in send
r = adapter.send(request, **kwargs)
File "C:ProgramsPythonPython38-32libsite-packagesrequestsadapters.py", line 498, in send
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

What is wrong with my code? I use the same code for other web pages and it works fine for them. Is there a fix on my side, or is the web server blocking the request?

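I want to insert some data from the web page into a Google Sheet. For now I am only trying to read the first element, h2 class=item_header, to get the name of the coin; once that insert works I will continue with the other elements.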

You need to specify a User-Agent header to get a correct response from the server, for example:

import requests
from bs4 import BeautifulSoup

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:87.0) Gecko/20100101 Firefox/87.0"
}
URL = "https://colnect.com/cs/coins/list/country/57-%C4%8Cesk%C3%A1_republika/series/76375-1993~sou%C4%8Dasnost_-_ob%C4%9B%C5%BEn%C3%A9"
page = requests.get(URL, headers=headers)
soup = BeautifulSoup(page.content, "html.parser")
print(soup.title)

Prints:

<title>Česká republika : Mince [Série: 1993~současnost - oběžné] [1/2]</title>
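
To fold this fix back into the script from the question, something along these lines may work. It is only a sketch: the helper names make_session, fetch_html and coin_names are hypothetical, the 10-second timeout is an arbitrary choice, and the div.pl-it / h2.item_header selectors are taken from the question rather than verified against the live page. The server may also reject requests for reasons other than the missing User-Agent.

```python
import requests
from bs4 import BeautifulSoup

# Assumption: a browser-like User-Agent is enough; this string is the one used above.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:87.0) Gecko/20100101 Firefox/87.0"
}


def make_session():
    """Return a requests.Session that sends the User-Agent header on every request."""
    session = requests.Session()
    session.headers.update(HEADERS)
    return session


def fetch_html(session, url, timeout=10):
    """Return the page body as bytes, or None if the request fails."""
    try:
        response = session.get(url, timeout=timeout)
        response.raise_for_status()  # turn HTTP 4xx/5xx into an exception
    except requests.exceptions.RequestException as exc:
        # ConnectionError (the error in the question) is a subclass of RequestException.
        print(f"Request failed: {exc}")
        return None
    return response.content


def coin_names(html):
    """Collect the stripped text of each h2.item_header found inside a div.pl-it."""
    soup = BeautifulSoup(html, "html.parser")
    names = []
    for item in soup.find_all("div", attrs={"class": "pl-it"}):
        header = item.find("h2", attrs={"class": "item_header"})
        if header is not None:  # skip items without a header instead of raising AttributeError
            names.append(header.text.strip())
    return names
```

In the original script you would then call html = fetch_html(make_session(), URL) and, if html is not None, loop over coin_names(html) and pass each name to sheet.insert_row(["colnect.cz", name], 2) as before.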
