Python download from website: target machine actively refused it



I don't know enough about web scraping yet. There is a website I want to pull data from, and I get this error:

[WinError 10061] No connection could be made because the target machine actively refused it

When you visit the URL in a browser, it downloads the data described by the URL parameters. I also want to be able to specify the file path the download is saved to.

I tried:

import urllib.request
from datetime import date
today = date.today()
str_day = today.strftime("%Y-%m-%d")
url = f"http://data.usbr.gov/rise/api/result/download?after=2020-05-28&before={str_day}&filename=Lake%20Mead%20Hoover%20Dam%20and%20Powerplant%20Daily%20Lake%2FReservoir%20Storage-af%20Time%20Series%20Data&itemId=6124&order=ASC&type=csv"
opener = urllib.request.urlopen(url)
file_out = open("download.csv", "wb")
file_out.write(opener.read())
file_out.close()

I also tried:

import requests
from datetime import date
today = date.today()
str_day = today.strftime("%Y-%m-%d")
url = f"http://data.usbr.gov/rise/api/result/download?after=2020-05-28&before={str_day}&filename=Lake%20Mead%20Hoover%20Dam%20and%20Powerplant%20Daily%20Lake%2FReservoir%20Storage-af%20Time%20Series%20Data&itemId=6124&order=ASC&type=csv"
r = requests.get(url)

The URL uses HTTPS. That error means no process is listening on the port you connected to (HTTP port 80). You can also use a dictionary of parameters to make them more readable:

import requests
from datetime import date

today = date.today()
str_day = today.strftime("%Y-%m-%d")

url = 'https://data.usbr.gov/rise/api/result/download'
params = {'after': '2020-05-28',
          'before': str_day,
          'filename': 'Lake Mead Hoover Dam and Powerplant Daily Lake/Reservoir Storage-af Time Series Data',
          'itemId': 6124,
          'order': 'ASC',
          'type': 'csv'}

r = requests.get(url, params)
print(r)

with open('download.csv', 'w', encoding='utf8') as f:
    f.write(r.text)

Output:

<Response [200]>
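To address the second part of the question (choosing where the file is saved), here is a minimal sketch that parameterizes the output path; the download_csv helper name and the example directory are mine, not part of the original code:

import os
import requests
from datetime import date

def download_csv(out_path):
    # Build the same request as above, then write wherever the caller asks.
    url = 'https://data.usbr.gov/rise/api/result/download'
    params = {'after': '2020-05-28',
              'before': date.today().strftime("%Y-%m-%d"),
              'filename': 'Lake Mead Hoover Dam and Powerplant Daily Lake/Reservoir Storage-af Time Series Data',
              'itemId': 6124,
              'order': 'ASC',
              'type': 'csv'}
    r = requests.get(url, params)
    # Create the target directory if it does not exist yet.
    os.makedirs(os.path.dirname(out_path) or '.', exist_ok=True)
    with open(out_path, 'w', encoding='utf8') as f:
        f.write(r.text)

download_csv('data/lake_mead/storage.csv')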

Very similar to the excellent answer from @MarkTolonen, but with a check of the HTTP status added, and using streaming, which may be more efficient for large amounts of data.

import requests
from datetime import date

params = {}
params['after'] = '2020-05-28'
params['before'] = date.today().strftime("%Y-%m-%d")
params['filename'] = 'Lake Mead Hoover Dam and Powerplant Daily Lake/Reservoir Storage-af Time Series Data'
params['order'] = 'ASC'
params['type'] = 'csv'
params['itemId'] = 6124

url = "https://data.usbr.gov/rise/api/result/download"

# Raise immediately on a non-2xx status; stream the body instead of
# loading it into memory all at once.
(r := requests.get(url, params=params, stream=True)).raise_for_status()

with open('data.csv', 'wb') as csv:
    for chunk in r.iter_content(chunk_size=4096):
        csv.write(chunk)
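One note: the assignment expression (:=) requires Python 3.8+, and raise_for_status() raises requests.HTTPError on failure rather than printing anything. A small sketch of an equivalent form that works on older Python and handles the error itself (the failure message is my own choice):

import requests
from datetime import date

url = "https://data.usbr.gov/rise/api/result/download"
params = {'after': '2020-05-28',
          'before': date.today().strftime("%Y-%m-%d"),
          'filename': 'Lake Mead Hoover Dam and Powerplant Daily Lake/Reservoir Storage-af Time Series Data',
          'itemId': 6124,
          'order': 'ASC',
          'type': 'csv'}

# Same request as above, split into two statements and wrapped in
# try/except so a bad status is reported instead of raising.
try:
    r = requests.get(url, params=params, stream=True)
    r.raise_for_status()
except requests.HTTPError as e:
    print(f"Download failed: {e}")
else:
    with open('data.csv', 'wb') as f:
        for chunk in r.iter_content(chunk_size=4096):
            f.write(chunk)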
