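I am building an app with tkinter and requests that is supposed to work like a download manager. I recently found that the stream keyword argument of requests.get(url) lets me write the content to disk while it is still downloading. My problem is that when the user downloads several files, or just one large file, requests seems to stall. Oddly, it does not raise an error as I would expect. Why does this happen, and how can I fix it? Here is a simple version of the download without the GUI (I found this particular URL to be somewhat problematic):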
import requests
import time
url = "https://aspb2.cdn.asset.aparat.com/aparat-video/a5e07b7f62ffaad0c104763c23d7393215613675-1080p.mp4?wmsAuthSign=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ0b2tlbiI6Ijg0ZTVmNjhhMGJkNDJlMmM0MWFjZjgyNzY5YWU4NmMzIiwiZXhwIjoxNjA1NzM3NjIxLCJpc3MiOiJTYWJhIElkZWEgR1NJRyJ9.eaqnWYevFhe-CHG1TGR3SuoTbnVNBEJmLj-ZSxjtNbY"
headers = requests.head(url, headers={'accept-encoding': ''}).headers
print(headers)
r = requests.get(url, allow_redirects=True, stream=True)
# headers = r.headers
name = url.split('/')[-1].split('.')[0]
print(name)
format_name = '.' + headers['Content-Type'].split('/')[1]
file_size = int(headers['Content-Length'])
downloaded = 0
print(name + format_name)
start = last_print = time.time()
with open(name + format_name, 'wb') as fp:
    for chunk in r.iter_content(chunk_size=4096):
        downloaded += fp.write(chunk)
        now = time.time()
        if now - last_print >= 1:
            pct_done = round(downloaded / file_size * 100)
            speed = round(downloaded / (now - start) / 1024)
            print(f"Download {pct_done} % done, avg speed {speed} kbps")
            last_print = time.time()
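Update: I checked two other Stack Overflow questions that might have held the answer, but apparently some points are still unanswered (link: a streaming download of a large file with Python requests being interrupted; link: what exactly is Python's file.flush() doing?). I tried using both functions as a fix, but some downloads still stall. New version of the code: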
import requests
import time
import os
url = "https://aspb2.cdn.asset.aparat.com/aparat-video/a5e07b7f62ffaad0c104763c23d7393215613675-1080p.mp4?wmsAuthSign=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ0b2tlbiI6Ijg0ZTVmNjhhMGJkNDJlMmM0MWFjZjgyNzY5YWU4NmMzIiwiZXhwIjoxNjA1NzM3NjIxLCJpc3MiOiJTYWJhIElkZWEgR1NJRyJ9.eaqnWYevFhe-CHG1TGR3SuoTbnVNBEJmLj-ZSxjtNbY"
headers = requests.head(url, headers={'accept-encoding': ''}).headers
print(headers)
r = requests.get(url, allow_redirects=True, stream=True)
name = url.split('/')[-1].split('.')[0]
print(name)
format_name = '.' + headers['Content-Type'].split('/')[1]
file_size = int(headers['Content-Length'])
downloaded = 0
print(name + format_name)
start = last_print = time.time()
with open(name + format_name, 'wb') as fp:
    for chunk in r.iter_content(chunk_size=4096):
        downloaded += fp.write(chunk)
        # Added the 'flush' and 'fsync' function as mentioned in the issues
        fp.flush()
        os.fsync(fp.fileno())
        now = time.time()
        if now - last_print >= 1:
            pct_done = round(downloaded / file_size * 100)
            speed = round(downloaded / (now - start) / 1024)
            print(f"Download {pct_done} % done, avg speed {speed} kbps")
            last_print = time.time()
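Even with these two calls added, requests still seems to stall. I suspect that requests sometimes fails to keep the connection alive, because the problem occurs mostly at times of day when my internet connection is weak, but I don't understand why it doesn't raise an error the way urllib does. If that is not the cause, how can I fix this?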
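I made 3 changes, only one of which directly affects the result.
- I added r.raise_for_status() to check for errors, which is good practice.
- I used name = url.split('/')[-1].split('?')[0] for the file name, which yields 'a5e07b7f62ffaad0c104763c23d7393215613675-1080p.mp4'. That is probably what you want, since it has the correct extension.
- I increased chunk_size by a factor of 64, which may be what did the trick.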
import requests
import time
url = "https://aspb2.cdn.asset.aparat.com/aparat-video/a5e07b7f62ffaad0c104763c23d7393215613675-1080p.mp4?wmsAuthSign=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ0b2tlbiI6Ijg0ZTVmNjhhMGJkNDJlMmM0MWFjZjgyNzY5YWU4NmMzIiwiZXhwIjoxNjA1NzM3NjIxLCJpc3MiOiJTYWJhIElkZWEgR1NJRyJ9.eaqnWYevFhe-CHG1TGR3SuoTbnVNBEJmLj-ZSxjtNbY"
headers = requests.head(url, headers={'accept-encoding': ''}).headers
print(headers)
r = requests.get(url, allow_redirects=True, stream=True)
r.raise_for_status() # check for errors
# headers = r.headers
name = url.split('/')[-1].split('?')[0]
print(name)
file_size = int(headers['Content-Length'])
downloaded = 0
start = last_print = time.time()
with open(name, 'wb') as fp:
    for chunk in r.iter_content(chunk_size=4096 * 64):
        downloaded += fp.write(chunk)
        now = time.time()
        if now - last_print >= 1:
            pct_done = round(downloaded / file_size * 100)
            speed = round(downloaded / (now - start) / 1024)
            print(f"Download {pct_done} % done, avg speed {speed} kbps")
            last_print = time.time()
Prints:
{'Accept-Ranges': 'bytes', 'Access-Control-Allow-Headers': '*', 'Access-Control-Allow-Methods': 'GET, HEAD, OPTIONS', 'Access-Control-Allow-Origin': '*', 'Access-Control-Expose-Headers': 'Server,range,Content-Length,Content-Range', 'Cache-Control': 'max-age=8640000', 'Content-Length': '101751914', 'Content-Type': 'video/mp4', 'Date': 'Sat, 21 Nov 2020 18:04:48 GMT', 'Etag': '"5e379fa7-6109c6a"', 'Expires': 'Mon, 01 Mar 2021 18:04:48 GMT', 'Last-Modified': 'Sun, 19 Nov 2000 08:52:00 GMT'}
a5e07b7f62ffaad0c104763c23d7393215613675-1080p.mp4
Download 0 % done, avg speed 249 kbps
Download 3 % done, avg speed 1386 kbps
Download 11 % done, avg speed 3443 kbps
Download 19 % done, avg speed 4525 kbps
Download 28 % done, avg speed 5399 kbps
Download 38 % done, avg speed 6218 kbps
Download 50 % done, avg speed 6997 kbps
Download 63 % done, avg speed 7763 kbps
Download 78 % done, avg speed 8463 kbps
Download 89 % done, avg speed 8733 kbps
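I should add that it also works with the original chunk_size of 4096, albeit much more slowly. Honestly, I can't give you an exact reason why it hangs for you, but there is certainly no reason not to try the larger (but not unreasonably large) chunk_size I suggested.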
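One possible explanation for a stall that never raises: requests applies no timeout unless you pass one, so a read that stops receiving bytes can block indefinitely. Below is a minimal sketch, assuming a 5 second connect timeout and a 30 second read timeout. Both values and the function names download and try_download are illustrative, not something taken from the question.

```python
import requests


def download(url, path, chunk_size=4096 * 64, timeout=(5, 30)):
    """Stream url to path and return the number of bytes written.

    timeout is (connect_timeout, read_timeout) in seconds. The defaults
    are assumptions chosen for illustration only.
    """
    downloaded = 0
    # Without timeout=..., requests waits indefinitely for the next bytes.
    with requests.get(url, stream=True, timeout=timeout) as r:
        r.raise_for_status()
        with open(path, "wb") as fp:
            for chunk in r.iter_content(chunk_size=chunk_size):
                downloaded += fp.write(chunk)
    return downloaded


def try_download(url, path):
    """Return the byte count, or None if the request failed or stalled."""
    try:
        return download(url, path)
    except requests.exceptions.RequestException as exc:
        # Covers connection errors, HTTP errors and timeouts alike.
        print(f"Download failed or stalled: {exc}")
        return None
```

The read timeout limits each wait for data on the socket, not the download as a whole, so a slow but steady transfer is not interrupted.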
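Update
I have run the code several times and found that performance varies widely. Despite the specified chunk_size, the code seems to end up iterating over smaller chunks. Here is a sample run that nevertheless completed: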
{'Accept-Ranges': 'bytes', 'Access-Control-Allow-Headers': '*', 'Access-Control-Allow-Methods': 'GET, HEAD, OPTIONS', 'Access-Control-Allow-Origin': '*', 'Access-Control-Expose-Headers': 'Server,range,Content-Length,Content-Range', 'Cache-Control': 'max-age=8640000', 'Content-Length': '101751914', 'Content-Type': 'video/mp4', 'Date': 'Sat, 21 Nov 2020 19:14:13 GMT', 'Etag': '"5e379fa7-6109c6a"', 'Expires': 'Mon, 01 Mar 2021 19:14:13 GMT', 'Last-Modified': 'Sun, 19 Nov 2000 08:52:00 GMT'}
a5e07b7f62ffaad0c104763c23d7393215613675-1080p.mp4
Download 0 % done, avg speed 243 kbps
Download 3 % done, avg speed 1240 kbps
Download 12 % done, avg speed 3803 kbps
Download 19 % done, avg speed 4484 kbps
Download 24 % done, avg speed 4601 kbps
Download 29 % done, avg speed 4615 kbps
Download 33 % done, avg speed 4503 kbps
Download 37 % done, avg speed 4411 kbps
Download 40 % done, avg speed 4126 kbps
Download 42 % done, avg speed 3907 kbps
Download 44 % done, avg speed 3674 kbps
Download 45 % done, avg speed 3462 kbps
Download 46 % done, avg speed 3238 kbps
Download 47 % done, avg speed 3061 kbps
Download 47 % done, avg speed 2913 kbps
Download 48 % done, avg speed 2753 kbps
Download 49 % done, avg speed 2613 kbps
Download 49 % done, avg speed 2504 kbps
Download 50 % done, avg speed 2396 kbps
Download 50 % done, avg speed 2286 kbps
Download 51 % done, avg speed 2190 kbps
Download 52 % done, avg speed 2108 kbps
Download 52 % done, avg speed 2035 kbps
Download 53 % done, avg speed 1975 kbps
Download 53 % done, avg speed 1907 kbps
Download 54 % done, avg speed 1859 kbps
Download 55 % done, avg speed 1831 kbps
Download 56 % done, avg speed 1796 kbps
Download 57 % done, avg speed 1759 kbps
Download 58 % done, avg speed 1724 kbps
Download 60 % done, avg speed 1693 kbps
Download 60 % done, avg speed 1663 kbps
Download 61 % done, avg speed 1633 kbps
Download 62 % done, avg speed 1605 kbps
Download 63 % done, avg speed 1580 kbps
Download 64 % done, avg speed 1555 kbps
Download 65 % done, avg speed 1536 kbps
Download 65 % done, avg speed 1515 kbps
Download 66 % done, avg speed 1496 kbps
Download 67 % done, avg speed 1476 kbps
Download 68 % done, avg speed 1456 kbps
Download 69 % done, avg speed 1438 kbps
Download 70 % done, avg speed 1421 kbps
Download 70 % done, avg speed 1405 kbps
Download 71 % done, avg speed 1391 kbps
Download 72 % done, avg speed 1372 kbps
Download 73 % done, avg speed 1357 kbps
Download 73 % done, avg speed 1344 kbps
Download 74 % done, avg speed 1330 kbps
Download 75 % done, avg speed 1320 kbps
Download 76 % done, avg speed 1310 kbps
Download 77 % done, avg speed 1297 kbps
Download 78 % done, avg speed 1289 kbps
Download 79 % done, avg speed 1284 kbps
Download 80 % done, avg speed 1279 kbps
Download 81 % done, avg speed 1275 kbps
Download 83 % done, avg speed 1272 kbps
Download 84 % done, avg speed 1271 kbps
Download 85 % done, avg speed 1270 kbps
Download 87 % done, avg speed 1269 kbps
Download 88 % done, avg speed 1265 kbps
Download 89 % done, avg speed 1260 kbps
Download 89 % done, avg speed 1252 kbps
Download 90 % done, avg speed 1244 kbps
Download 91 % done, avg speed 1237 kbps
Download 92 % done, avg speed 1230 kbps
Download 92 % done, avg speed 1224 kbps
Download 94 % done, avg speed 1214 kbps
Download 95 % done, avg speed 1204 kbps
Download 95 % done, avg speed 1195 kbps
Download 96 % done, avg speed 1186 kbps
Download 97 % done, avg speed 1177 kbps
Download 98 % done, avg speed 1168 kbps
Download 98 % done, avg speed 1160 kbps
Download 99 % done, avg speed 1151 kbps
Download 100 % done, avg speed 1144 kbps
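One way to test that impression is to tally the lengths of the chunks the iterator actually yields. Here is a small sketch. record_sizes is a hypothetical helper, and the byte strings below stand in for r.iter_content(chunk_size=4096 * 64).

```python
from collections import Counter


def record_sizes(chunks, sizes):
    """Yield each chunk unchanged while counting its length in sizes."""
    for chunk in chunks:
        sizes[len(chunk)] += 1
        yield chunk


sizes = Counter()
# Stand-in data; in the real script, wrap r.iter_content(...) instead
# and keep writing each chunk to the file inside the loop.
fake_chunks = [b"x" * 262144, b"x" * 262144, b"x" * 1000]
downloaded = sum(len(chunk) for chunk in record_sizes(fake_chunks, sizes))
print(downloaded)           # total number of bytes seen
print(sizes.most_common())  # (chunk length, count) pairs, most frequent first
```

If most entries are far below the requested size, the slowdown is more likely down to how the data arrives than to the write loop.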
Version using urllib3
import urllib3
import time
http = urllib3.PoolManager()
url = "https://aspb2.cdn.asset.aparat.com/aparat-video/a5e07b7f62ffaad0c104763c23d7393215613675-1080p.mp4?wmsAuthSign=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ0b2tlbiI6Ijg0ZTVmNjhhMGJkNDJlMmM0MWFjZjgyNzY5YWU4NmMzIiwiZXhwIjoxNjA1NzM3NjIxLCJpc3MiOiJTYWJhIElkZWEgR1NJRyJ9.eaqnWYevFhe-CHG1TGR3SuoTbnVNBEJmLj-ZSxjtNbY"
r = http.request('HEAD', url)
headers = r.headers
print(headers)
r = http.request('GET', url, preload_content=False)
name = url.split('/')[-1].split('?')[0]
print(name)
file_size = int(headers['Content-Length'])
downloaded = 0
start = last_print = time.time()
with open(name, 'wb') as fp:
    for chunk in r.stream(4096 * 64):
        downloaded += fp.write(chunk)
        now = time.time()
        if now - last_print >= 1:
            pct_done = round(downloaded / file_size * 100)
            speed = round(downloaded / (now - start) / 1024)
            print(f"Download {pct_done} % done, avg speed {speed} kbps")
            last_print = time.time()
r.release_conn()
Version using urllib
import time
from urllib.request import urlopen
url = "https://aspb2.cdn.asset.aparat.com/aparat-video/a5e07b7f62ffaad0c104763c23d7393215613675-1080p.mp4?wmsAuthSign=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ0b2tlbiI6Ijg0ZTVmNjhhMGJkNDJlMmM0MWFjZjgyNzY5YWU4NmMzIiwiZXhwIjoxNjA1NzM3NjIxLCJpc3MiOiJTYWJhIElkZWEgR1NJRyJ9.eaqnWYevFhe-CHG1TGR3SuoTbnVNBEJmLj-ZSxjtNbY"
response = urlopen(url)
file_size = int(response.getheader('Content-Length'))
print('File size =', file_size)
name = url.split('/')[-1].split('?')[0]
print(name)
downloaded = 0
start = last_print = time.time()
with open(name, 'wb') as fp:
    while True:
        chunk = response.read(4096 * 64)
        if not chunk:
            break
        downloaded += fp.write(chunk)
        now = time.time()
        if now - last_print >= 1:
            pct_done = round(downloaded / file_size * 100)
            speed = round(downloaded / (now - start) / 1024)
            print(f"Download {pct_done} % done, avg speed {speed} kbps")
            last_print = time.time()
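Two small details in these scripts are easy to trip over. response.getheader('Content-Length') returns None when the server sends no such header, so int(...) would raise TypeError. And downloaded / elapsed / 1024 is KiB per second rather than the kilobits per second that "kbps" suggests. Here is a hedged sketch of a formatter that handles both. parse_content_length and progress_line are hypothetical names.

```python
def parse_content_length(value):
    """Return the size as an int, or None when the header is missing."""
    return int(value) if value is not None else None


def progress_line(downloaded, file_size, elapsed):
    """Format one progress message; file_size may be None when unknown."""
    # bytes / seconds / 1024 is KiB per second, not kilobits per second.
    speed = round(downloaded / elapsed / 1024) if elapsed > 0 else 0
    if file_size:
        pct_done = round(downloaded / file_size * 100)
        return f"Download {pct_done} % done, avg speed {speed} KiB/s"
    return f"Downloaded {downloaded} bytes, avg speed {speed} KiB/s"


print(progress_line(2048, parse_content_length("4096"), 1.0))
print(progress_line(2048, parse_content_length(None), 1.0))
```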
Version using urllib and urlretrieve
from urllib.request import urlretrieve
import time
def report_hook(numblocks, blocksize, file_size):
    global start, last_print
    now = time.time()
    if now - last_print >= 1:
        downloaded = numblocks * blocksize
        pct_done = round(downloaded / file_size * 100)
        speed = round(downloaded / (now - start) / 1024)
        print(f"Download {pct_done} % done, avg speed {speed} kbps")
        last_print = now
url = "https://aspb2.cdn.asset.aparat.com/aparat-video/a5e07b7f62ffaad0c104763c23d7393215613675-1080p.mp4?wmsAuthSign=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ0b2tlbiI6Ijg0ZTVmNjhhMGJkNDJlMmM0MWFjZjgyNzY5YWU4NmMzIiwiZXhwIjoxNjA1NzM3NjIxLCJpc3MiOiJTYWJhIElkZWEgR1NJRyJ9.eaqnWYevFhe-CHG1TGR3SuoTbnVNBEJmLj-ZSxjtNbY"
name = url.split('/')[-1].split('?')[0]
print(name)
start = time.time()
last_print = start
urlretrieve(url, name, report_hook)
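The hook above keeps its state in module-level globals, and numblocks * blocksize can overshoot the real size on the final, partial block. urlretrieve also passes -1 as the size when the server reports none. Below is a sketch of a variant that keeps the timing state in a closure and handles both cases. make_report_hook and min_interval are hypothetical names.

```python
import time


def make_report_hook(min_interval=1.0):
    """Build a urlretrieve reporthook that keeps its timing state locally."""
    start = last_print = time.monotonic()

    def report_hook(numblocks, blocksize, file_size):
        nonlocal last_print
        now = time.monotonic()
        if now - last_print < min_interval:
            return
        last_print = now
        downloaded = numblocks * blocksize
        if file_size > 0:
            # The last block is usually partial, so clamp to the real size.
            downloaded = min(downloaded, file_size)
        elapsed = max(now - start, 1e-9)  # avoid dividing by zero
        speed = round(downloaded / elapsed / 1024)  # KiB per second
        if file_size > 0:
            pct_done = round(downloaded / file_size * 100)
            print(f"Download {pct_done} % done, avg speed {speed} KiB/s")
        else:
            # Size unknown: urlretrieve passed -1.
            print(f"Downloaded {downloaded} bytes, avg speed {speed} KiB/s")

    return report_hook


# Usage with the script above:
# urlretrieve(url, name, make_report_hook())
```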
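Version using wget
wget is very robust. If you are on Windows, you can download a Windows build of it. The first version reads wget's piped stderr output and displays each line, which looks like: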
--2020-11-24 09:20:02-- https://aspb2.cdn.asset.aparat.com/aparat-video/a5e07b7f62ffaad0c104763c23d7393215613675-1080p.mp4?wmsAuthSign=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ0b2tlbiI6Ijg0ZTVmNjhhMGJkNDJlMmM0MWFjZjgyNzY5YWU4NmMzIiwiZXhwIjoxNjA1NzM3NjIxLCJpc3MiOiJTYWJhIElkZWEgR1NJRyJ9.eaqnWYevFhe-CHG1TGR3SuoTbnVNBEJmLj-ZSxjtNbY
Resolving aspb2.cdn.asset.aparat.com (aspb2.cdn.asset.aparat.com)... 91.229.46.35
Connecting to aspb2.cdn.asset.aparat.com (aspb2.cdn.asset.aparat.com)|91.229.46.35|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 101751914 (97M) [video/mp4]
Saving to: 'a5e07b7f62ffaad0c104763c23d7393215613675-1080p.mp4'
0K ........ ........ ........ ........ ........ ........ 3% 509K 3m9s
3072K ........ ........ ........ ........ ........ ........ 6% 1.36M 2m5s
6144K ........ ........ ........ ........ ........ ........ 9% 1.94M 96s
9216K ........ ........ ........ ........ ........ ........ 12% 1.14M 88s
12288K ........ ........ ........ ........ ........ ........ 15% 931K 86s
15360K ........ ........ ........ ........ ........ ........ 18% 970K 83s
18432K ........ ........ ........ ........ ........ ........ 21% 1.28M 77s
21504K ........ ........ ........ ........ ........ ........ 24% 1.90M 69s
24576K ........ ........ ........ ........ ........ ........ 27% 2.64M 62s
27648K ........ ........ ........ ........ ........ ........ 30% 2.87M 56s
30720K ........ ........ ........ ........ ........ ........ 34% 2.07M 51s
33792K ........ ........ ........ ........ ........ ........ 37% 1.30M 49s
36864K ........ ........ ........ ........ ........ ........ 40% 713K 49s
39936K ........ ........ ........ ........ ........ ........ 43% 731K 49s
43008K ........ ........ ........ ........ ........ ........ 46% 663K 48s
46080K ........ ........ ........ ........ ........ ........ 49% 657K 48s
49152K ........ ........ ........ ........ ........ ........ 52% 1.01M 45s
52224K ........ ........ ........ ........ ........ ........ 55% 1.76M 41s
55296K ........ ........ ........ ........ ........ ........ 58% 1.49M 37s
58368K ........ ........ ........ ........ ........ ........ 61% 1.32M 34s
61440K ........ ........ ........ ........ ........ ........ 64% 1.20M 31s
64512K ........ ........ ........ ........ ........ ........ 68% 966K 29s
67584K ........ ........ ........ ........ ........ ........ 71% 977K 26s
70656K ........ ........ ........ ........ ........ ........ 74% 857K 24s
73728K ........ ........ ........ ........ ........ ........ 77% 803K 21s
76800K ........ ........ ........ ........ ........ ........ 80% 753K 19s
79872K ........ ........ ........ ........ ........ ........ 83% 842K 16s
82944K ........ ........ ........ ........ ........ ........ 86% 1.14M 13s
86016K ........ ........ ........ ........ ........ ........ 89% 1.79M 10s
89088K ........ ........ ........ ........ ........ ........ 92% 2.21M 7s
92160K ........ ........ ........ ........ ........ ........ 95% 2.19M 4s
95232K ........ ........ ........ ........ ........ ........ 98% 2.45M 1s
98304K ........ ........ 100% 2.54M=88s
2020-11-24 09:21:31 (1.10 MB/s) - 'a5e07b7f62ffaad0c104763c23d7393215613675-1080p.mp4' saved [101751914/101751914]
Source:
import subprocess
import time
def run_wget(url, outfile):
    cmd = ['wget', '--progress=dot:mega', '-O', outfile, url]
    p = subprocess.Popen(cmd, stderr=subprocess.PIPE, universal_newlines=True)
    for stderr_line in iter(p.stderr.readline, ""):
        yield stderr_line
    p.stderr.close()
    return_code = p.wait()
    if return_code:
        raise subprocess.CalledProcessError(return_code, cmd)
url = "https://aspb2.cdn.asset.aparat.com/aparat-video/a5e07b7f62ffaad0c104763c23d7393215613675-1080p.mp4?wmsAuthSign=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ0b2tlbiI6Ijg0ZTVmNjhhMGJkNDJlMmM0MWFjZjgyNzY5YWU4NmMzIiwiZXhwIjoxNjA1NzM3NjIxLCJpc3MiOiJTYWJhIElkZWEgR1NJRyJ9.eaqnWYevFhe-CHG1TGR3SuoTbnVNBEJmLj-ZSxjtNbY"
name = url.split('/')[-1].split('?')[0]
print(name)
start = time.time()
for line in run_wget(url, name):
    print(line, end='')
print('Total time:', time.time() - start)
The second version processes the output to produce a listing similar to the other solutions:
a5e07b7f62ffaad0c104763c23d7393215613675-1080p.mp4
file size = 101751914
Download 3% done, avg speed 384 kbps
Download 6% done, avg speed 570 kbps
Download 9% done, avg speed 672 kbps
Download 12% done, avg speed 703 kbps
Download 15% done, avg speed 732 kbps
Download 18% done, avg speed 784 kbps
Download 21% done, avg speed 857 kbps
Download 24% done, avg speed 895 kbps
Download 27% done, avg speed 884 kbps
Download 30% done, avg speed 868 kbps
Download 34% done, avg speed 885 kbps
Download 37% done, avg speed 818 kbps
Download 40% done, avg speed 818 kbps
Download 43% done, avg speed 849 kbps
Download 46% done, avg speed 885 kbps
Download 49% done, avg speed 920 kbps
Download 52% done, avg speed 929 kbps
Download 55% done, avg speed 937 kbps
Download 58% done, avg speed 946 kbps
Download 61% done, avg speed 957 kbps
Download 64% done, avg speed 878 kbps
Download 68% done, avg speed 696 kbps
Download 71% done, avg speed 611 kbps
Download 74% done, avg speed 564 kbps
Download 77% done, avg speed 550 kbps
Download 78% done, avg speed 543 kbps
Download 80% done, avg speed 526 kbps
Download 83% done, avg speed 534 kbps
Download 86% done, avg speed 542 kbps
Download 89% done, avg speed 548 kbps
Download 92% done, avg speed 553 kbps
Download 95% done, avg speed 556 kbps
Download 98% done, avg speed 557 kbps
Download 100% done, avg speed 563 kbps
Total time: 176.51619601249695
Source:
import subprocess
import time
import re
def run_wget(url, outfile):
    cmd = ['wget', '--progress=dot:mega', '-O', outfile, url]
    p = subprocess.Popen(cmd, stderr=subprocess.PIPE, universal_newlines=True)
    for stderr_line in iter(p.stderr.readline, ""):
        yield stderr_line
    p.stderr.close()
    return_code = p.wait()
    if return_code:
        raise subprocess.CalledProcessError(return_code, cmd)
url = "https://aspb2.cdn.asset.aparat.com/aparat-video/a5e07b7f62ffaad0c104763c23d7393215613675-1080p.mp4?wmsAuthSign=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ0b2tlbiI6Ijg0ZTVmNjhhMGJkNDJlMmM0MWFjZjgyNzY5YWU4NmMzIiwiZXhwIjoxNjA1NzM3NjIxLCJpc3MiOiJTYWJhIElkZWEgR1NJRyJ9.eaqnWYevFhe-CHG1TGR3SuoTbnVNBEJmLj-ZSxjtNbY"
name = url.split('/')[-1].split('?')[0]
print(name)
file_size = None
start = time.time()
for line in run_wget(url, name):
    if file_size is None:
        # The first matching line looks like "Length: 101751914 (97M) [video/mp4]"
        m = re.match(r'Length: (\d+)', line)
        if m:
            file_size = int(m[1])
            print('file size =', file_size)
    else:
        # Progress lines carry a percentage such as "6%"
        m = re.search(r'(\d+)%', line)
        if m:
            pct_done = int(m[1])
            downloaded = file_size / 100 * pct_done
            elapsed = time.time() - start
            speed = round(downloaded / elapsed / 1024)
            print(f"Download {pct_done}% done, avg speed {speed} kbps")
print('Total time:', time.time() - start)
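The two patterns in the loop need the \d digit class, since a bare d+ would match only the letter d. Here is a minimal sketch that checks them against lines copied from the wget output shown earlier. parse_length and parse_percent are hypothetical helper names.

```python
import re

LENGTH_RE = re.compile(r"Length: (\d+)")
PERCENT_RE = re.compile(r"(\d+)%")


def parse_length(line):
    """Return the byte count from wget's 'Length:' line, else None."""
    m = LENGTH_RE.match(line)
    return int(m[1]) if m else None


def parse_percent(line):
    """Return the percentage from a wget progress line, else None."""
    m = PERCENT_RE.search(line)
    return int(m[1]) if m else None


print(parse_length("Length: 101751914 (97M) [video/mp4]"))      # 101751914
print(parse_percent("3072K ........ ........ 6% 1.36M 2m5s"))   # 6
print(parse_percent("98304K ........ ........ 100% 2.54M=88s")) # 100
print(parse_percent("HTTP request sent, awaiting response... 200 OK"))  # None
```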
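This problem may originate on the server side: if the client's connection is too slow, the server may close the request. I have run into this when downloading large files with Python over a slow mobile connection. I suggest either modifying the server's code base or getting access to a better connection, although the former is the better long-term solution.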
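If neither option is available, a client-side workaround may be to resume where the transfer stopped. The response headers printed earlier in this thread include 'Accept-Ranges': 'bytes', which suggests the server accepts Range requests. The following is only a sketch under that assumption. build_resume_request, resume_download and the 30 second timeout are hypothetical choices.

```python
import os
from urllib.request import Request, urlopen


def build_resume_request(url, path):
    """Return (request, offset) asking only for bytes not yet saved in path."""
    offset = os.path.getsize(path) if os.path.exists(path) else 0
    req = Request(url)
    if offset:
        # Ask the server to start at the first byte we are missing.
        req.add_header("Range", f"bytes={offset}-")
    return req, offset


def resume_download(url, path, chunk_size=4096 * 64, timeout=30):
    """Continue a partial download; start over if the Range is ignored."""
    req, offset = build_resume_request(url, path)
    with urlopen(req, timeout=timeout) as response:
        # 206 Partial Content means the server honored the Range header.
        resumed = offset > 0 and response.status == 206
        with open(path, "ab" if resumed else "wb") as fp:
            while True:
                chunk = response.read(chunk_size)
                if not chunk:
                    break
                fp.write(chunk)
    return os.path.getsize(path)
```

A caller could invoke resume_download again after a timeout or dropped connection until the returned size matches Content-Length. Note that if the file is already complete, the server will likely answer with an error status for the out-of-range request, which urlopen raises as HTTPError.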
Good luck.