How to download multiple parts of a file at the same time?



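I am developing a download accelerator and want to speed up downloads. However, the response headers from some servers contain no Content-Length field, so I don't know how to split the download and run it over several sockets (I think I will use select). Here is my code: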

import re
import socket


def parse_chunk(chunk: bytes, current_chunk_length_remaining: int) -> tuple[bytes, int, bytes]:
    """
    Receive a piece of a chunked body and parse it following RFC 2616, section 3.6.1.
    Returns:
        real_data | current_chunk: bytes
        chunk_length_remaining: int
        remaining_data: bytes
    Link to RFC 2616: https://datatracker.ietf.org/doc/html/rfc2616#section-3.6.1
    """
    # This is the first chunk, or the previous chunk ended with the previous packet received
    if current_chunk_length_remaining == 0:
        chunk_header, chunk_body = chunk.split(b'\r\n', 1)
        # Drop any chunk extension that follows the size (";name=value")
        response_length = chunk_header.split(b';', 1)[0]
        response_length = int(response_length, base=16)  # convert from hex to int
        real_data = chunk_body[:response_length]
        remaining_data = chunk_body[response_length:]
        if len(real_data) < response_length:  # This means that the chunk is not complete
            chunk_length_remaining = response_length - len(chunk_body)
        else:  # They are equal
            chunk_length_remaining = 0
        return real_data, chunk_length_remaining, remaining_data
    # The previous packet received didn't contain the whole current chunk
    elif current_chunk_length_remaining > 0:
        current_chunk = chunk[:current_chunk_length_remaining]
        remaining_data = chunk[current_chunk_length_remaining:]
        if len(current_chunk) < current_chunk_length_remaining:  # This means that the chunk is not complete
            chunk_length_remaining = current_chunk_length_remaining - len(current_chunk)
        else:  # They are equal
            chunk_length_remaining = 0
        return current_chunk, chunk_length_remaining, remaining_data


def parse_http_get_response(server_interface: socket.socket, link: str, target_host: str,
                            user_agent: str) -> tuple[bytes, bytes]:
    request = ("GET /" + link +
               " HTTP/1.1\r\nHost:{}\r\nUser-Agent: {}\r\nConnection: close\r\n\r\n".format(target_host, user_agent))
    server_interface.sendall(request.encode('utf-8'))
    # Assumes the complete header block arrives in the first recv() call
    data = server_interface.recv(BUFFER_SIZE)
    http_headers, response_body = data.split(b'\r\n\r\n', 1)
    return http_headers, response_body


def download_chunk_data(server_interface: socket.socket, path_to_file: str, link: str, target_host: str,
                        user_agent: str):
    _, response_body = parse_http_get_response(server_interface, link, target_host, user_agent)
    real_data, chunk_length_remaining, remaining_data = parse_chunk(response_body, 0)
    is_start = True
    with open(path_to_file, "wb") as res:
        is_file_end = False
        end_of_file = re.compile(b'\r\n0\r\n\r\n')
        data = real_data
        while True:
            if not is_start:
                real_data, chunk_length_remaining, remaining_data = parse_chunk(data, chunk_length_remaining)
            tmp = b''
            if len(remaining_data) == 0:
                tmp = real_data
            if end_of_file.match(remaining_data):
                tmp = real_data
                is_file_end = True
            while chunk_length_remaining != 0 and len(remaining_data) != 0:
                tmp += real_data
                real_data, chunk_length_remaining, remaining_data = parse_chunk(
                    remaining_data, chunk_length_remaining)
            data = tmp
            res.write(data)
            if not data or is_file_end:
                server_interface.close()
                break
            data = server_interface.recv(BUFFER_SIZE)
            is_start = False


targetHost = "www.livrespourtous.com"
targetPort = 80
BUFFER_SIZE = 1024
userAgent = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 "
             "Safari/537.36 ")
testLink = "http://www.livrespourtous.com/e-books/detail/Le-PDF-gratuit-et-libre/onecat/0.html"
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
    client.connect((targetHost, targetPort))
    download_chunk_data(client, "response.html", testLink, targetHost, userAgent)

I use the download_chunk_data function to download when the transfer encoding is chunked.

"However, the response headers from some servers contain no Content-Length field"

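This particular server uses chunked transfer encoding. The length is not known in advance, which is why it is impossible to work out how to split the response. This is typical for dynamically generated content.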
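To illustrate why the total length is unknown up front, here is a minimal sketch of a chunked-body decoder that works on a complete body already held in memory. The function name decode_chunked and the sample bytes are made up for this illustration and are not part of the question's code; trailers after the last chunk are ignored.

```python
def decode_chunked(body: bytes) -> bytes:
    """Decode a complete chunked message body held in memory (illustrative helper)."""
    decoded = bytearray()
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        # The size line is hex digits, optionally followed by ";extension"
        size = int(body[pos:line_end].split(b";", 1)[0], 16)
        if size == 0:  # the zero-size chunk marks the end of the body
            break
        start = line_end + 2
        decoded += body[start:start + size]
        pos = start + size + 2  # skip the CRLF that closes the chunk data
    return bytes(decoded)


# Made-up sample body: two chunks followed by the terminating zero-size chunk
sample = b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"
print(decode_chunked(sample))  # b'Wikipedia'
```

Each chunk announces only its own size, so the receiver learns the total length only after the zero-size chunk arrives.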

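The missing length is actually not relevant in this case, though, because the server does not support ranges anyway (at least for this URL): there is no Accept-Ranges header in the response. This means every request will start from the beginning of the data, and there is no way to instruct the server otherwise.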
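A download can be split only if the server both advertises byte ranges and reports a length. The sketch below checks a raw header block for that and computes the byte ranges for each socket. The names parse_headers and split_ranges and the sample headers are hypothetical. It also makes the conservative assumption that a missing Accept-Ranges header means ranges are not supported.

```python
def parse_headers(raw_headers: bytes) -> dict[str, str]:
    """Turn a raw header block (status line first) into a dict with lower-cased names."""
    headers = {}
    for line in raw_headers.split(b"\r\n")[1:]:  # skip the status line
        name, sep, value = line.partition(b":")
        if sep:
            headers[name.strip().lower().decode("latin-1")] = value.strip().decode("latin-1")
    return headers


def split_ranges(raw_headers: bytes, parts: int) -> list[tuple[int, int]]:
    """Return inclusive (first, last) byte offsets, or [] if the download cannot be split."""
    headers = parse_headers(raw_headers)
    # Assumption: no "Accept-Ranges: bytes" header means no range support
    if headers.get("accept-ranges", "none").lower() != "bytes":
        return []
    if "content-length" not in headers:  # e.g. a chunked response
        return []
    total = int(headers["content-length"])
    if total <= 0:
        return []
    parts = max(1, min(parts, total))
    size = -(-total // parts)  # ceiling division
    return [(start, min(start + size, total) - 1) for start in range(0, total, size)]


# Made-up header blocks for illustration
static_file = b"HTTP/1.1 200 OK\r\nAccept-Ranges: bytes\r\nContent-Length: 1000"
dynamic_page = b"HTTP/1.1 200 OK\r\nTransfer-Encoding: chunked\r\nContent-Type: text/html"
print(split_ranges(static_file, 4))   # [(0, 249), (250, 499), (500, 749), (750, 999)]
print(split_ranges(dynamic_page, 4))  # []
```

Each (first, last) pair would then be sent on its own connection as a "Range: bytes=first-last" request header. An empty list means falling back to a single sequential download, as with the URL in the question.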
