Extracting .gz files on the fly while downloading them from an FTP server



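I wrote a function that downloads .gz files from a given FTP server. I would like to extract them on the fly while downloading and then delete the compressed files. How can I do that?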

import getpass
import os
import pathlib
from ftplib import FTP
from urllib.parse import urlparse

sinex_domain = "ftp://cddis.gsfc.nasa.gov/gnss/products/bias/2013"

def download(sinex_domain):
    user = getpass.getuser()
    sinex_parse = urlparse(sinex_domain)
    sinex_connetion = FTP(sinex_parse.netloc)
    sinex_connetion.login()
    sinex_connetion.cwd(sinex_parse.path)
    sinex_files = sinex_connetion.nlst()
    sinex_userpath = "C:\\Users\\" + user + "\\DCBviz\\sinex"
    pathlib.Path(sinex_userpath).mkdir(parents=True, exist_ok=True)
    for fileName in sinex_files:
        local_filename = os.path.join(sinex_userpath, fileName)
        file = open(local_filename, 'wb')
        sinex_connetion.retrbinary('RETR ' + fileName, file.write, 1024)

        # want to extract files in this loop
        file.close()
    sinex_connetion.quit()

download(sinex_domain)

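There may be a smarter approach that avoids holding each file's entire contents in memory, but these files all appear to be small (a few tens of KB uncompressed), so it is sufficient to read the compressed data into a BytesIO buffer and decompress it in memory before writing it to the output file. (The compressed data is never saved to disk.)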

You would add these imports:

import gzip
from io import BytesIO

Then your main loop becomes:

    for fileName in sinex_files:
        local_filename = os.path.join(sinex_userpath, fileName)
        # Drop the .gz suffix so the output file has the uncompressed name
        if local_filename.endswith('.gz'):
            local_filename = local_filename[:-3]
        # Collect the compressed bytes in memory instead of on disk
        data = BytesIO()
        sinex_connetion.retrbinary('RETR ' + fileName, data.write, 1024)
        data.seek(0)
        uncompressed = gzip.decompress(data.read())
        with open(local_filename, 'wb') as file:
            file.write(uncompressed)

(Note that file.close() is not needed, because the with block closes the file.)
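
If the files were larger, one possible way to avoid buffering each file in memory is to decompress every chunk as it arrives. The following is only a sketch, not part of the original answer: it uses zlib.decompressobj with gzip header handling enabled, and the helper name make_gunzip_writer is hypothetical. It assumes each file is an ordinary single-member gzip stream.

```python
import zlib


def make_gunzip_writer(out_file):
    """Hypothetical helper: build a chunk callback that gunzips into out_file.

    Returns (write_chunk, finish). Pass write_chunk as the retrbinary
    callback, then call finish() once the transfer is complete.
    """
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)

    def write_chunk(chunk):
        # Decompress only this chunk and write whatever output is ready.
        out_file.write(decompressor.decompress(chunk))

    def finish():
        # Write out any data still held inside the decompressor.
        out_file.write(decompressor.flush())

    return write_chunk, finish


# Intended use inside the loop from the answer (names taken from above):
#
#     with open(local_filename, 'wb') as file:
#         write_chunk, finish = make_gunzip_writer(file)
#         sinex_connetion.retrbinary('RETR ' + fileName, write_chunk, 1024)
#         finish()
```

With this variant only one chunk of compressed data is held at a time, and the compressed file still never touches the disk. For files of a few tens of KB the simpler BytesIO version above is probably just as good.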
