How to unzip a compressed file in S3



I have a zipped file in a folder of an S3 bucket, and I want to unzip it using boto3. This is my code so far.

def unzip_file(path, file_name):
    s3 = boto3.resource('s3', aws_access_key_id=ACCESS_KEY, aws_secret_access_key=SECRET_KEY)
    my_bucket = s3.Bucket(BUCKET)
    lst = list(my_bucket.objects.filter(Prefix=path))
    unzip_path = '/'.join(str(lst[0].key).split('/')[:-1])
    with zipfile.ZipFile(f"{path}/{file_name}", 'r') as zip_ref:
        zip_ref.extractall(unzip_path)

But this just gives me the following error:

Traceback (most recent call last):
  File "download.py", line 153, in <module>
    unzip_file(path, file_name)
  File "download.py", line 32, in unzip_file
    with zipfile.ZipFile(f"{path}/{file_name}", 'r') as zip_ref:
  File "/Users/sashaanksekar/anaconda3/lib/python3.8/zipfile.py", line 1250, in __init__
    self.fp = io.open(file, filemode)
FileNotFoundError: [Errno 2] No such file or directory: 'test_parent/test_num/test.zip'

How do I unzip the file using Python and boto3?

[Edit 1]

I have edited the code so that the zipped file is now held in memory. How do I extract all of its files into S3?

This is my code now:

def unzip_file(r, path, file_name):
    s3 = boto3.resource('s3', aws_access_key_id=ACCESS_KEY, aws_secret_access_key=SECRET_KEY)
    my_bucket = s3.Bucket(BUCKET)
    if r.status_code == 200:
        filebytes = BytesIO(r.content)
        file = zipfile.ZipFile(filebytes)
        extract_folder = f"{path}extract_test/"

        # extract each file in file.namelist() and save in extract_folder here

Since I am not sure what r.content is or what the logic behind the function is, here is a working example:

import zipfile
from io import BytesIO
import boto3

BUCKET = 'my-bucket'
key = 'my.zip'
s3 = boto3.resource('s3')
my_bucket = s3.Bucket(BUCKET)
# in-memory buffer
filebytes = BytesIO()
# download the zip into the buffer
my_bucket.download_fileobj(key, filebytes)
# create a ZipFile object from the buffer
file = zipfile.ZipFile(filebytes)
# extract everything to a local folder
file.extractall('/tmp/extract_test')
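
If, as in [Edit 1], the goal is to put the extracted files back into S3 rather than into a local folder, the loop below is a minimal sketch reusing my_bucket and the ZipFile object file from the example above; extract_folder is a hypothetical target prefix standing in for the one in the question.

extract_folder = 'extract_test/'  # hypothetical target prefix
for name in file.namelist():
    if name.endswith('/'):
        continue  # skip directory entries inside the archive
    # read the member into memory and write it under the target prefix
    my_bucket.put_object(Key=extract_folder + name, Body=file.read(name))

Note that file.read(name) loads each member fully into memory, which is fine for small archives but not for very large ones.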

The /tmp folder mentioned in the answer above may work, but it has limited storage space, so with larger zip files your function may not work properly. You can do something like this instead:

import zipfile
from io import BytesIO

# assumes s3_resource, sourcebucketname, filekey, logger and
# destinationbucket are defined elsewhere in your function
zipped_file = s3_resource.Object(bucket_name=sourcebucketname, key=filekey)
buffer = BytesIO(zipped_file.get()["Body"].read())
zipped = zipfile.ZipFile(buffer)

for file in zipped.namelist():
    logger.info(f'current file in zipfile: {file}')
    final_file_path = file + '.file_extension'
    with zipped.open(file, "r") as f_in:
        content = f_in.read()
        destinationbucket.upload_fileobj(
            BytesIO(content),
            final_file_path,
            ExtraArgs={"ContentType": "text/plain"}
        )

After unzipping, you can upload the files back to another S3 folder or to some local folder. The full working code is here: https://betterprogramming.pub/unzip-and-gzip-incoming-s3-files-with-aws-lambda-f7bccf0099c9
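
If buffering each member fully in memory is also a concern, the file handle returned by zipped.open() can be passed straight to upload_fileobj, which reads it in chunks. A sketch under the same assumed names (zipped and destinationbucket) as the snippet above:

for name in zipped.namelist():
    if name.endswith('/'):
        continue  # skip directory entries
    with zipped.open(name, "r") as f_in:
        # upload_fileobj reads from the file-like object in chunks,
        # so only one chunk at a time needs to be held in memory
        destinationbucket.upload_fileobj(f_in, name)

This way each zip member is streamed into S3 without ever materializing the whole file.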
