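Compress a .txt file at an S3 location into a .gz file

I need to compress a .txt file stored in S3 into a .gz file and then upload it to a different S3 bucket. I have written the following code, but it does not work as expected: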

def upload_gzipped(bucket, key, fp, compressed_fp=None, content_type='text/plain'):
    with gzip.GzipFile(fileobj=compressed_fp, mode='wb') as gz:
        shutil.copyfileobj(fp, gz)
    compressed_fp.seek(0)
    print(compressed_fp)
    bucket.upload_fileobj(
        compressed_fp,
        key,
        {'ContentType': content_type, 'ContentEncoding': 'gzip'})

source_bucket = event['Records'][0]['s3']['bucket']['name']
file_key_name = event['Records'][0]['s3']['object']['key']
response = s3.get_object(Bucket=source_bucket, Key=file_key_name)
original = BytesIO(response['Body'].read())
original.seek(0)
upload_gzipped(source_bucket, file_key_name, original)

Can anyone help with this, or suggest another way to gzip a file that lives in S3?

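It looks like you are writing an AWS Lambda function.

A simpler program flow might be:

Download the file to /tmp/ using s3_client.download_file()
Gzip the file
Upload the file to S3 using s3_client.upload_file()
Delete the files in /tmp/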
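
The steps above can be sketched roughly as follows. This is only an illustration, not a definitive implementation: the function name `gzip_via_tmp`, the `dest_bucket` parameter, and the choice of appending `.gz` to the key are assumptions made for the example. The `s3_client` argument is expected to be a boto3 S3 client.

```python
import gzip
import os
import shutil
import tempfile


def gzip_via_tmp(s3_client, source_bucket, key, dest_bucket):
    """Download an object, gzip it on local disk, and upload the result.

    Hypothetical helper: the name and the ".gz" key suffix are assumptions.
    """
    # The temporary directory (under /tmp in a typical Lambda environment)
    # is removed automatically when the with-block exits.
    with tempfile.TemporaryDirectory() as workdir:
        local_path = os.path.join(workdir, os.path.basename(key))
        gz_path = local_path + ".gz"

        # Step 1: download the original file.
        s3_client.download_file(source_bucket, key, local_path)

        # Step 2: gzip it, streaming from one file to the other.
        with open(local_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
            shutil.copyfileobj(src, dst)

        # Step 3: upload the compressed file to the destination bucket.
        dest_key = key + ".gz"
        s3_client.upload_file(
            gz_path,
            dest_bucket,
            dest_key,
            ExtraArgs={"ContentType": "text/plain", "ContentEncoding": "gzip"},
        )
    # Step 4: cleanup already happened when the temporary directory closed.
    return dest_key
```

In a Lambda handler this could be called as `gzip_via_tmp(boto3.client("s3"), source_bucket, file_key_name, "my-destination-bucket")`, where the destination bucket name is a placeholder.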

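Also, note that an AWS Lambda function may be invoked with multiple objects passed in through event. However, your code currently processes only the first record, via event['Records'][0]. The program should loop over the records like this: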

for record in event['Records']:
    source_bucket = record['s3']['bucket']['name']
    file_key_name = record['s3']['object']['key']
    ...
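
A slightly fuller sketch of that loop is shown here. The generator name `iter_s3_objects` is made up for the example. It also decodes the key with `unquote_plus`, on the assumption that the event comes from an S3 notification, where object keys arrive URL-encoded (for instance, spaces appear as `+`).

```python
from urllib.parse import unquote_plus


def iter_s3_objects(event):
    """Yield (bucket, key) for every record in an S3 event.

    Hypothetical helper; assumes the standard S3 notification layout.
    """
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys in S3 notifications are URL-encoded, so decode them first.
        key = unquote_plus(record["s3"]["object"]["key"])
        yield bucket, key
```

The handler can then process each object in turn with `for source_bucket, file_key_name in iter_s3_objects(event): ...`.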

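Rather than writing the file to the /tmp folder, it may be better to read it into a buffer, because /tmp has limited storage space (512 MB by default in Lambda). Keep in mind that the whole object then has to fit in the function's memory.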

buffer = BytesIO(file.get()["Body"].read())

For the gzip step, you can simply use something like this:

gzipped_content = gzip.compress(f_in.read())
destinationbucket.upload_fileobj(
    io.BytesIO(gzipped_content),
    final_file_path,
    ExtraArgs={"ContentType": "text/plain"}
)
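
Putting the buffer and the gzip call together, an in-memory version might look like the sketch below. It uses the boto3 client interface (`get_object` and `upload_fileobj`) instead of the resource objects in the snippets above. The function name `gzip_in_memory`, the `dest_bucket` and `dest_key` parameters, and the default `.gz` suffix are assumptions made for illustration.

```python
import gzip
import io


def gzip_in_memory(s3_client, source_bucket, key, dest_bucket, dest_key=None):
    """Read an object into memory, gzip it, and upload it to another bucket.

    Hypothetical helper; the whole object must fit in memory.
    """
    # Read the full object body into memory.
    response = s3_client.get_object(Bucket=source_bucket, Key=key)
    raw = response["Body"].read()

    # Compress the bytes in one call.
    gzipped = gzip.compress(raw)

    # Upload from an in-memory file object; nothing touches /tmp.
    dest_key = dest_key or key + ".gz"
    s3_client.upload_fileobj(
        io.BytesIO(gzipped),
        dest_bucket,
        dest_key,
        ExtraArgs={"ContentType": "text/plain", "ContentEncoding": "gzip"},
    )
    return dest_key
```

Unlike the code in the question, this always creates its own `BytesIO` for the compressed data and passes a bucket name to a client, rather than calling `upload_fileobj` on a plain string.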

Here is a tutorial for a similar Lambda function: https://medium.com/p/f7bccf0099c9
