How to zip large files in S3 and then download them



Greetings, I'm asking here because I currently have a problem: I need to download files from S3 through a Lambda function, with all of them compressed into a single .zip file.

The code I show below works as long as the resulting file is at most 5 GB. Recently I tried downloading 60 assets of 500 MB each, and the resulting zip contained only 7 valid files, with the rest showing up as corrupted.

The algorithm does its job, but I think the fact that these are streamed files and the Lambda runs out of memory also plays a part. What occurred to me was to split everything into parts and chunks, but so far I haven't found anything that works for me. Has this happened to anyone? Please help.

const archiver = require('archiver');
const aws = require('aws-sdk');
const stream = require('stream');
const debug = false; // assumed flag for local testing (referenced below but not defined in the snippet)
const REGION = debug ? 'myRegion' : process.env.REGION;
const FOLDER_LOCATION = debug ? 'myDownloads' : process.env.FOLDER_LOCATION;
const io = require('socket.io-client');
const s3 = new aws.S3({ apiVersion: '2006-03-01', region: REGION });
const { API_DEV, API_QA, API_PRO } = require('./constants');
let isSocketConencted = false;
let socket;
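// Streams each requested S3 object into a zip archive and uploads the finished archive back to S3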
const main = async (uuid, asset, nameZip, client, arrayObjects, channel, env) => {
const api = env === 'dev'
? API_DEV
: env === 'qa'
? API_QA
: API_PRO;
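// Connect to the backend socket.io server so progress/status can be reported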
socket = io(api, { path: '/sockets' });
socket.on('connect', () => {
console.log('socket conectado');
isSocketConencted = true;
});
socket.on('disconnect', () => {
console.log('socket desconectado');
isSocketConencted = false;
});
const bkt = env === 'dev'
? 'bunkey-develop'
: env === 'qa'
? 'bunkey-qa'
: 'bunkey-prod';
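// Open a read stream for every S3 object that will be added to the archive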
const s3DownloadStreams = arrayObjects.map(o => {
const [folder, fullName] = o.url.split('/').slice(o.url.split('/').length - 2);
const fileName = fullName.split('.')[0];
const ext = fullName.split('.')[1];
return {
stream: s3.getObject({ Bucket: bkt, Key: `${folder}/${fileName}.${ext}` }).createReadStream(),
filename: `${o.name}.${ext}`,
};
});
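// The PassThrough stream feeds the archive output straight into the S3 upload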
const streamPassThrough = new stream.PassThrough();
const params = {
ACL: 'public-read',
Body: streamPassThrough,
Bucket: bkt,
ContentType: 'application/zip',
Key: `${FOLDER_LOCATION}/${nameZip.replace(/\//g, '-')}.zip`,
StorageClass: 'STANDARD_IA',
};
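// Managed upload that receives the zip as it is being produced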
const s3Upload = s3.upload(params, error => {
if (error) {
console.error(`Got error creating stream to s3 ${error.name} ${error.message} ${error.stack}`);
throw error;
}
});
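// Build the zip archive with maximum compression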
const archive = archiver('zip', {
gzip: true,
zlib: {
level: 9,
}
});
archive.on('error', error => {
throw new Error(`${error.name} ${error.code} ${error.message} ${error.path} ${error.stack}`);
});
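// Pipe the archive into the upload stream and append every download stream to the archive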
new Promise((resolve, reject) => {
s3Upload.on('close', resolve);
s3Upload.on('end', resolve);
s3Upload.on('error', reject);
archive.pipe(streamPassThrough);
s3DownloadStreams.forEach(streamDetails => archive.append(streamDetails.stream, { name: streamDetails.filename }));
archive.finalize();
}).catch(async error => {
await handleSocketEmit(env, { uuid, channel, status: 'error', message: error.message });
throw new Error(`${error.message}`);
});
const result = await s3Upload.promise();
if (result && result.Location) {
await handleSocketEmit(env, { uuid, asset, status: 'success', client, nameZip, channel, url: result.Location });
await handleSocketDestroy();
return { statusCode: 200, body: result.Location };
} else {
await handleSocketEmit(env, { uuid, channel, status: 'error' });
await handleSocketDestroy();
return { statusCode: 500 };
}
};
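// Close the socket once the result has been reported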
const handleSocketDestroy = async () => {
socket.close();
socket.destroy();
};
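// Emit the status message once the socket is connected, retrying every second otherwise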
const handleSocketEmit = async (env, msg) => {
try {
if (isSocketConencted) {
socket.emit('request_lambda_download', msg);
} else {
setTimeout(async () => {
await handleSocketEmit(env, msg);
}, 1000);
}
} catch (error) {
console.log('handleSocketEmit.err: ', error);
}
};
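// Lambda entry point: unpack the event and delegate to main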
exports.handler = async (event) => {
const { uuid, asset, nameZip, client, arrayObjects, channel, env } = event;
const result = await main(uuid, asset, nameZip, client, arrayObjects, channel, env);
return result;
};

Your requirement appears to be to download uncompressed objects from Amazon S3, create a zip, and then upload the zip back to Amazon S3.

Your problem seems to stem from the fact that the Lambda function is streaming the content and operating on it in memory rather than on disk. AWS Lambda functions are only allocated 512 MB of disk space (in /tmp), which can make manipulating potentially large files difficult.

If you want to keep using AWS Lambda for this job, then I would suggest:

  • Create an Amazon EFS file system
  • Attach the EFS file system to the Lambda function
  • Modify the Lambda function to download all of the files to EFS
  • The Lambda function can then create the zip from the local files and upload it to Amazon S3

This avoids all of the streaming and memory requirements. It could actually run at a much lower (minimal?) memory setting, which means the Lambda function would be considerably cheaper to run.
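For reference, below is a minimal sketch of that approach, assuming the EFS file system is mounted on the function at /mnt/efs and that the handler receives the bucket name, the list of object keys, and the zip name in the event (the mount path and the event shape are placeholders, not taken from the code above):

const aws = require('aws-sdk');
const archiver = require('archiver');
const fs = require('fs');
const path = require('path');
const { pipeline } = require('stream/promises'); // assumes a Node.js 16+ runtime

const s3 = new aws.S3({ apiVersion: '2006-03-01' });
const MOUNT_PATH = '/mnt/efs'; // EFS mount path configured on the Lambda function (assumption)

exports.handler = async (event) => {
  const { bucket, keys, nameZip } = event; // hypothetical event shape

  // 1. Stream every object from S3 onto EFS, so nothing large is held in memory
  for (const key of keys) {
    const localPath = path.join(MOUNT_PATH, path.basename(key));
    await pipeline(
      s3.getObject({ Bucket: bucket, Key: key }).createReadStream(),
      fs.createWriteStream(localPath)
    );
  }

  // 2. Zip the downloaded files on disk (EFS), not in memory
  const zipPath = path.join(MOUNT_PATH, `${nameZip}.zip`);
  await new Promise((resolve, reject) => {
    const output = fs.createWriteStream(zipPath);
    const archive = archiver('zip', { zlib: { level: 9 } });
    output.on('close', resolve);
    archive.on('error', reject);
    archive.pipe(output);
    for (const key of keys) {
      archive.file(path.join(MOUNT_PATH, path.basename(key)), { name: path.basename(key) });
    }
    archive.finalize();
  });

  // 3. Upload the finished zip back to S3 as a plain file stream
  const result = await s3.upload({
    Bucket: bucket,
    Key: `downloads/${nameZip}.zip`,
    Body: fs.createReadStream(zipPath),
    ContentType: 'application/zip',
  }).promise();

  return { statusCode: 200, body: result.Location };
};

Because the zip is written to EFS first, the memory setting only has to cover the streaming buffers, and the maximum archive size is limited by the EFS capacity rather than by the function's memory or its 512 MB of /tmp.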
