Downloading a large gzip file after login with Puppeteer



I have the following code, which essentially logs in and navigates to the page that lists the file I want to download:

const getNextcloudDownloadUrl = async (): Promise<string> => {

  const downloadUrl = `https://${BASEURL}${lastFile}`;
  const fileName = downloadUrl.substring(downloadUrl.lastIndexOf('/') + 1);
  // The callback below is executed inside the browser page, not in Node,
  // so Node modules such as https and fs are not available there.
  const download = await page.evaluate((downloadUrl, fileName) => {
    https.get(downloadUrl, res => {
      const file = fs.createWriteStream(`/tmp/${fileName}`);
      res.pipe(file);
      file.on('finish', () => {
        file.close();
        console.log('done');
      });
    });
  }, downloadUrl, fileName);
  return downloadUrl;
};

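I can't get it to work: it fails with "Error: Evaluation failed: ReferenceError: https is not defined". I want to download a 500 MB file. I have looked everywhere. I also tried fetch, but reportedly that does not work with streams.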
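The error arises because the function passed to page.evaluate is serialized and run in the browser page, where Node's https and fs modules do not exist. A minimal sketch of the alternative, assuming the download can be performed from Node with the session cookies copied out of the page: buildCookieHeader and downloadToFile are hypothetical helper names, and the sketch assumes the server answers with a plain 200 response (https.get does not follow redirects).

```typescript
import * as fs from 'node:fs';
import * as https from 'node:https';
import { pipeline } from 'node:stream/promises';

// Minimal shape of the cookie objects returned by Puppeteer's page.cookies().
interface BrowserCookie {
  name: string;
  value: string;
}

// Build a Cookie request header from the browser session's cookies.
function buildCookieHeader(cookies: BrowserCookie[]): string {
  return cookies.map((c) => `${c.name}=${c.value}`).join('; ');
}

// Hypothetical helper: stream the response body to disk from Node, not from the page.
function downloadToFile(url: string, cookieHeader: string, destPath: string): Promise<void> {
  return new Promise((resolve, reject) => {
    https
      .get(url, { headers: { cookie: cookieHeader } }, (res) => {
        if (res.statusCode !== 200) {
          res.resume(); // discard the body so the socket is released
          reject(new Error(`Unexpected status ${res.statusCode}`));
          return;
        }
        // pipeline resolves only after the file stream has finished writing.
        pipeline(res, fs.createWriteStream(destPath)).then(resolve, reject);
      })
      .on('error', reject);
  });
}
```

With the page object from the question, this could be called as downloadToFile(downloadUrl, buildCookieHeader(await page.cookies()), `/tmp/${fileName}`). Because the body is piped chunk by chunk, the 500 MB file is never held in memory as a whole.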

I tried the following resources but could not solve it:

  • How to download a file with Puppeteer using headless: true
  • https://github.com/puppeteer/puppeteer/issues/299
  • https://oncletom.io/2018/puppeteer-download-file/
  • https://www.scrapingbee.com/blog/download-file-puppeteer/
  • https://docs.browserless.io/docs/downloading-files.html
  • https://help.apify.com/en/articles/1929322-handling-file-download-with-puppeteer

This is the request as I copied it from Chrome DevTools (but I later found out that it does not work because of streams):

fetch(downloadUrl, {
  "headers": {
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
    "accept-language": "de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7,fr;q=0.6,es;q=0.5,sv;q=0.4,ru;q=0.3",
    "sec-ch-ua": "\" Not;A Brand\";v=\"99\", \"Google Chrome\";v=\"97\", \"Chromium\";v=\"97\"",
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": "\"Windows\"",
    "sec-fetch-dest": "document",
    "sec-fetch-mode": "navigate",
    "sec-fetch-site": "same-origin",
    "sec-fetch-user": "?1",
    "upgrade-insecure-requests": "1",
    "cookie": "oc_sessionPassphrase=......."
  },
  "referrerPolicy": "no-referrer",
  "body": null,
  "method": "GET"
});

I got it working in the following way:

// Download file
const fileName = downloadUrl.substring(downloadUrl.lastIndexOf('/') + 1);
const cookies = await page.cookies();
const cStr = cookies.map((c: any) => `${c.name}=${c.value}`).join(';');
const fRes = fetch(downloadUrl, {
  headers: {
    accept: 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'accept-language': 'de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7,fr;q=0.6,es;q=0.5,sv;q=0.4,ru;q=0.3',
    'sec-ch-ua': '" Not;A Brand";v="99", "Google Chrome";v="97", "Chromium";v="97"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
    'sec-fetch-dest': 'document',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-site': 'same-origin',
    'sec-fetch-user': '?1',
    'upgrade-insecure-requests': '1',
    cookie: cStr,
  },
  referrerPolicy: 'no-referrer',
  body: null,
  method: 'GET',
});

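return await fRes
  .then(async (res) => {
    const gcsFile = await uploadNextcloudFileToGoogleCloudStorage(fileName);
    const dest = gcsFile.createWriteStream();
    // res.body is used as a Node stream here, which assumes a fetch
    // implementation such as node-fetch.
    await new Promise((resolve, reject) => {
      // @ts-ignore
      res.body.pipe(dest);
      // @ts-ignore
      res.body.on('error', reject);
      // Resolve when the destination has flushed everything,
      // not merely when the source has been read.
      dest.on('finish', () => resolve('it worked'));
      dest.on('error', reject);
    });
  })
  .then(() => {
    return {
      status: 200,
      downloadUrl,
      fileName,
    };
  })
  .catch((e) => {
    return {
      status: 400,
      error: e,
    };
  });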

This then uploads the file directly to Google Cloud Storage, without storing it in /tmp first.
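The piping step can also be written with pipeline from node:stream/promises, which forwards errors from every stream and settles only after the destination has finished. The following is a minimal sketch rather than the code used above: streamToDestination is a hypothetical helper name, and it assumes the response body is a Node Readable (as with node-fetch) and the destination is any Node Writable, such as the stream returned by createWriteStream().

```typescript
import { Readable, Transform, Writable } from 'node:stream';
import { pipeline } from 'node:stream/promises';

// Pipe source into dest and resolve with the number of bytes transferred
// once dest has finished writing. Rejects if any stream in the chain fails.
async function streamToDestination(source: Readable, dest: Writable): Promise<number> {
  let bytes = 0;
  // Pass-through stage that only counts the bytes flowing past it.
  const counter = new Transform({
    transform(chunk: Buffer, _encoding, callback) {
      bytes += chunk.length;
      callback(null, chunk);
    },
  });
  await pipeline(source, counter, dest);
  return bytes;
}
```

Under those assumptions, the promise wrapper in the answer could be replaced by a call such as streamToDestination(res.body, gcsFile.createWriteStream()), and the returned byte count can be compared against the expected file size as a sanity check.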
