Get the "src" of a captcha <img> in headless Chrome with Puppeteer



My end goal is to get the base64 text of the captcha image on a page. The page contains this markup:

</div>
<div _ngcontent-qjw-c117 class="mb-16">
<img _ngcontent-qjw-c117 alt="Image verification" width="100" height="50" src="data:image/png;base64,iVBOR...AAEOWcJLXLQAAAABJRU5ErkJggg==">
</div>

In Chrome's console, the following works fine:

var yes = document.getElementsByClassName("mb-16")[1].firstElementChild.src;

That works great. Now I want to do the same thing with Puppeteer.

In Puppeteer, I have the following code:

import puppeteer from 'puppeteer';
(async () => {
const browser = await puppeteer.launch({headless: true});
const page = await browser.newPage();
await page.goto('theURL');
const element = await page.$('document.getElementsByClassName("mb-16")[1].firstElementChild.src;');
console.log(element);
await browser.close();
})();

This fails:

$ node index.js
file:///Users/.../node_modules/puppeteer-core/lib/esm/puppeteer/common/ExecutionContext.js:225
throw new Error('Evaluation failed: ' + getExceptionMessage(exceptionDetails));
^
Error: Evaluation failed: DOMException: Failed to execute 'querySelector' on 'Document': 'document.getElementsByClassName("mb-16")[1].firstElementChild.src;' is not a valid selector.
at pptr://__puppeteer_evaluation_script__:5:24
at ExecutionContext._ExecutionContext_evaluate (file:///Users/.../node_modules/puppeteer-core/lib/esm/puppeteer/common/ExecutionContext.js:225:15)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async ElementHandle.evaluateHandle (file:///Users/...node_modules/puppeteer-core/lib/esm/puppeteer/common/JSHandle.js:94:16)
at async internalHandler.queryOne (file:///Users.../node_modules/puppeteer-core/lib/esm/puppeteer/common/QueryHandler.js:25:30)
at async ElementHandle.$ (file:///.../node_modules/puppeteer-core/lib/esm/puppeteer/common/ElementHandle.js:93:17)
at async file:///Users/..../index.js:7:19
Node.js v18.12.1

How can I get the src of an <img> element with Puppeteer? I have gone through other similar questions without success.
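The error message points at the cause: page.$() hands its argument to querySelector, so it expects a CSS selector rather than a JavaScript expression. Below is a minimal sketch of a selector-based approach. The helper name getCaptchaSrc and the selector img[alt="Image verification"] are assumptions for illustration (the selector is taken from the alt text in the markup above); page.waitForSelector() and page.$eval() are regular Puppeteer Page methods.

```javascript
// Hypothetical helper (not part of Puppeteer): returns the src of the captcha image.
// `page` is expected to be a Puppeteer Page. The default selector is an assumption
// based on the alt attribute shown in the question's markup.
const CAPTCHA_SELECTOR = 'img[alt="Image verification"]';

async function getCaptchaSrc(page, selector = CAPTCHA_SELECTOR) {
  // Wait until the element exists in the DOM; it may be rendered after page load.
  await page.waitForSelector(selector);
  // $eval runs querySelector(selector) in the page and passes the matched
  // element to the callback; the callback's return value comes back to Node.
  return page.$eval(selector, (img) => img.src);
}
```

Inside the async function from the question, this could be called as const src = await getCaptchaSrc(page); right after page.goto(). If the alt text is not unique on the real page, a more specific selector would be needed.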

import puppeteer from 'puppeteer';

(async () => {
  const browser = await puppeteer.launch({headless: true});
  const page = await browser.newPage();
  await page.goto('theURL', {
    // Treat navigation as finished only once there have been no network
    // connections for at least 500 ms, so the captcha has had time to load.
    waitUntil: 'networkidle0',
  });
  // The callback passed to evaluate() runs inside the page, not in Node.
  // Return the value you need so Puppeteer can send it back to the script.
  const src = await page.evaluate(() => {
    return document.getElementsByClassName("mb-16")[1].firstElementChild.src;
  });
  console.log(src);
  await browser.close();
})();
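Since the stated goal is the base64 text itself, note that the src value still carries the data:image/png;base64, prefix. Here is a rough sketch of splitting it off. parseDataUrl is a hypothetical helper, not a Puppeteer or Node API, and it only handles the simple data:<mime>;base64,<payload> form shown in the question.

```javascript
// Hypothetical helper: split a base64 data URL into its MIME type and payload.
// Only the simple "data:<mime>;base64,<payload>" form is handled.
function parseDataUrl(dataUrl) {
  const match = /^data:([^;,]+);base64,(.+)$/.exec(dataUrl);
  if (match === null) {
    throw new Error('Not a base64 data URL');
  }
  const [, mimeType, base64] = match;
  // Buffer is a Node.js global; the decoded bytes can be written to a file.
  return { mimeType, base64, bytes: Buffer.from(base64, 'base64') };
}

// Example with a tiny made-up payload (not the real captcha data).
const parsed = parseDataUrl('data:image/png;base64,aGVsbG8=');
console.log(parsed.mimeType);               // image/png
console.log(parsed.base64);                 // aGVsbG8=
console.log(parsed.bytes.toString('utf8')); // hello
```

With the real value, parsed.base64 would be the captcha's base64 text, and parsed.bytes could be passed to fs.writeFileSync() to save the image.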

More information:

Page.waitForNetworkIdle() method - Puppeteer
