Puppeteer - Wait for network requests to complete after page.select()



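Is there a way to wait for network requests to resolve after performing an action on a page, before performing the next action in Puppeteer?

I need to use page.select() to interact with a select menu on the page, which causes dynamic images and fonts to load into the page. I need to wait for those requests to complete before performing the next action.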

--

Caveats:

  1. I can't reload the page or go to a new URL.
  2. I don't know what the request types might be, or how many there will be.

--

// launch puppeteer
const browser = await puppeteer.launch({});
// load new page
const page = await browser.newPage();
// go to URL and wait for initial requests to resolve
await page.goto(pageUrl, {
  waitUntil: "networkidle0"
});
// START LOOP
for (let value of lotsOfValues) {
  // interact with select menu
  await page.select('select', value);
  // wait for network requests to complete (images, fonts)
  // ?? (what should go here?)
  // screenshot page with new content
  await pageElement.screenshot({
    type: "jpeg",
    quality: 100
  });
} // END LOOP
// close
await browser.close();

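The answer lies in using page.setRequestInterception(true); and monitoring the subsequent requests, waiting for them to resolve before moving on to the next task (thanks to @Guarev for pointing me in the right direction).

This module (https://github.com/jtassin/pending-xhr-puppeteer) does exactly that, but for XHR requests. I modified it to look for the 'image' and 'font' types.

The final code looks like this: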

// assumes puppeteer, pageUrl, lotsOfValues and pageElement are defined elsewhere,
// and that this runs inside an async function
// launch puppeteer
const browser = await puppeteer.launch({});
// load new page
const page = await browser.newPage();
// go to URL and wait for initial requests to resolve
await page.goto(pageUrl, {
  waitUntil: "networkidle0"
});
// enable this here because we don't want to watch the initial page asset requests (which page.goto above triggers)
await page.setRequestInterception(true);
// custom version of pending-xhr-puppeteer module
let monitorRequests = new PuppeteerNetworkMonitor(page);
// START LOOP
for (let value of lotsOfValues) {
  // interact with select menu
  await page.select('select', value);
  // wait for network requests to complete (images, fonts)
  await monitorRequests.waitForAllRequests();
  // screenshot page with new content
  await pageElement.screenshot({
    type: "jpeg",
    quality: 100
  });
} // END LOOP
// close
await browser.close();

NPM module

class PuppeteerNetworkMonitor {
  constructor(page) {
    this.promises = [];
    this.page = page;
    // resource types to track ('image' and 'font', as described above)
    this.resourceType = ['image', 'font'];
    this.pendingRequests = new Set();
    this.finishedRequestsWithSuccess = new Set();
    this.finishedRequestsWithErrors = new Set();
    page.on('request', (request) => {
      // interception is enabled, so every request has to be continued explicitly
      request.continue();
      if (this.resourceType.includes(request.resourceType())) {
        this.pendingRequests.add(request);
        // one promise per tracked request, resolved when it finishes or fails
        this.promises.push(
          new Promise(resolve => {
            request.resolver = resolve;
          }),
        );
      }
    });
    page.on('requestfailed', (request) => {
      if (this.resourceType.includes(request.resourceType())) {
        this.pendingRequests.delete(request);
        this.finishedRequestsWithErrors.add(request);
        if (request.resolver) {
          request.resolver();
          delete request.resolver;
        }
      }
    });
    page.on('requestfinished', (request) => {
      if (this.resourceType.includes(request.resourceType())) {
        this.pendingRequests.delete(request);
        this.finishedRequestsWithSuccess.add(request);
        if (request.resolver) {
          request.resolver();
          delete request.resolver;
        }
      }
    });
  }
  // resolves once every tracked request seen so far has finished or failed
  async waitForAllRequests() {
    if (this.pendingRequestCount() === 0) {
      return;
    }
    await Promise.all(this.promises);
  }
  pendingRequestCount() {
    return this.pendingRequests.size;
  }
}
module.exports = PuppeteerNetworkMonitor;
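
One thing to keep in mind with the class above: waitForAllRequests() returns straight away when nothing is pending at the moment it is called, so a request that starts slightly after page.select() resolves could be missed. A possible workaround is to poll pendingRequestCount() until it has stayed at zero for a short quiet window. The following is only a sketch under that assumption; the helper name waitForQuietNetwork and its idleMs, pollMs and timeoutMs options are made up for illustration and are not part of Puppeteer or pending-xhr-puppeteer.

```javascript
// Hypothetical helper (not a Puppeteer API). `monitor` is any object that
// exposes pendingRequestCount(), such as PuppeteerNetworkMonitor above.
function waitForQuietNetwork(monitor, { idleMs = 500, pollMs = 50, timeoutMs = 30000 } = {}) {
  return new Promise((resolve, reject) => {
    const start = Date.now();
    let quietSince = start; // last moment at which a tracked request was still pending

    const timer = setInterval(() => {
      const now = Date.now();
      if (monitor.pendingRequestCount() > 0) {
        // still busy: restart the quiet window
        quietSince = now;
      } else if (now - quietSince >= idleMs) {
        // nothing pending for idleMs: treat the network as settled
        clearInterval(timer);
        resolve();
        return;
      }
      if (now - start >= timeoutMs) {
        // give up instead of waiting forever
        clearInterval(timer);
        reject(new Error(`Network not quiet after ${timeoutMs} ms`));
      }
    }, pollMs);
  });
}
```

In the loop it would take the place of the plain wait, for example await waitForQuietNetwork(monitorRequests);. The quiet window only helps if late requests begin within idleMs, so the 500 ms default (the same window networkidle0 waits for) is a starting point to tune, not a guarantee.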

For anyone still interested in the solution @danlong posted above, but who wants to use it in a more modern way, here is a TypeScript version of it:

import { HTTPRequest, Page, ResourceType } from "puppeteer";
export class PuppeteerNetworkMonitor {
  page: Page;
  resourceType: ResourceType[] = [];
  promises: Promise<unknown>[] = [];
  pendingRequests = new Set<HTTPRequest>();
  finishedRequestsWithSuccess = new Set<HTTPRequest>();
  finishedRequestsWithErrors = new Set<HTTPRequest>();
  constructor(page: Page, resourceType: ResourceType[]) {
    this.page = page;
    this.resourceType = resourceType;
    page.on(
      "request",
      async (
        request: HTTPRequest & { resolver?: (value?: unknown) => void },
      ) => {
        await request.continue();
        if (this.resourceType.includes(request.resourceType())) {
          this.pendingRequests.add(request);
          this.promises.push(
            new Promise((resolve) => {
              request.resolver = resolve;
            }),
          );
        }
      },
    );
    page.on(
      "requestfailed",
      (request: HTTPRequest & { resolver?: (value?: unknown) => void }) => {
        if (this.resourceType.includes(request.resourceType())) {
          this.pendingRequests.delete(request);
          this.finishedRequestsWithErrors.add(request);
          if (request.resolver) {
            request.resolver();
            delete request.resolver;
          }
        }
      },
    );
    page.on(
      "requestfinished",
      (request: HTTPRequest & { resolver?: (value?: unknown) => void }) => {
        if (this.resourceType.includes(request.resourceType())) {
          this.pendingRequests.delete(request);
          this.finishedRequestsWithSuccess.add(request);
          if (request.resolver) {
            request.resolver();
            delete request.resolver;
          }
        }
      },
    );
  }
  async waitForAllRequests() {
    if (this.pendingRequestCount() === 0) {
      return;
    }
    await Promise.all(this.promises);
  }
  pendingRequestCount() {
    return this.pendingRequests.size;
  }
}

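I did change one thing: instead of hard-coding the resource types to look for in network requests, they are passed in as a constructor parameter. This should make the class more generic.

I have tested this code with an API that uses Puppeteer, and it works well. Usage of the class is similar to what @danlong posted above: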

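// other necessary puppeteer code here...
// request interception must be enabled, because the monitor calls request.continue()
await page.setRequestInterception(true);
const monitorNetworkRequests = new PuppeteerNetworkMonitor(page, ["image"]);
await monitorNetworkRequests.waitForAllRequests();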
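
To tie this back to the original question, the select / wait / screenshot loop can be wrapped in a small function. This is only a sketch: the name captureEachOption and its parameters are my own for illustration, page is assumed to be a Puppeteer Page with request interception already enabled, monitor an instance of the class above, and target the element handle being screenshotted.

```javascript
// Hypothetical wrapper around the loop from the question (not a Puppeteer API).
// It only relies on page.select(), monitor.waitForAllRequests() and
// target.screenshot(), so any objects providing those methods will work.
async function captureEachOption(page, monitor, selector, values, target) {
  const shots = [];
  for (const value of values) {
    // change the select menu, which triggers the image/font requests
    await page.select(selector, value);
    // wait until the monitor reports that tracked requests have settled
    await monitor.waitForAllRequests();
    // capture the element with its new content and keep the image data
    shots.push(await target.screenshot({ type: 'jpeg', quality: 100 }));
  }
  return shots;
}
```

A call would look like const shots = await captureEachOption(page, monitorNetworkRequests, 'select', lotsOfValues, pageElement);. Collecting the screenshots in an array is a design choice of this sketch; the original loop discards the return value.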
