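I get TypeError: $ is not a function when trying to scrape data from a website using Puppeteer and Cheerio in Node.js
I am trying to build a scraper application that collects data from a website.
Error: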
(node:11668) UnhandledPromiseRejectionWarning: TypeError: $ is not a function
at checkPrice (file:///E:/vs%20work/scraper/index.js:26:5)
at processTicksAndRejections (internal/process/task_queues.js:95:5)
at async monitor (file:///E:/vs%20work/scraper/index.js:44:5)
(Use `node --trace-warnings ...` to show where the warning was created)
(node:11668) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 1)
(node:11668) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
Code:
import puppeteer from 'puppeteer';
import $ from 'cheerio';
import { CronJob } from 'cron';
import nodemailer from 'nodemailer';
const url = 'https://www.amazon.eg/-/en/Sony-Bluetooth-Cancellation-Headphone-Microphone/dp/B08F4XTS93/'
async function configureBrowser() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(url);
  return page;
}
async function checkPrice(page) {
  await page.reload();
  let html = await page.evaluate(() => document.body.innerHTML);
  // console.log(html);
  $('#priceblock_ourprice', html ).each(()=>{
    let EGPPrice = $(this).text()
    console.log(EGPPrice);
  })
}
async function monitor(){
  let page = await configureBrowser()
  await checkPrice(page)
}
monitor()
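First, if you are using Puppeteer, you probably don't need Cheerio. Puppeteer already has selectors that work on the live page, so using Cheerio just means taking a (potentially) stale snapshot of the already-parsed live DOM and re-parsing it with a second HTML parser. That is pointless in 99% of use cases. Here I would just use await page.$eval("#priceblock_ourprice", el => el.textContent) and stick to Puppeteer alone, which avoids the potential bugs, the extra syntax, the confusion over which library I'm calling, the slowness, and the additional dependency.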
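A minimal sketch of the Puppeteer-only approach, assuming the #priceblock_ourprice selector from the question still matches something on the page (it may not). The names readPriceText and main are hypothetical, introduced only for illustration. Puppeteer is imported lazily inside main so that readPriceText can be exercised with any object that provides waitForSelector and $eval.

```javascript
// Selector taken from the question; it may not match the current page layout.
const PRICE_SELECTOR = '#priceblock_ourprice';

// Reads the price text straight from the live DOM. `page` is a Puppeteer Page.
async function readPriceText(page, selector = PRICE_SELECTOR) {
  // $eval throws if nothing matches, so wait for the element first.
  await page.waitForSelector(selector);
  const text = await page.$eval(selector, (el) => el.textContent);
  return text.trim();
}

// Hypothetical driver: opens the URL, prints the price, closes the browser.
async function main(url) {
  const { default: puppeteer } = await import('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url);
    console.log(await readPriceText(page));
  } finally {
    // Always release the browser, even if the selector never appears.
    await browser.close();
  }
}
```

Calling main(url).catch(console.error) with the product URL would run it end to end; the .catch also takes care of the unhandled rejection warning shown in the question.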
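Second, if you do have to dump the HTML from Puppeteer, use await page.content(), which serializes the whole document, rather than await page.evaluate(() => document.body.innerHTML);, which returns only the contents of the body.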
Third, import $ from "cheerio" is incorrect. Cheerio exports a cheerio object, which loads HTML with cheerio.load(html). That call is what returns the jQuery-like function $. So the correct flow is:
import cheerio from "cheerio";
const html = "<p>test</p>";
const $ = cheerio.load(html);
console.log($("p").text());
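If Cheerio really is needed, the question's checkPrice could be reworked along the following lines. This is a sketch, not a definitive fix: extractTexts is a hypothetical helper, and the named load export is used on the assumption that it is available in the installed Cheerio version (recent releases may not provide a default export). It also sidesteps a second problem in the original callback: an arrow function has no this of its own, so $(this) inside .each(() => ...) does not refer to the matched element; the element argument is used instead.

```javascript
// Returns the trimmed text of every element matching `selector`.
// `$` is the function returned by cheerio's load(html).
function extractTexts($, selector) {
  // Use the (index, element) arguments: arrow functions have no own `this`.
  return $(selector)
    .map((i, el) => $(el).text().trim())
    .get();
}

// Reworked version of the question's checkPrice. `page` is a Puppeteer Page.
async function checkPrice(page) {
  // Lazy import; assumes the named `load` export exists in your version.
  const { load } = await import('cheerio');
  await page.reload();
  const html = await page.content(); // full serialized document
  const $ = load(html);
  const prices = extractTexts($, '#priceblock_ourprice');
  prices.forEach((price) => console.log(price));
  return prices;
}
```

Passing $ into the helper keeps the parsing step separate from the browser step, so the selector logic can be tried against a saved HTML string without launching Puppeteer.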