How to fetch data with Node.js when web scraping



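I'm trying to scrape this link, but I only get the data from the main tab, and I don't know how to solve this. I tried it with both Python and Node.js, and both give the same result: https://www.gate.io/marketlist?tab=loan

I also tried this link and was able to get data from it: https://www.gate.io/marketlist?tab=usdt

I'd really appreciate it if someone could help me with this. Here is the code I wrote.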

const PORT = 8000
const axios = require('axios')
const cheerio = require('cheerio')
const express = require('express')
const app = express()
const url = 'https://www.gate.io/marketlist?tab=loan'

// Fetch the static HTML of the page and print it
axios(url)
  .then(response => {
    const html = response.data
    console.log(html)
  })

// Backticks are required for ${PORT} to be interpolated
app.listen(PORT, () => console.log(`server running on PORT ${PORT}`))

I think all you need to do is wrap the axios call in async/await, and then you should get the result you want:

const PORT = 8000
const axios = require('axios')
const cheerio = require('cheerio')
const express = require('express')
const app = express()
const url = 'https://www.gate.io/marketlist?tab=loan'

// Request the page and print the response body, logging any request error
const sendGetRequest = async () => {
  try {
    const response = await axios.get(url);
    console.log(response.data);
  } catch (err) {
    console.error(err);
  }
};

sendGetRequest();
// Backticks are required for ${PORT} to be interpolated
app.listen(PORT, () => console.log(`server running on PORT ${PORT}`))

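The page you need is rendered with JavaScript, so the HTML that axios downloads does not contain the tab's data and cheerio won't help you. Instead, you need to use something like Puppeteer, which loads the page in a real browser.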

Here is a small example:

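const puppeteer = require("puppeteer");
const URL = "https://www.gate.io/marketlist?tab=loan";

async function scrape() {
  // headless: false opens a visible browser window, which helps while debugging
  const browser = await puppeteer.launch({
    headless: false,
  });
  const page = await browser.newPage();
  await page.goto(URL);
  // Wait until the JavaScript-rendered content exists in the DOM.
  // Alternative: wait a fixed time; page.waitForTimeout(ms) takes a number
  // of milliseconds and is only available in older Puppeteer versions.
  await page.waitForSelector("your_selector");
  // here you do what you need
  await browser.close();
}

scrape();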
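
As a rough sketch of what the "here you do what you need" step might look like, the snippet below reads the text of every table row once the page has rendered. It assumes the data ends up in ordinary `<table>` rows; the `table tbody tr` selector, the `scrapeTable` and `rowsToObjects` helpers, and the header names are hypothetical and would need to be adapted after inspecting the real page in the browser's developer tools.

```javascript
// Pure helper: turn arrays of cell text into objects keyed by header name.
// Missing cells become null.
function rowsToObjects(headers, rows) {
  return rows.map((cells) =>
    Object.fromEntries(
      headers.map((h, i) => [h, cells[i] !== undefined ? cells[i] : null])
    )
  );
}

// Assumption: the rendered page exposes its data as <table> rows.
// rowSelector is a placeholder, not a selector taken from the real site.
async function scrapeTable(url, rowSelector = "table tbody tr") {
  const puppeteer = require("puppeteer"); // loaded lazily; npm install puppeteer
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // networkidle2: consider navigation done when network activity has settled
    await page.goto(url, { waitUntil: "networkidle2" });
    await page.waitForSelector(rowSelector);
    // $$eval runs the callback inside the page over all matching elements
    return await page.$$eval(rowSelector, (trs) =>
      trs.map((tr) =>
        Array.from(tr.querySelectorAll("td"), (td) => td.textContent.trim())
      )
    );
  } finally {
    // Close the browser even if navigation or extraction throws
    await browser.close();
  }
}
```

It could then be called as `scrapeTable(URL).then((rows) => console.log(rowsToObjects(["pair", "rate"], rows)))`, where the two header names are again only placeholders.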

For more usage information, see the Puppeteer documentation.
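
Before reaching for a browser, it can help to confirm that the data really is missing from the downloaded HTML. The following is a minimal heuristic, not a robust HTML parser: it strips scripts, styles, and tags from an HTML string and checks whether an expected piece of text is still there. The function names and both sample strings are made up for illustration and are not real responses from the site.

```javascript
// Rough heuristic: drop <script>/<style> blocks and all tags, then collapse
// whitespace, leaving approximately the text a non-JS client would see.
function visibleText(html) {
  return html
    .replace(/<script\b[^>]*>[\s\S]*?<\/script>/gi, " ")
    .replace(/<style\b[^>]*>[\s\S]*?<\/style>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// True if the expected text is present in the static markup itself.
function isInStaticHtml(html, expectedText) {
  return visibleText(html).includes(expectedText);
}

// Hypothetical inputs for illustration only.
const serverRendered = "<table><tr><td>BTC_USDT</td></tr></table>";
const clientRendered =
  '<div id="app"></div><script>render(["BTC_USDT"])</script>';

console.log(isInStaticHtml(serverRendered, "BTC_USDT")); // true
console.log(isInStaticHtml(clientRendered, "BTC_USDT")); // false
```

If a value that is visible in the browser fails this kind of check against `response.data` from axios, the page is most likely filling it in with JavaScript after load.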
