Axios & Cheerio - Visit a list of URLs contained in a JSON file and add the extracted data to that object?



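I'm trying to build a scraper with Axios & Cheerio.

I need to do it in two parts. I've already finished the first part, which visits a website and writes a JSON file for me containing a bunch of objects like this: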

[
  {
    "count": 0,
    "title": "Result One",
    "url": "http://www.resultone.com"
  },
  {
    "count": 1,
    "title": "Result Two",
    "url": "http://www.resulttwo.com"
  },
  {
    "count": 2,
    "title": "Result Three",
    "url": "http://www.resultone.com"
  }
]

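Now, for the second part, I need to read this JSON file, visit each URL listed, extract some data from the page, and add it to the current JSON object in the original file.

Once the JSON file has been created, I can run the following: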

let json_url_list = require('./' + outputFile);
// Loop over the URLs
for (let i = 0; i < json_url_list.length; ++i) {
  let url = json_url_list[i].url;

  // Run a function here to visit the URL and extract data
  getNewData(url);
}

And I have a function like this:

const axios = require('axios');
const cheerio = require('cheerio');

// Create new function to visit each of the URLs captured.
const getNewData = async (url) => {
  try {
    const response = await axios.get(url);
    const $ = cheerio.load(response.data);

    // Get the data here (using page title for example)
    const title = $('title').text();

    // TODO: Add the new data above to the original JSON object in the file we're reading from

    return false;
  } catch (error) {
    console.error(error);
  }
};

But this is where I've run out of ideas on how to make this work... Can anyone point me in the right direction?

Thanks!

Try this: wrap the loop in an async function and await each getNewData call.

https://stackblitz.com/edit/js-tmmnqj?file=index.js

// Create new function to visit each of the URLs captured.
const getNewData = async (url) => {
  try {
    const response = await axios.get(url);
    const $ = cheerio.load(response.data);
    // Get the data here (using page title for example)
    const title = $('title').text();
    console.log(title);
    // TODO: Add the new data above to the original JSON object in the file we're reading from
    return false;
  } catch (error) {
    console.error(error);
  }
};

(async () => {
  // Loop over the URLs
  for (const jsonData of json_url_list) {
    let url = jsonData.url;
    // Run a function here to visit the URL and extract data
    const data = await getNewData(url);
  }
})();
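
The TODO in getNewData is still open in the code above, so here is a minimal sketch of one way the remaining step might look: merge whatever the fetch function returns into each object, then write the whole array back to the same file. The helper names enrichEntries and updateJsonFile, the fetchData parameter, and the error field are all hypothetical and only serve the illustration. The sketch assumes fetchData resolves to a plain object of new fields, or to nothing when it has already handled its own error.

```javascript
const fs = require('fs').promises;

// Hypothetical helper: calls fetchData(url) for each entry, one at a time,
// and copies the returned fields onto that entry.
async function enrichEntries(entries, fetchData) {
  for (const entry of entries) {
    try {
      const extra = await fetchData(entry.url);
      // Object.assign ignores undefined/false, so a fetchData that
      // returns nothing simply leaves the entry unchanged.
      Object.assign(entry, extra);
    } catch (error) {
      // Assumption: record the failure on the entry and keep going.
      entry.error = error.message;
    }
  }
  return entries;
}

// Hypothetical helper: read the JSON file, enrich every entry,
// then overwrite the file with the updated array.
async function updateJsonFile(filePath, fetchData) {
  const entries = JSON.parse(await fs.readFile(filePath, 'utf8'));
  await enrichEntries(entries, fetchData);
  await fs.writeFile(filePath, JSON.stringify(entries, null, 2));
  return entries;
}
```

To use this with getNewData, the function would need to return an object such as { pageTitle: title } (pageTitle is just an example key) instead of false, and the call would then be something like updateJsonFile(path.join(__dirname, outputFile), getNewData), with path being Node's built-in path module. Writing the file once after the loop, rather than after every request, avoids leaving a half-written file behind if the script is interrupted mid-write.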
