My question is similar to an existing one about Python, but mine is about JavaScript.

1. Problem

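I have a large plain-text list of web page URLs (about 10k).
For each page URL (or for most of them) I need to find some metadata and the title.
I do not want to load the full page, only everything up to the closing </head> tag.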

2. Questions

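Is it possible to open a stream, load some bytes, and then close the stream and the connection once </head> is reached? If so, how?
The Python equivalent has a "size" parameter for the number of bytes to read, but JS's ReadableStreamDefaultReader.read() does not. What should I use in JS as an alternative?
Will this approach reduce network traffic, bandwidth usage, and CPU and memory usage?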

Answer to question 2:

Try node-fetch's fetch(url, {size: 200}). The size option sets the maximum response body size in bytes.

https://github.com/node-fetch/node-fetch#fetchurl-options
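
For the streaming approach the question asks about, here is a minimal sketch rather than a definitive implementation. It assumes Node.js 18 or later (global fetch, AbortController, ReadableStream, TextDecoder) and assumes the page is UTF-8 encoded. The names readUntilHeadEnd, fetchHead and maxBytes, and the 256 KiB cap, are made up for this example. Since read() takes no size argument, the code reads whatever chunk arrives, checks the accumulated text for </head>, and stops as soon as it appears.

```javascript
// Sketch: read a response body chunk by chunk and stop once </head> is seen.
// Assumes Node.js 18+ and UTF-8 content; all function names are hypothetical.

const HEAD_END = /<\/head\s*>/i

// Reads from a web ReadableStream until the closing head tag appears
// or at least maxBytes have been read, then cancels the stream.
async function readUntilHeadEnd(stream, maxBytes = 256 * 1024) {
  const reader = stream.getReader()
  const decoder = new TextDecoder("utf-8")
  let text = ""
  let bytesRead = 0
  try {
    while (bytesRead < maxBytes) {
      const { done, value } = await reader.read()
      if (done) break
      bytesRead += value.byteLength
      // stream: true keeps multi-byte characters split across chunks intact
      text += decoder.decode(value, { stream: true })
      const match = HEAD_END.exec(text)
      if (match) {
        // Keep everything up to and including the closing tag.
        return text.slice(0, match.index + match[0].length)
      }
    }
    return text // no </head> found within the limit
  } finally {
    // Signal that the rest of the body is not needed.
    await reader.cancel().catch(() => {})
  }
}

// Fetches a URL and returns only the part of the body up to </head>.
async function fetchHead(url) {
  const controller = new AbortController()
  const response = await fetch(url, { signal: controller.signal })
  try {
    if (!response.body) return ""
    return await readUntilHeadEnd(response.body)
  } finally {
    // Abort the request so the remaining body is not downloaded.
    controller.abort()
  }
}
```

Calling fetchHead(url) resolves to a string that can then be passed to an HTML parser. How much traffic this saves depends on chunk sizes and buffering: bytes that are already in flight when the request is aborted are still received.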

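I don't know whether there is a way to get only the head element from the response, but you can load the whole HTML document and then parse the head out of it, although this may be less efficient than other approaches. I made a basic app that uses axios and cheerio to get the head element from an array of URLs. I hope this helps someone.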

const axios = require("axios")
const cheerio = require("cheerio")

const URLs = ["https://stackoverflow.com/questions/73191546/get-only-html-head-from-url"]

for (const url of URLs) {
  axios.get(url)
    .then(response => {
      const document = response.data

      // find the opening <head ...> tag (it may carry attributes) and the closing </head>
      const startHead = document.search(/<head[\s>]/i)
      const closeHead = document.search(/<\/head\s*>/i)
      if (startHead === -1 || closeHead === -1) {
        console.log(`no <head> found in ${url}`)
        return
      }
      // get the head as a string, including the closing tag
      const head = document.slice(startHead, closeHead + "</head>".length)

      // load the head into cheerio and print the title
      const $ = cheerio.load(head)
      console.log($("title").html())
    })
    .catch(e => console.log(e))
}
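
With a list of around 10k URLs, a loop like the one above starts every request at once. One possible way to bound memory and open connections is a small worker pool. This is only a sketch: mapWithLimit, limit and task are hypothetical names, not part of axios or cheerio.

```javascript
// Sketch: run an async task over many items with a fixed number of workers.
// `mapWithLimit`, `limit`, and `task` are hypothetical names for this example.
async function mapWithLimit(items, limit, task) {
  const results = new Array(items.length)
  let next = 0

  // Each worker repeatedly takes the next unprocessed index.
  async function worker() {
    while (next < items.length) {
      const index = next++
      try {
        results[index] = { ok: true, value: await task(items[index]) }
      } catch (error) {
        // Record the failure and continue with the remaining items.
        results[index] = { ok: false, error }
      }
    }
  }

  const workerCount = Math.min(limit, items.length)
  await Promise.all(Array.from({ length: workerCount }, () => worker()))
  return results
}
```

For example, mapWithLimit(URLs, 10, url => axios.get(url)) would keep at most ten requests in flight, and a failed URL does not stop the others.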
