Getting the page title with cheerio

I am trying to get the title tag of a URL using cheerio, but I get an empty string. Here is my code:

app.get('/scrape', function(req, res){
    url = 'http://nrabinowitz.github.io/pjscrape/';
    request(url, function(error, response, html){
        if(!error){
            var $ = cheerio.load(html);
            var title, release, rating;
            var json = { title : "", release : "", rating : ""};
            $('title').filter(function(){
                var data = $(this);
                title = data.children().first().text();
                release = data.children().last().children().text();
                json.title = title;
                json.release = release;
            })
            $('.star-box-giga-star').filter(function(){
                var data = $(this);
                rating = data.text();
                json.rating = rating;
            })
        }

        fs.writeFile('output.json', JSON.stringify(json, null, 4), function(err){
            console.log('File successfully written! - Check your project directory for the output.json file');
        })
        // Finally, we'll just send out a message to the browser reminding you that this app does not have a UI.
        res.send('Check your console!')
    })
});

request(url, function (error, response, body) 
{
  if (!error && response.statusCode == 200) 
  {
    var $ = cheerio.load(body);
    var title = $("title").text();
  }
})

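Using JavaScript, we extract the text contained in the "title" tag.

If Robert Ryan's solution still doesn't work, I would suspect the markup of the original page; it may be malformed in some way.

In my case, I was accepting gzip and other compressed encodings but never decoding them, so Cheerio was trying to parse compressed binary data. When I logged the raw body to the console, I could see binary text instead of plain-text HTML.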
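
A minimal sketch of that last point, using only Node's built-in zlib module. The helper name decodeBody and the sample HTML are hypothetical and only for illustration; the sketch also assumes the body arrives as a Buffer and that the page is UTF-8. The idea is to decode the body according to its Content-Encoding header before handing it to cheerio:

```javascript
const zlib = require('zlib');

// Hypothetical helper (not part of cheerio or request): decode a raw
// response body according to its Content-Encoding header value.
// Assumes the decoded page is UTF-8 text.
function decodeBody(buffer, contentEncoding) {
  const encoding = (contentEncoding || 'identity').trim().toLowerCase();
  switch (encoding) {
    case 'gzip':
    case 'x-gzip':
      return zlib.gunzipSync(buffer).toString('utf8');
    case 'deflate':
      return zlib.inflateSync(buffer).toString('utf8');
    case 'br':
      return zlib.brotliDecompressSync(buffer).toString('utf8');
    default:
      // No compression (or an unknown encoding): treat the bytes as text.
      return buffer.toString('utf8');
  }
}

// Simulate a gzip-compressed response body with made-up HTML.
const compressed = zlib.gzipSync('<html><head><title>Example</title></head></html>');
const html = decodeBody(compressed, 'gzip');

// The decoded string is what should be passed on to cheerio, e.g.:
//   var $ = cheerio.load(html);
//   var title = $('title').text();
console.log(html);
```

If you are using the request library as in the question, passing gzip: true in its options should make it decode compressed responses for you, so a helper like this would only be needed when handling the raw bytes yourself.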
