Getting the HTML structure with Nokogiri

My task is to get the HTML structure of a document without its data. From:

<html>
  <head>
    <title>Hello!</title>
  </head>
  <body id="uniq">
    <h1>Hello World!</h1>
  </body>
</html>

I want to get:

<html>
  <head>
    <title></title>
  </head>
  <body id="uniq">
    <h1></h1>
  </body>
</html>

There are plenty of ways to extract data with Nokogiri, but I couldn't find a way to do the opposite.

Update: the solution I found is a combination of the two answers I received:

require 'nokogiri'

doc = Nokogiri::HTML(File.read("test.html"))

# Walk every node under <html> and drop the text nodes.
doc.at_css("html").traverse do |node|
  node.remove if node.text?
end

puts doc

The output is exactly what I wanted.

It sounds like you want to remove all the text nodes. You can do that like this:

doc.xpath('//text()').remove
puts doc

Traverse the document. For each node, remove whatever you don't want. Then write out the document.

Remember that Nokogiri can modify the document in place.
