How to downcase an entire HTML document parsed with Nokogiri

I need to downcase all the text in an HTML document that I parse with Nokogiri. Here is my code:

agent = Mechanize.new
page = agent.get('http://www.example.com').parser.search('//*[translate(text(),"ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz") = *]').to_html

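The code runs without raising any errors. However, when I inspect a random tag in the document, its text is the same as before. Is there another or better way to downcase all the text in the document?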

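An XPath expression only selects nodes; translate() computes a value inside the query and never modifies the document, which is why the text is unchanged. You can use traverse to downcase all text nodes instead: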

require 'open-uri'
require 'nokogiri'
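# URI.open comes from open-uri; a bare Kernel#open no longer fetches URLs on Ruby 3.0+.
doc = Nokogiri::HTML(URI.open("http://www.example.com/"))
# Visit every node and lowercase the content of text nodes in place.
doc.traverse do |node|
  node.content = node.content.downcase if node.text?
end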
puts doc.to_html

Output:

<!DOCTYPE html>
<html>
<head>
    <title>example domain</title>
    <meta charset="utf-8">
    <meta http-equiv="Content-type" content="text/html; charset=utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <style type="text/css">
    body { ... }
    </style>
</head>
<body>
<div>
    <h1>example domain</h1>
    <p>this domain is established to be used for illustrative examples in documents. you may use this
    domain in examples without prior coordination or asking for permission.</p>
    <p><a href="http://www.iana.org/domains/example">more information...</a></p>
</div>
</body>
</html>
