Here is the code I wrote:
require 'nokogiri'
require 'pp'
html = <<-END
<html>
<head>
<title> A Dirge </title>
<link rel = "schema.DC"
href = "http://purl.org/DC/elements/1.0/">
<meta name = "DC.Title"
content = "A Dirge">
<meta name = "DC.Creator"
content = "Shelley, Percy Bysshe">
<meta name = "DC.Type"
content = "poem">
<meta name = "DC.Date"
content = "1820">
<meta name = "DC.Format"
content = "text/html">
<meta name = "DC.Language"
content = "en">
</head>
<body><pre>
Rough wind, that moanest loud
Grief too sad for song;
Wild wind, when sullen cloud
Knells all the night long;
Sad storm, whose tears are vain,
Bare woods, whose branches strain,
Deep caves and dreary main, -
Wail, for the world's wrong!
</pre></body>
</html>
END
doc = Nokogiri::HTML::DocumentFragment.parse(html)
pp doc
doc.children.each do |ch|
p ch.text if ch.text?
end
But it outputs:
"\n\n \n\n "
"\n\n "
Now my question is: why aren't the lines inside <pre>...</pre> printed?
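The doc.children.each block outputs pretty much what I'd expect: that output is correct, because those are the text nodes that are direct children of <html>.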
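I'm not sure which "lines" you expected to see and didn't. For example, if you want the contents of <pre>, you can get them with
doc.xpath("pre").text
If that still doesn't answer your question, you'll need to clarify it.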