Confusion with Nokogiri::XML::Text#text output



I wrote the following code:

require 'nokogiri'
require 'pp'
html = <<-END
<html>
    <head>
    <title> A Dirge </title>
    <link rel     = "schema.DC"
          href    = "http://purl.org/DC/elements/1.0/">
    <meta name    = "DC.Title"
          content = "A Dirge">
    <meta name    = "DC.Creator"
          content = "Shelley, Percy Bysshe">
    <meta name    = "DC.Type"
          content = "poem">
    <meta name    = "DC.Date"
          content = "1820">
    <meta name    = "DC.Format"
          content = "text/html">
    <meta name    = "DC.Language"
          content = "en">
    </head>
    <body><pre>
            Rough wind, that moanest loud
              Grief too sad for song;
            Wild wind, when sullen cloud
              Knells all the night long;
            Sad storm, whose tears are vain,
            Bare woods, whose branches strain,
            Deep caves and dreary main, -
              Wail, for the world's wrong!
    </pre></body>
    </html>
 END
doc = Nokogiri::HTML::DocumentFragment.parse(html)
pp doc 
doc.children.each do |ch|
    p ch.text if ch.text?
end

But it outputs:

"\n\n    \n\n    "
"\n\n    "

Now my question is: why weren't the lines inside <pre> .. </pre> printed?

Can anyone help me with this?

For me, the doc.children.each block outputs a bit more than that:

"\n\n \n\n" \n \n" \n \n" \n \n" \n \n" \n \n" \n \n" \n \n"\n\n \n\n" \n \n \n"

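This is the correct output; these are the text nodes that are direct children of <html>. The poem's text sits inside the <pre> element, one level further down, so a loop over doc.children that keeps only text nodes never reaches it.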

I'm not sure which "line" you expected to see and didn't. For example, if you want the contents of <pre>, you can run

doc.xpath("pre").text

to get it. If that doesn't answer your question, you'll need to clarify it.
