Extract the href attribute from an <a> tag using Mechanize and Nokogiri



I have this HTML:

<div id="main">
    <li>
        <h2>
            <a href="https://www.congress.gov/bill/99th-congress/senate-joint-resolution/427">S.J.Res.427</a>
        </h2>
    </li>
    <li>
        ....
    </li>
</div>

I want to extract the href value of the <a> tag.

Using Mechanize and Nokogiri, I did this:

activity_list = member.search('#main li')
activity_list.each do |link| 
    activity_link = link.at("h2 a[href]")
end

But I get TypeError: no implicit conversion of nil into String

What is going wrong?

You are looking for the #attr method:

require 'nokogiri'

html = Nokogiri::HTML('<div id="main"><li><h2>
  <a href="https://www.congress.gov/bill/99th-congress/senate-joint-resolution/427">S.J.Res.427</a>
</h2></li></div>')
html.search('#main li').each do |link|
  #                         ⇓⇓⇓⇓⇓⇓⇓⇓⇓⇓⇓⇓⇓
  puts link.at("h2 a[href]").attr('href')
end
#⇒ https://www.congress.gov/bill/99th-congress/senate-joint-resolution/427

I would write it like this:

require 'nokogiri'
doc = Nokogiri::HTML(<<EOT)
    <div id="main">
      <li>
        <h2>
          <a href="foo">S.J.Res.427</a>
        </h2>
      </li>
      <li>
        <h2>
          <a href="bar">S.J.Res.427</a>
        </h2>
      </li>
    </div>
EOT
activity_list = doc.search('#main li')
activity_list.each do |link| 
  activity_link = link.at("h2 a[href]") 
  activity_link['href'] # => "foo", "bar"
end

Once you have a reference to a node, you can use [] to access the value of one of its attributes.
