This is the specific code I'm using for scraping:
require 'singleton'
require 'open-uri'
class ProgramHighlights < ActiveRecord::Base
  self.table_name = 'program_highlights'
  include ActiveRecord::Singleton
  def fetch
    url = "http://kboo.fm/"
    doc = Nokogiri::HTML(open(url))
    titles = []
    program_title = doc.css(".title a").each do |title|
      titles.push(title)
    end
  end
end
When I access the titles array and iterate over it with each, my output is:
#(Element:0x5b40910 {
  name = "a",
  attributes = [
    #(Attr:0x5b8c310 {
      name = "href",
      value = "/content/thedeathsofothersthefateofciviliansinamericaswars"
    }),
    #(Attr:0x5b8c306 {
      name = "title",
      value = "The Deaths of Others: The Fate of Civilians in America's Wars"
    })],
  children = [
    #(Text "The Deaths of Others: The Fate of Civilians in America's Wars")]
})
I specifically want to get the "value", but none of the following retrieves it:
titles[0].value
titles[0]["value"]
titles[0][value]
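I'm not sure why I can't access it, since it looks like a hash. Any pointers in the right direction? I can't get the data in a simple JSON format, hence the need to scrape.
To get the value of a node's attribute, use ['attribute_name']. For example: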
require 'nokogiri'
html = %Q{
  <html>
    <a href="/content/thedeathsofothersthefateofciviliansinamericaswars" title="The Deaths of Others: The Fate of Civilians in America's Wars">The Deaths of Others: The Fate of Civilians in America's Wars</a>
  </html>
}
doc = Nokogiri::HTML(html)
node = doc.at_css('a')
puts node['href']
#=> /content/thedeathsofothersthefateofciviliansinamericaswars
puts node['title']
#=> The Deaths of Others: The Fate of Civilians in America's Wars
Assuming you want the title attribute value of each link, you can do:
program_title = doc.css(".title a").each do |link|
  titles.push(link['title'])
end