So I'm looping over the elements of an array, and this is what comes back:
[nil, [#<Nokogiri::XML::Element:0x835386d4 name="a" attributes=[#<Nokogiri::XML::Attr:0x835385f8 name="href" value="http://bham.craigslist.org/web/2961573018.html">] children=[#<Nokogiri::XML::Text:0x835381c0 "Web Designer Full time">]>
What I want to do is access the href value, and then the text value. How do I do that?
I tried this:
puts i[:href]
But that produces this error:
TypeError: Symbol as array index
By the way, i is each element of the array, which I access like this:
contents.each do |i|
  puts i.inspect
  puts i[:href]
end
Edit 1:
This is how I generate the contents array. There's no need to rename it, since that might cause confusion :)
contents = {}
first_items.each do |link|
  content_url = link
  content_page = Nokogiri::HTML(open(content_url))
  contents[link[:href]] = content_page.css("p a")
end
puts contents.inspect
This is the output it produces:
{nil=>[#<Nokogiri::XML::Element:0x85fee914 name="a" attributes=[#<Nokogiri::XML::Attr:0x85fee838 name="href" value="http://bham.craigslist.org/web/2961573018.html">] children=[#<Nokogiri::XML::Text:0x85fee400 "Web Designer Full time">]>, #<Nokogiri::XML::Element:0x85fee298 name="a" attributes=[#<Nokogiri::XML::Attr:0x85fee1bc name="href" value="http://bham.craigslist.org/web/2959813303.html">] children=[#<Nokogiri::XML::Text:0x85fedd84 "Once in a lifetime opportunity...">]>, #<Nokogiri::XML::Element:0x85fedc1c name="a" attributes=[#<Nokogiri::XML::Attr:0x85fedb40 name="href" value="http://bham.craigslist.org/web/2925485723.html">] children=[#<Nokogiri::XML::Text:0x85fed708 "Website Designer and Blogging Internship!">]>, #<Nokogiri::XML::Element:0x85fed5a0 name="a" attributes=[#<Nokogiri::XML::Attr:0x85fed4c4 name="href" value="http://bham.craigslist.org/web/2918424652.html">] children=[#<Nokogiri::XML::Text:0x85fed08c "Excellent Java Developer Opportunity!">]>, #<Nokogiri::XML::Element:0x85fecf24 name="a" attributes=[#<Nokogiri::XML::Attr:0x85fece48 name="href" value="http://bham.craigslist.org/web/2888669703.html">] children=[#<Nokogiri::XML::Text:0x85feca10 "Freelance Graphic Design">]>, #<Nokogiri::XML::Element:0x85fec8a8 name="a" attributes=[#<Nokogiri::XML::Attr:0x85fec7cc name="href" value="http://bham.craigslist.org/web/2900256461.html">] children=[#<Nokogiri::XML::Text:0x85fec394 "GWT/GXT Developer">]>, #<Nokogiri::XML::Element:0x85fec22c name="a" attributes=[#<Nokogiri::XML::Attr:0x85fec150 name="href" value="http://bham.craigslist.org/web/2897641463.html">] children=[#<Nokogiri::XML::Text:0x85febd18 "Website hiring!">]>]}
Here is the full value of i as it was printed:
--------------------
This is the value of i:
[nil, [#<Nokogiri::XML::Element:0x85fee914 name="a" attributes=[#<Nokogiri::XML::Attr:0x85fee838 name="href" value="http://bham.craigslist.org/web/2961573018.html">] children=[#<Nokogiri::XML::Text:0x85fee400 "Web Designer Full time">]>, #<Nokogiri::XML::Element:0x85fee298 name="a" attributes=[#<Nokogiri::XML::Attr:0x85fee1bc name="href" value="http://bham.craigslist.org/web/2959813303.html">] children=[#<Nokogiri::XML::Text:0x85fedd84 "Once in a lifetime opportunity...">]>, #<Nokogiri::XML::Element:0x85fedc1c name="a" attributes=[#<Nokogiri::XML::Attr:0x85fedb40 name="href" value="http://bham.craigslist.org/web/2925485723.html">] children=[#<Nokogiri::XML::Text:0x85fed708 "Website Designer and Blogging Internship!">]>, #<Nokogiri::XML::Element:0x85fed5a0 name="a" attributes=[#<Nokogiri::XML::Attr:0x85fed4c4 name="href" value="http://bham.craigslist.org/web/2918424652.html">] children=[#<Nokogiri::XML::Text:0x85fed08c "Excellent Java Developer Opportunity!">]>, #<Nokogiri::XML::Element:0x85fecf24 name="a" attributes=[#<Nokogiri::XML::Attr:0x85fece48 name="href" value="http://bham.craigslist.org/web/2888669703.html">] children=[#<Nokogiri::XML::Text:0x85feca10 "Freelance Graphic Design">]>, #<Nokogiri::XML::Element:0x85fec8a8 name="a" attributes=[#<Nokogiri::XML::Attr:0x85fec7cc name="href" value="http://bham.craigslist.org/web/2900256461.html">] children=[#<Nokogiri::XML::Text:0x85fec394 "GWT/GXT Developer">]>, #<Nokogiri::XML::Element:0x85fec22c name="a" attributes=[#<Nokogiri::XML::Attr:0x85fec150 name="href" value="http://bham.craigslist.org/web/2897641463.html">] children=[#<Nokogiri::XML::Text:0x85febd18 "Website hiring!">]>]]
--------------------
This is the value of i.href:
Edit 2:
By the way, this is what the actual HTML output looks like... I did this:
builder = Nokogiri::HTML::Builder.new do |doc|
  doc.html {
    doc.body {
      contents.each do |el|
        if !el.nil?
          puts "-" * 20
          puts "This is the value of el: "
          puts el.inspect
          puts "-" * 20
          puts "This is the value of el.href: "
          puts el[:href]
        end
        doc.p {
          doc.a el, :href => el
        }
      end
    }
  }
end
puts "*" * 50
puts "This is the HTML generated"
puts builder.to_html
This is what it looks like:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><p><a href="<a%20href=%22http://bham.craigslist.org/web/2961573018.html%22>Web%20Designer%20Full%20time</a><a%20href=%22http://bham.craigslist.org/web/2959813303.html%22>Once%20in%20a%20lifetime%20opportunity...</a><a%20href=%22http://bham.craigslist.org/web/2925485723.html%22>Website%20Designer%20and%20Blogging%20Internship!</a><a%20href=%22http://bham.craigslist.org/web/2918424652.html%22>Excellent%20Java%20Developer%20Opportunity!</a><a%20href=%22http://bham.craigslist.org/web/2888669703.html%22>Freelance%20Graphic%20Design</a><a%20href=%22http://bham.craigslist.org/web/2900256461.html%22>GWT/GXT%20Developer</a><a%20href=%22http://bham.craigslist.org/web/2897641463.html%22>Website%20hiring!</a>"><a href="http://bham.craigslist.org/web/2961573018.html">Web Designer Full time</a><a href="http://bham.craigslist.org/web/2959813303.html">Once in a lifetime opportunity...</a><a href="http://bham.craigslist.org/web/2925485723.html">Website Designer and Blogging Internship!</a><a href="http://bham.craigslist.org/web/2918424652.html">Excellent Java Developer Opportunity!</a><a href="http://bham.craigslist.org/web/2888669703.html">Freelance Graphic Design</a><a href="http://bham.craigslist.org/web/2900256461.html">GWT/GXT Developer</a><a href="http://bham.craigslist.org/web/2897641463.html">Website hiring!</a></a></p></body></html>
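I think it can be much simpler. Nokogiri has already parsed the document and provides convenient methods for accessing its content. Rather than looping to store Nokogiri objects and then trying to pull values back out of them, why not take a more direct approach?
Try this code: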
content_page.search('//a[@href]').map { |el| [el[:href], el.text] }
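This creates a 2D array containing the href and text of every link in the document, which is what you said in a follow-up comment you are actually trying to get.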
You can use compact to remove the nils:
nodes.compact.each do |node|
  puts node[:href], node.text
end
Maybe like this, since there's a stray nil in your array.
contents.each do |i|
  if !i.nil?
    puts i.inspect
    puts i[:href]
  end
end
Edit 1: Actually, I think you just need to do contents = contents[1].
contents = contents[1]
contents.each do |i|
  puts i.inspect
  puts i[:href]
end
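Judging from the inspect output in the question's Edit 1, contents looks like a Hash with a single nil key rather than an Array. If so, each yields [key, value] pairs, which would explain why i prints as [nil, [...]] and why i[:href] raises a TypeError (the exact message depends on the Ruby version). The sketch below illustrates this in plain Ruby; FakeLink is a hypothetical stand-in that only mimics node[:href] and node.text, and the URLs are invented.

```ruby
# Hypothetical stand-in for a Nokogiri element. Struct already supports
# member lookup with a symbol, so link[:href] and link.text both work.
FakeLink = Struct.new(:href, :text)

# Same shape as the inspected output: a Hash whose only key is nil.
contents = {
  nil => [
    FakeLink.new('http://example.com/1.html', 'First posting'),
    FakeLink.new('http://example.com/2.html', 'Second posting')
  ]
}

# With one block parameter, each entry arrives as a two-element Array.
first_entry = contents.first
puts first_entry.class

# Destructure the pair in the block, then walk the nodes stored under each key.
rows = []
contents.each do |_page_url, nodes|
  nodes.each do |node|
    rows << [node[:href], node.text]
  end
end

rows.each { |href, text| puts "#{text} -> #{href}" }
```

With a real Nokogiri node set in place of the array of FakeLink objects, the inner loop should read the same way, since each element responds to [:href] and text.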