I am trying to get the children of a node:
require 'nokogiri'
@doc = Nokogiri::XML(File.open('data/20160521RHIL0.xml'))
nom_id = @doc.xpath('//race/nomination/@id')
race_id.each do |x|
puts race_id.traverse {|race_id| puts nom_id }
end
I have been looking at two sources of information:
The XML::Node documentation for Nokogiri::XML::Node#children
sparklemotion's cheat sheet:
node.traverse {|node| } # yields all children and self to a block, _recursively_
Here is my test XML:
<meeting id="42977">
  <race id="215411">
    <nomination number="8" saddlecloth="8" horse="Chipanda" id="198926" />
    <nomination number="2" saddlecloth="2" horse="Chifries" id="198965" />
    <nomination number="1" saddlecloth="1" horse="Itpanda" id="199260" />
  </race>
  <race id="215412">
    <nomination number="1" saddlecloth="1" horse="Ruby" id="199634" />
    <nomination number="2" saddlecloth="2" horse="Gems" id="208926" />
    <nomination number="3" saddlecloth="3" horse="Rock" id="122923" />
  </race>
</meeting>
I can easily get the race ids using XPath:
require 'nokogiri'
@doc = Nokogiri::XML(File.open('data/20160521RHIL0.xml'))
race_id = @doc.xpath('//race/@id')
nom_id = @doc.xpath('//race/nomination/@id')
...
215411
215412
How do I get the nomination id and number for only the race with id 215411, and store them in a hash like the following?
{215411 => [{id:198926, number:8},{id:198965, number:2}]}
require 'nokogiri'
# xml data
str =<<-EOS
<meeting id="42977">
<race id="215411">
<nomination number="8" saddlecloth="8" horse="Chipanda" id="198926" />
<nomination number="2" saddlecloth="2" horse="Chifries" id="198965" />
<nomination number="1" saddlecloth="1" horse="Itpanda" id="199260" />
</race>
<race id="215412">
<nomination number="1" saddlecloth="1" horse="Ruby" id="199634" />
<nomination number="2" saddlecloth="2" horse="Gems" id="208926" />
<nomination number="3" saddlecloth="3" horse="Rock" id="122923" />
</race>
</meeting>
EOS
# create doc
doc = Nokogiri::XML(str)
# clean; via http://stackoverflow.com/a/1528247
doc.xpath('//text()[not(normalize-space())]').remove
# parse doc
parsed_doc = doc.xpath('//race').inject({}) {|h,x| h[x.get_attribute('id').to_i] = x.children.map {|y| {id: y.get_attribute('id').to_i, number: y.get_attribute('number').to_i}}; h}
# {215411=>
# [{:id=>198926, :number=>8},
# {:id=>198965, :number=>2},
# {:id=>199260, :number=>1}],
# 215412=>
# [{:id=>199634, :number=>1},
# {:id=>208926, :number=>2},
# {:id=>122923, :number=>3}]}
# select via id
parsed_doc.select {|k,v| k == 215411}
# {215411=>
# [{:id=>198926, :number=>8},
# {:id=>198965, :number=>2},
# {:id=>199260, :number=>1}]}
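Since parsed_doc is an ordinary Hash keyed by race id, scanning it with select is not strictly necessary. As a minimal sketch (the hash literal below is a hand-copied stand-in for the parsed_doc built above, and the names nominations and single_race are illustrative), a direct lookup or Hash#slice gives the same data:

```ruby
# Stand-in for the parsed_doc hash produced by the parsing step above.
parsed_doc = {
  215411 => [{ id: 198926, number: 8 }, { id: 198965, number: 2 }, { id: 199260, number: 1 }],
  215412 => [{ id: 199634, number: 1 }, { id: 208926, number: 2 }, { id: 122923, number: 3 }]
}

# Direct lookup returns only the array of nominations for that race.
nominations = parsed_doc[215411]

# Hash#slice (Ruby 2.5+) keeps the {race_id => nominations} shape.
single_race = parsed_doc.slice(215411)
```

Hash#slice returns an empty hash when the key is missing, whereas the direct lookup returns nil, so which one fits depends on how a missing race should be handled.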
Here is the one-liner written across multiple lines:
parsed_doc = doc.xpath('//race').inject({}) do |h,x|
  h[x.get_attribute('id').to_i] = x.children.map do |y|
    {
      id: y.get_attribute('id').to_i,
      number: y.get_attribute('number').to_i
    }
  end
  h
end
I'd do something like this:
require 'nokogiri'
doc = Nokogiri::XML(<<EOT)
<meeting id="42977">
<race id="215411">
<nomination number="8" saddlecloth="8" horse="Chipanda" id="198926" />
<nomination number="2" saddlecloth="2" horse="Chifries" id="198965" />
<nomination number="1" saddlecloth="1" horse="Itpanda" id="199260" />
</race>
<race id="215412">
<nomination number="1" saddlecloth="1" horse="Ruby" id="199634" />
<nomination number="2" saddlecloth="2" horse="Gems" id="208926" />
<nomination number="3" saddlecloth="3" horse="Rock" id="122923" />
</race>
</meeting>
EOT
race_id = 215411
nominations = doc.at("race[id='#{race_id}']")
  .search('nomination')
  .map{ |nomination|
    {
      number: nomination['number'].to_i,
      id: nomination['id'].to_i
    }
  }
{race_id => nominations}
# => {215411=>[{:number=>8, :id=>198926}, {:number=>2, :id=>198965}, {:number=>1, :id=>199260}]}
race[id='#{race_id}'] builds a CSS selector that finds the desired race node. From there it is easy to find its nomination nodes.
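Note that I'm not using children or traverse, because they return all nodes, including text nodes, not just element nodes. I'd have to add extra logic to ignore the text nodes, which would be a waste of time and space.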
Your question isn't clear, but if you want to return the information for all races, it's a simple adjustment:
doc.search('race').map{ |race|
  nominations = race.search('nomination')
    .map{ |nomination|
      {
        number: nomination['number'].to_i,
        id: nomination['id'].to_i
      }
    }
  {race['id'].to_i => nominations}
}
# => [{215411=>[{:number=>8, :id=>198926}, {:number=>2, :id=>198965}, {:number=>1, :id=>199260}]}, {215412=>[{:number=>1, :id=>199634}, {:number=>2, :id=>208926}, {:number=>3, :id=>122923}]}]