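I'm having some trouble with Nokogiri and its documentation, which is a bit rough to get started with.
I am trying to parse this XML file: http://www.kongregate.com/games_for_your_site.xml
It returns multiple games inside a gameset, and each game has a title, a name, and so on: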
<gameset>
<game>
<id>160342</id>
<title>Tricky Rick</title>
<thumbnail>
http://cdn3.kongregate.com/game_icons/0042/7180/KONG_icon250x200_site.png?21656-op
</thumbnail>
<launch_date>2012-12-12</launch_date>
<category>Puzzle</category>
<flash_file>
http://external.kongregate-games.com/gamez/0016/0342/live/embeddable_160342.swf
</flash_file>
<width>640</width>
<height>480</height>
<url>
http://www.kongregate.com/games/tAMAS_Games/tricky-rick
</url>
<description>
Help Rick to collect all the stolen fuel to refuel his spaceship and fly away from the planet. Use hammer, bombs, jetpack and other useful stuff to solve puzzles!
</description>
<instructions>
WASD \ Arrow Keys – move; S \ Down Arrow – take\release an object; CNTRL – interaction with objects: throw, hammer strike, invisibility mode; SPACE – interaction with elevators and fuel stations; Esc \ P – pause;
</instructions>
<developer_name>tAMAS_Games</developer_name>
<gameplays>24999</gameplays>
<rating>3.43</rating>
</game>
<game>
<id>160758</id>
<title>Flying Cookie Quest</title>
<thumbnail>
http://cdn2.kongregate.com/game_icons/0042/8428/icon_cookiequest_kong_250x200_site.png?16578-op
</thumbnail>
<launch_date>2012-12-07</launch_date>
<category>Action</category>
<flash_file>
http://external.kongregate-games.com/gamez/0016/0758/live/embeddable_160758.swf
</flash_file>
<width>640</width>
<height>480</height>
<url>
http://www.kongregate.com/games/LongAnimals/flying-cookie-quest
</url>
<description>
Launch Rocket Panda into the land of Cookies. With the help of low-flying sharks, hang-gliding sheep and Rocket Badger, can you defeat the all powerful Biscuit Head? Defeat All enemies of cookies in this launcher game.
</description>
<instructions>Use the mouse button!</instructions>
<developer_name>LongAnimals</developer_name>
<gameplays>168672</gameplays>
<rating>3.67</rating>
</game>
Going by the documentation, I am using something like this:
require 'nokogiri'
require 'open-uri'
url = "http://www.kongregate.com/games_for_your_site.xml"
xml = Nokogiri::XML(open(url))
xml.xpath("//game").each do |node|
  puts node.xpath("//id")
  puts node.xpath("//title")
  puts node.xpath("//thumbnail")
  puts node.xpath("//category")
  puts node.xpath("//flash_file")
  puts node.xpath("//width")
  puts node.xpath("//height")
  puts node.xpath("//description")
  puts node.xpath("//instructions")
end
However, it just prints an endless stream of data instead of one set of values per game.
Here is how I would rewrite your code:
# On Ruby 3.0+ open-uri no longer patches Kernel#open, so call URI.open instead.
xml = Nokogiri::XML(URI.open("http://www.kongregate.com/games_for_your_site.xml"))
xml.search('game').each do |game|
  %w[id title thumbnail category flash_file width height description instructions].each do |n|
    puts game.at(n)
  end
end
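The problem in your code is that all of the child tags are prefixed with //, which in XPath means "start at the root node and search downward for every tag with this name". So instead of searching only inside each game node, it searches the entire document for each listed tag, and it does so once for every game node. That is why the output looks endless.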
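I recommend using CSS accessors instead of XPath because they are (usually) simpler and easier to read. So I used search('game') instead of xpath('//game'). (search accepts either CSS or XPath accessors, as does at.)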
If you want only the text contained in each tag rather than the tag itself, change puts game.at(n) to:
puts game.at(n).text
To make the output more useful, I'd do this:
require 'nokogiri'
require 'open-uri'

# On Ruby 3.0+ open-uri no longer patches Kernel#open, so call URI.open instead.
xml = Nokogiri::XML(URI.open('http://www.kongregate.com/games_for_your_site.xml'))

# Build one hash per game, keyed by tag name.
games = xml.search('game').map do |game|
  %w[
    id title thumbnail category flash_file width height description instructions
  ].each_with_object({}) do |n, o|
    o[n] = game.at(n).text
  end
end

require 'awesome_print'

puts games.size
ap games.first
ap games.last
Which results in:
395
{
"id" => "160342",
"title" => "Tricky Rick",
"thumbnail" => "http://cdn3.kongregate.com/game_icons/0042/7180/KONG_icon250x200_site.png?21656-op",
"category" => "Puzzle",
"flash_file" => "http://external.kongregate-games.com/gamez/0016/0342/live/embeddable_160342.swf",
"width" => "640",
"height" => "480",
"description" => "Help Rick to collect all the stolen fuel to refuel his spaceship and fly away from the planet. Use hammer, bombs, jetpack and other useful stuff to solve puzzles!\n",
"instructions" => "WASD \\ Arrow Keys – move;\nS \\ Down Arrow – take\\release an object;\nCNTRL – interaction with objects: throw, hammer strike, invisibility mode;\nSPACE – interaction with elevators and fuel stations;\nEsc \\ P – pause;\n"
}
{
"id" => "78",
"title" => "rotaZion",
"thumbnail" => "http://cdn2.kongregate.com/game_icons/0000/0115/pixtiz.rotazion_icon.jpg?8217-op",
"category" => "Action",
"flash_file" => "http://external.kongregate-games.com/gamez/0000/0078/live/embeddable_78.swf",
"width" => "350",
"height" => "350",
"description" => "In rotaZion, you play with a bubble bar that you can’t stop rotating !\nCollect the bubbles and try to avoid the mines !\nCollect the different bonus to protect your bubble bar, makes the mines go slower or destroy all the mines !\nTry to beat 100.000 points ;)\n",
"instructions" => "Move the bubble bar with the arrow keys !\nBubble = 500 Points !\nPixtiz sign = 5000 Points !\n"
}
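You could try something like this. I'd suggest creating an array of the game elements you want and then iterating over them. I'm sure there is a way to fetch all of the specified elements at once in Nokogiri, but this works: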
# result is assumed to hold the XML document already fetched (a String or IO).
xml = Nokogiri::XML(result)
xml.css("game").each do |inv|
  inv.css("title").each do |f| # title or whatever else you want
    puts f.inner_html
  end
end