How to parse an XML file using metadata key names



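I recently started using Nokogiri to parse data into a Rails 3 application. My problem is that I don't fully understand how to do this, because the XML I am parsing appears to be "non-standard". Take a look at the following snippet: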

<?xml version="1.0" encoding="utf-8"?>
<dataset  xmlns="http://.com/schemas/xmldata/1/"  xmlns:xs="http://www.w3.org/2001/XMLSchema-instance">
<!--
<dataset
    xmlns="http://.com/schemas/xmldata/1/"
    xmlns:xs="http://www.w3.org/2001/XMLSchema-instance"
    xs:schemaLocation="http://.com/schemas/xmldata/1/ xmldata.xsd"
>
-->
    <metadata>
          <item name="Problem ID" type="xs:string" length="32"/>
          <item name="Account Title" type="xs:string" length="162"/>
          <item name="Account Name" type="xs:string" length="162"/>
          <item name="Reassignment" type="xs:int" precision="1"/>
          <item name="Initial Severity" type="xs:int" precision="1"/>
          <item name="Resolution Desc" type="xs:string" length="510"/>
          <item name="Resolver Name" type="xs:string" length="82"/>
          <item name="Problem Code" type="xs:string" length="32"/>
          <item name="Status" type="xs:string" length="32"/>
    </metadata>
    <data>
        <row>
            <value>AP-06684768    </value>
            <value>ESA</value>
            <value>1</value>
            <value>8</value>
            <value>8</value>
            <value xs:nil="true" />
            <value xs:nil="true" />
            <value>ADDITION TO EXISTING FIREWALL</value>
            <value></value>
            <value>ESA BRIDGE                              </value>
            <value>CLOSED         </value>
            <value>CLOSED         </value>
        </row>
        <row>
            <value>AP-06720564    </value>
            <value>ESA</value>
            <value>2011-01-19T12:02:47</value>
            <value>2011-01-19T12:02:49</value>
            <value>0</value>
            <value>776</value>
            <value>SCP UESCADADEV -&gt; UESCADAPW/BW</value>
            <value>NETAU_NETMGTS  </value>
            <value>N/A</value>
            <value>ESA BRIDGE                              </value>
            <value>CLOSED         </value>
            <value>CLOSED         </value>
        </row>
    </data>
</dataset>

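It appears to be a "metadata" section followed by rows, rather than named nodes and attributes, so it is really more like a table. How would I parse all of this data?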

require 'rubygems'
require 'nokogiri'
require 'pp'
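# DATA is the IO for whatever follows __END__ in this script; the XML above is assumed to be pasted there.
doc = Nokogiri::XML(DATA)
# Column names come from the name attribute of each metadata item.
column_names = doc.css('dataset > metadata > item').map { |item| item['name'] }
result = doc.css('dataset > data > row').map do |row|
  # xs:nil="true" marks a missing value; map it to nil.
  values = row.css('value').map { |value| value[:nil] == 'true' ? nil : value.content }
  # Pair names with values by position; zip drops any values beyond the last column name.
  Hash[column_names.zip(values)]
end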
pp result

This results in:

[{"Problem Code"=>"ADDITION TO EXISTING FIREWALL",
  "Resolution Desc"=>nil,
  "Reassignment"=>"8",
  "Resolver Name"=>nil,
  "Status"=>"",
  "Problem ID"=>"AP-06684768    ",
  "Account Name"=>"1",
  "Initial Severity"=>"8",
  "Account Title"=>"ESA"},
 {"Problem Code"=>"NETAU_NETMGTS  ",
  "Resolution Desc"=>"776",
  "Reassignment"=>"2011-01-19T12:02:49",
  "Resolver Name"=>"SCP UESCADADEV -> UESCADAPW/BW",
  "Status"=>"N/A",
  "Problem ID"=>"AP-06720564    ",
  "Account Name"=>"2011-01-19T12:02:47",
  "Initial Severity"=>"0",
  "Account Title"=>"ESA"}]
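
The output above keeps the space padding and leaves every value as a string. Since each metadata item also declares a `type`, one possible follow-up is to strip and convert values per column. This is only a minimal sketch in plain Ruby: `cast_value` and `build_record` are hypothetical helper names, the column list is copied by hand from the `<metadata>` block, and only `xs:int` is converted (anything else is just stripped).

```ruby
# Column definitions copied from the <metadata> block above: [name, type].
columns = [
  ['Problem ID',       'xs:string'],
  ['Account Title',    'xs:string'],
  ['Account Name',     'xs:string'],
  ['Reassignment',     'xs:int'],
  ['Initial Severity', 'xs:int'],
  ['Resolution Desc',  'xs:string'],
  ['Resolver Name',    'xs:string'],
  ['Problem Code',     'xs:string'],
  ['Status',           'xs:string']
]

# Hypothetical helper: strip padding, then convert according to the declared type.
def cast_value(raw, type)
  return nil if raw.nil?

  text = raw.strip
  return text unless type == 'xs:int'

  begin
    Integer(text, 10)
  rescue ArgumentError
    # Not a valid integer (e.g. empty string): keep the stripped text instead.
    text
  end
end

# Hypothetical helper: pair each column with the value at the same position.
def build_record(columns, values)
  columns.zip(values).each_with_object({}) do |((name, type), raw), record|
    record[name] = cast_value(raw, type)
  end
end

# First nine values of the first sample row (nil stands for xs:nil="true").
row = ['AP-06684768    ', 'ESA', '1', '8', '8', nil, nil,
       'ADDITION TO EXISTING FIREWALL', '']
record = build_record(columns, row)
puts record.inspect
```

In the Nokogiri version, the `[name, type]` pairs could be collected from the same `item` nodes that supply the column names, by reading `item['type']` alongside `item['name']`.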

Here is working code that I hacked together and tested:

require 'rubygems'
require 'nokogiri'
class Item
  attr_accessor :name
  def initialize(name)
    @name = name
  end
end
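# The block form closes the file even if parsing raises.
document = File.open("data.xml") { |file| Nokogiri::XML(file) }
# children[3] is <metadata> only because of the whitespace and comment nodes that precede it.
metadata = document.root.children[3]
# Skip the whitespace text nodes, which have no name attribute.
items = metadata.children.reject{|child| child.attribute('name').nil?}.map do |child|
  Item.new(child.attribute('name').value)
end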
puts "#{items.size} items"
puts items.inspect

Result:

[~/stackoverflow/graphML] ruby parse.rb
9 items
[#<Item:0x007fc01c0fbd90 @id="Problem ID">, #<Item:0x007fc01c0fbca0 @id="Account Title">, #<Item:0x007fc01c0fbc28 @id="Account Name">, #<Item:0x007fc01c0fbbb0 @id="Reassignment">, #<Item:0x007fc01c0fbb38 @id="Initial Severity">, #<Item:0x007fc01c0fbac0 @id="Resolution Desc">, #<Item:0x007fc01c0fba48 @id="Resolver Name">, #<Item:0x007fc01c0fb9d0 @id="Problem Code">, #<Item:0x007fc01c0fb868 @id="Status">]

Here is the complete project on GitHub: https://github.com/endymion/GraphML-parsing-exercise/tree/metadata-key-names

(This is a branch of a GraphML parsing exercise that I hacked together earlier tonight for someone else on Stack Overflow.)
