Importing a simplified XML file into MongoDB using the crack gem



Here I am using the MongoDB driver for Ruby. Once this works properly, I want to run it as a scheduled task in Ruby on Rails 3 with the Mongoid ODM.

So for now, I am experimenting in plain Ruby.

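I noticed that the crack gem is very handy for converting an XML file into something that can be inserted into MongoDB. Crack parses the XML into a Ruby hash, which prints in a format close to JSON (it uses "=>" instead of ":" between keys and values). A hash is exactly what the MongoDB driver for Ruby needs before I insert the data into the MongoDB database, as shown below.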

The problem with the way I use crack below is that it imports everything that is in the XML file.

sample.xml

<?xml version="1.0" encoding="utf-8"?>
<ShipmentRequest>
  <Envelope>
    <TransmissionDateTime>05/08/2013 23:06:02</TransmissionDateTime>
  </Envelope>
  <Message>
    <Comment />
    <Header>
      <MemberId>A00000001</MemberId>
      <MemberName>Bruce</MemberName>
      <DeliveryId>6377935</DeliveryId>
      <ShipToAddress1>123-4567</ShipToAddress1>
      <OrderDate>05/08/13</OrderDate>
      <Payments>
        <PayType>Credit Card</PayType>
        <Amount>1000</Amount>
      </Payments>
      <Payments>
        <PayType>Points</PayType>
        <Amount>5390</Amount>
      </Payments>
    </Header>
    <Line>
      <LineNumber>3.1</LineNumber>
      <Item>fruit-004</Item>
      <Description>Peach</Description>
      <Quantity>1</Quantity>
      <UnitCost>1610</UnitCost>
      <DeclaredValue>0</DeclaredValue>
      <PointValue>13</PointValue>
    </Line>
    <Line>
      <LineNumber>8.1</LineNumber>
      <Item>fruit-001</Item>
      <Description>Fruit Set</Description>
      <Quantity>1</Quantity>
      <UnitCost>23550</UnitCost>
      <PointValue>105</PointValue>
      <PickLine>
        <PickLineNumber>8.1..1</PickLineNumber>
        <PickItem>fruit-002</PickItem>
        <PickDescription>Apple</PickDescription>
        <PickQuantity>1</PickQuantity>
      </PickLine>
      <PickLine>
        <PickLineNumber>8.1..2</PickLineNumber>
        <PickItem>fruit-003</PickItem>
        <PickDescription>Orange</PickDescription>
        <PickQuantity>2</PickQuantity>
      </PickLine>
    </Line>
  </Message>
</ShipmentRequest>

sample_crack.rb

#!/usr/bin/ruby
require "crack"
require 'mongo'
include Mongo
mongo_client = MongoClient.new("localhost", 27017)
db = mongo_client.db("somedb")
coll = db.collection("somecoll")
myXML  = Crack::XML.parse(File.read("sample.xml"))
coll.insert(myXML)
puts myXML

This prints on the console:

{"ShipmentRequest"=>{"Envelope"=>{"TransmissionDateTime"=>"05/08/2013 23:06:02"}, "Message"=>{"Comment"=>nil, "Header"=>{"MemberId"=>"A00000001", "MemberName"=>"Bruce", "DeliveryId"=>"6377935", "ShipToAddress1"=>"123-4567", "OrderDate"=>"05/08/13", "Payments"=>[{"PayType"=>"Credit Card", "Amount"=>"1000"}, {"PayType"=>"Points", "Amount"=>"5390"}]}, "Line"=>[{"LineNumber"=>"3.1", "Item"=>"fruit-004", "Description"=>"Peach", "Quantity"=>"1", "UnitCost"=>"1610", "DeclaredValue"=>"0", "PointValue"=>"13"}, {"LineNumber"=>"8.1", "Item"=>"fruit-001", "Description"=>"Fruit Set", "Quantity"=>"1", "UnitCost"=>"23550", "PointValue"=>"105", "PickLine"=>[{"PickLineNumber"=>"8.1..1", "PickItem"=>"fruit-002", "PickDescription"=>"Apple", "PickQuantity"=>"1"}, {"PickLineNumber"=>"8.1..2", "PickItem"=>"fruit-003", "PickDescription"=>"Orange", "PickQuantity"=>"2"}]}]}}, :_id=>BSON::ObjectId('51ad8d83a3d24b3b9f000001')}

In MongoDB, the converted XML file looks like this:

{
    "_id" : ObjectId("51ad8d83a3d24b3b9f000001"),
    "ShipmentRequest" : {
        "Envelope" : {
            "TransmissionDateTime" : "05/08/2013 23:06:02"
        },
        "Message" : {
            "Comment" : null,
            "Header" : {
                "MemberId" : "A00000001",
                "MemberName" : "Bruce",
                "DeliveryId" : "6377935",
                "ShipToAddress1" : "123-4567",
                "OrderDate" : "05/08/13",
                "Payments" : [
                    {
                        "PayType" : "Credit Card",
                        "Amount" : "1000"
                    },
                    {
                        "PayType" : "Points",
                        "Amount" : "5390"
                    }
                ]
            },
            "Line" : [
                {
                    "LineNumber" : "3.1",
                    "Item" : "fruit-004",
                    "Description" : "Peach",
                    "Quantity" : "1",
                    "UnitCost" : "1610",
                    "DeclaredValue" : "0",
                    "PointValue" : "13"
                },
                {
                    "LineNumber" : "8.1",
                    "Item" : "fruit-001",
                    "Description" : "Fruit Set",
                    "Quantity" : "1",
                    "UnitCost" : "23550",
                    "PointValue" : "105",
                    "PickLine" : [
                        {
                            "PickLineNumber" : "8.1..1",
                            "PickItem" : "fruit-002",
                            "PickDescription" : "Apple",
                            "PickQuantity" : "1"
                        },
                        {
                            "PickLineNumber" : "8.1..2",
                            "PickItem" : "fruit-003",
                            "PickDescription" : "Orange",
                            "PickQuantity" : "2"
                        }
                    ]
                }
            ]
        }
    }
}

But I want to import it like this, with the unneeded wrapper nodes eliminated and the empty nodes ignored:

{
    "_id" : ObjectId("51ad8d83a3d24b3b9f000001"),
    "MemberId" : "A00000001",
    "MemberName" : "Bruce",
    "DeliveryId" : "6377935",
    "ShipToAddress1" : "123-4567",
    "OrderDate" : "05/08/13",
    "Payments" : [
    {
        "PayType" : "Credit Card",
        "Amount" : "1000"
    },
    {
        "PayType" : "Points",
        "Amount" : "5390"
    }
    ],
    "Line" : [
    {
        "LineNumber" : "3.1",
        "Item" : "fruit-004",
        "Description" : "Peach",
        "Quantity" : "1",
        "UnitCost" : "1610",
        "DeclaredValue" : "0",
        "PointValue" : "13"
    },
    {
        "LineNumber" : "8.1",
        "Item" : "fruit-001",
        "Description" : "Fruit Set",
        "Quantity" : "1",
        "UnitCost" : "23550",
        "PointValue" : "105",
        "PickLine" : [
        {
            "PickLineNumber" : "8.1..1",
            "PickItem" : "fruit-002",
            "PickDescription" : "Apple",
            "PickQuantity" : "1"
        },
        {
            "PickLineNumber" : "8.1..2",
            "PickItem" : "fruit-003",
            "PickDescription" : "Orange",
            "PickQuantity" : "2"
        }
        ]
    }
    ]
}

Can this be done with crack? Or would it be better done with Nokogiri?


Many thanks to @Alex Peachey. Here is my updated code.

sample_crack.rb (updated):

#!/usr/bin/ruby
require "crack"
require 'mongo'
include Mongo
mongo_client = MongoClient.new("localhost", 27017)
db = mongo_client.db("somedb")
coll = db.collection("somecoll")
myXML  = Crack::XML.parse(File.read("sample.xml"))
myXML.merge!(myXML.delete("ShipmentRequest")) # hoist the contents of the wrapper hash
myXML.merge!(myXML.delete("Message"))         # hoist the contents of the wrapper hash
myXML.merge!(myXML.delete("Header"))          # hoist the contents of the wrapper hash
myXML.delete("Envelope")                      # drop the hash that is not needed
# TODO: add code here to remove keys with empty values
coll.insert(myXML)
puts myXML

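It is hard to say how you would define "unneeded" nodes, but empty nodes are easy enough to understand. Either way, Crack is very good at the one thing it does for you, which is basically converting XML into a hash. Once you have the hash, prune it according to your own rules before you insert it into Mongo.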
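As a rough illustration of the pruning step, here is a minimal sketch that walks the hash recursively and drops empty nodes. The helper names `prune_empty` and `blank_node?` are made up for this example and are not part of crack or the mongo driver. It assumes "empty" means nil, an empty string, an empty hash, or an empty array. The sample hash is hand-written, shaped like what crack returns for the XML in the question, plus a hypothetical empty "Note" field.

```ruby
# Treat nil, empty strings, empty hashes and empty arrays as "empty nodes".
# (Assumption: this is the definition of "empty" you want.)
def blank_node?(value)
  value.nil? || (value.respond_to?(:empty?) && value.empty?)
end

# Return a copy of a nested Hash/Array structure without its empty nodes.
# Hypothetical helper, not provided by crack or the mongo driver.
def prune_empty(value)
  case value
  when Hash
    value.each_with_object({}) do |(key, child), result|
      pruned = prune_empty(child)
      result[key] = pruned unless blank_node?(pruned)
    end
  when Array
    value.map { |child| prune_empty(child) }.reject { |child| blank_node?(child) }
  else
    value
  end
end

doc = {
  "Comment" => nil,
  "MemberId" => "A00000001",
  "Line" => [{ "LineNumber" => "3.1", "DeclaredValue" => "0", "Note" => "" }]
}
pruned_doc = prune_empty(doc)
```

Note that a value such as "0" is a non-empty string, so "DeclaredValue" survives. Only "Comment" and the illustrative "Note" key are removed.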

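Based on your comment, I understand your question better now. My answer still holds: just manipulate the hash. Specifically, you could do something like this: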
myXML.merge!(myXML.delete("ShipmentRequest"))
myXML.delete("Envelope")
myXML.merge!(myXML.delete("Message"))
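If the list of wrappers grows, the same merge-and-delete idea could be wrapped in a small helper. This is only a sketch: `flatten_wrappers` and its `unwrap:` and `drop:` keywords are hypothetical names, and the sample hash is a hand-written, shortened version of what crack returns for sample.xml. Unlike the three lines above, it also unwraps "Header", as in the asker's updated script.

```ruby
# Hoist the contents of each wrapper key to the top level, in the given order,
# then delete the keys that should not be stored at all.
# Hypothetical helper; works on a shallow copy so the parsed hash is untouched.
def flatten_wrappers(doc, unwrap:, drop: [])
  result = doc.dup
  unwrap.each do |key|
    inner = result.delete(key)
    result.merge!(inner) if inner.is_a?(Hash)
  end
  drop.each { |key| result.delete(key) }
  result
end

# Shortened stand-in for Crack::XML.parse(File.read("sample.xml")).
parsed = {
  "ShipmentRequest" => {
    "Envelope" => { "TransmissionDateTime" => "05/08/2013 23:06:02" },
    "Message" => {
      "Comment" => nil,
      "Header" => { "MemberId" => "A00000001", "MemberName" => "Bruce" },
      "Line" => [{ "LineNumber" => "3.1", "Item" => "fruit-004" }]
    }
  }
}

flat = flatten_wrappers(parsed,
                        unwrap: %w[ShipmentRequest Message Header],
                        drop: %w[Envelope])
```

The order of `unwrap` matters, because each wrapper only becomes a top-level key after its parent has been merged. The nil "Comment" entry is still present in `flat`, so empty values need to be handled separately.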
