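I have a CSV file in which I want to save all my hash values. I am using Nokogiri's SAX parser to parse XML documents and then save the results to the CSV.
The SAX parser: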
require 'rubygems'
require 'nokogiri'
require 'csv'

class MyDocument < Nokogiri::XML::SAX::Document
  HEADERS = [ :titles, :identifier, :typeOfLevel, :typeOfResponsibleBody,
              :type, :exact, :degree, :academic, :code, :text ]

  def initialize
    @infodata = {}
    @infodata[:titles] = []
  end

  def start_element(name, attrs)
    @attrs = attrs
    @content = ''
  end

  def end_element(name)
    if name == 'title'
      Hash[@attrs]["xml:lang"]
      @infodata[:titles] << @content
      @content = nil
    end
    if name == 'identifier'
      @infodata[:identifier] = @content
      @content = nil
    end
    if name == 'typeOfLevel'
      @infodata[:typeOfLevel] = @content
      @content = nil
    end
    if name == 'typeOfResponsibleBody'
      @infodata[:typeOfResponsibleBody] = @content
      @content = nil
    end
    if name == 'type'
      @infodata[:type] = @content
      @content = nil
    end
    if name == 'exact'
      @infodata[:exact] = @content
      @content = nil
    end
    if name == 'degree'
      @infodata[:degree] = @content
      @content = nil
    end
    if name == 'academic'
      @infodata[:academic] = @content
      @content = nil
    end
    if name == 'code'
      Hash[@attrs]['source="vhs"']
      @infodata[:code] = @content
      @content = nil
    end
    if name == 'ct:text'
      @infodata[:beskrivning] = @content
      @content = nil
    end
  end

  def characters(string)
    @content << string if @content
  end

  def cdata_block(string)
    characters(string)
  end

  def end_document
    File.open("infodata.csv", "ab") do |f|
      csv = CSV.generate_line(HEADERS.map {|h| @infodata[h] })
      csv << "\n"
      f.write(csv)
    end
  end
end
Creating a new object for every file stored in a folder (47,000 XML files):
parser = Nokogiri::XML::SAX::Parser.new(MyDocument.new)
counter = 0
Dir.glob('/Users/macbookpro/Desktop/sax/info_xml/*.xml') do |item|
  parser.parse(File.open(item, 'rb'))
  counter += 1
  puts "Writing file nr: #{counter}"
end
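Problem: I don't get a new line for each new set of values. Any ideas?
Three XML files for trying out the code: https://gist.github.com/2378898 https://gist.github.com/2378901 https://gist.github.com/2378904
You need to open the file in "a" mode (opening a file with "w" clears any previous content).
Appending an array to the csv object inserts a newline automatically. Hash#values returns an array of the values, but enforcing an explicit order is safer. Flattening the array can lead to misaligned columns (for example, [[:title1, :title2], 'other-value'] becomes [:title1, :title2, 'other-value']). Try something like this: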
HEADERS = [:titles, :identifier, ...]

def end_document
  # with ruby 1.8.7
  File.open("infodata.csv", "ab") do |f|
    csv = CSV.generate_line(HEADERS.map { |h| @infodata[h] })
    csv << "\n"
    f.write(csv)
  end
  # with ruby 1.9.x (use this instead of the block above, not in addition to it)
  CSV.open("infodata.csv", "ab") do |csv|
    csv << HEADERS.map { |h| @infodata[h] }
  end
end
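To make the point about column alignment concrete, here is a minimal sketch comparing a row built in a fixed header order with a row built by flattening Hash#values. The helper names ordered_row and flattened_row, the shortened header list, and the sample data are all hypothetical and only serve the illustration; the insertion-order behaviour of Hash#values assumes Ruby 1.9 or later.

```ruby
require "csv"

# One cell per header, in header order; missing keys become nil (empty cells).
def ordered_row(infodata, headers)
  headers.map { |h| infodata[h] }
end

# What flattening Hash#values would produce: array values spill into
# extra columns, and the order follows insertion order, not the headers.
def flattened_row(infodata)
  infodata.values.flatten
end

# Hypothetical sample data, not taken from the XML files in the question.
headers  = [:titles, :identifier, :type]
infodata = { :type => "course", :titles => ["t1", "t2"], :identifier => "42" }

p ordered_row(infodata, headers)   # [["t1", "t2"], "42", "course"]
p flattened_row(infodata)          # ["course", "t1", "t2", "42"]
```

The ordered row always has exactly one cell per header, so every line of the CSV has the same number of columns regardless of how many titles a document contains.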
The change to end_document above can be verified by running the following:
require "csv"

class CsvAppender
  HEADERS = [ :titles, :identifier, :typeOfLevel, :typeOfResponsibleBody, :type,
              :exact, :degree, :academic, :code, :text ]

  def initialize
    @infodata = { :titles => ["t1", "t2"], :identifier => 0 }
  end

  def end_document
    @infodata[:identifier] += 1
    # with ruby 1.8.7
    File.open("infodata.csv", "ab") do |f|
      csv = CSV.generate_line(HEADERS.map { |h| @infodata[h] })
      csv << "\n"
      f.write(csv)
    end
    # with ruby 1.9.x
    #CSV.open("infodata.csv", "ab") do |csv|
    #  csv << HEADERS.map { |h| @infodata[h] }
    #end
  end
end

appender = CsvAppender.new
3.times do
  appender.end_document
end

File.read("infodata.csv").split("\n").each do |line|
  puts line
end
After running the above, the infodata.csv file will contain:
"[""t1"", ""t2""]",1,,,,,,,,
"[""t1"", ""t2""]",2,,,,,,,,
"[""t1"", ""t2""]",3,,,,,,,,
I think you need an extra loop. Something like
CSV.open("infodata.csv", "wb") do |csv|
  csv << @infodata.keys
  @infodata.each do |key, value|
    csv << value
  end
end
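If a header row is wanted, as in the snippet above, but the file must still grow by one line per parsed document, one possible approach is to keep append mode and write the headers only when the file is new or empty. This is a sketch under assumptions, not part of either answer: the helper name append_row, the two-column header list, and the temporary directory are hypothetical and chosen just for the demonstration.

```ruby
require "csv"
require "tmpdir"

# Append one row in header order; write the header row first
# only if the file does not exist yet or is still empty.
def append_row(path, headers, infodata)
  write_header = !File.exist?(path) || File.zero?(path)
  CSV.open(path, "ab") do |csv|
    csv << headers if write_header
    csv << headers.map { |h| infodata[h] }
  end
end

# Usage with hypothetical data, written to a throwaway directory.
Dir.mktmpdir do |dir|
  path = File.join(dir, "infodata.csv")
  headers = [:identifier, :type]
  append_row(path, headers, { :identifier => 1, :type => "a" })
  append_row(path, headers, { :identifier => 2, :type => "b" })
  puts File.read(path)
end
```

Because the file is opened with "ab" on every call, earlier rows survive, and the header check keeps the header from being repeated.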