Reading the full contents of CSV cells that contain complex strings



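I am using Ruby to merge CSV files that may have different headers.
My problem is that some of the values in the CSV files are quite complex, and data gets lost during the merge.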

For example, the original value
"[cell([""A"",""B""]),""X""+cell([""A"",""C""])+""W""].join(""_"")"
is written as
"[cell([""A"",v1,""B""]),
so I get a CSV::MalformedCSVError when I try to read the merged file.

How can I read and write the exact contents of each CSV cell?

My code and a sample run:

require 'csv'

def join_multiple_csv(csv_path_array)
  # Read the first file and store each column as header => array of values
  f = CSV.parse(File.read(csv_path_array[0]), :headers => true, :quote_char => "'")
  f_h = {}
  f.headers.each {|header| f_h[header] = f[header]}
  n_rows = f.size
  csv_path_array.shift(1)
  csv_path_array.each do |csv_file|
    curr_csv = CSV.parse(File.read(csv_file), :headers => true, :quote_char => "'")
    curr_h = {}
    curr_csv.headers.each {|header| curr_h[header] = curr_csv[header]}
    new_headers = curr_csv.headers - f_h.keys
    exist_headers = curr_csv.headers - new_headers
    # New columns are padded with nil for the rows read so far
    new_headers.each { |new_header|
      f_h[new_header] = Array.new(n_rows) + curr_csv[new_header]
    }
    exist_headers.each {|exist_header|
      f_h[exist_header] = f_h[exist_header] + curr_csv[exist_header]
    }
    n_rows = n_rows + curr_csv.size
  end
  # Build the merged CSV text by hand, one row at a time
  csv_headers = f_h.keys.map {|string| string}
  output = csv_headers.join(",") + "\n"
  (0..n_rows-1).each do |i|
    row = ''
    f_h.each_key do |header|
      if f_h[header][i].nil?
        row.concat(f_h[header][i].to_s + ",")
      else
        row.concat(f_h[header][i].to_s + ",")
      end
    end
    output.concat(row + "\n")
  end
  return output
end

csv_files = ['f1.csv', 'f2.csv']
outputs = join_multiple_csv(csv_files)
# Read the merged text back; this is where CSV::MalformedCSVError is raised
f = CSV.new(outputs)
row = f.readline
while row do
  row = f.readline
end

Sample run:
f1.csv

H1,H3,H4
v1,v2,v3

f2.csv

H2,H3,H4
v1,v3,"[cell([""A"",""B""]),""X""+cell([""A"",""C""])+""W""].join(""_"")"

Expected output:

H1,H2,H3,H4
v1,,v2,v3
,v1,v3,"[cell([""A"",""B""]),""X""+cell([""A"",""C""])+""W""].join(""_"")"

Actual output:

H1,H3,H4,H2,
v1,v2,v3,,,
,v3,"[cell([""A"",v1,""B""]),
,,,,,
,,,,,

Any idea what I can do?

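Sorry, I answered in a hurry.

I tried running your program and found that the quote character you pass (:quote_char => "'") causes the cell value to be split at every comma inside the string. Changing the quote character to a double quote in both CSV.parse calls worked for me: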

f = CSV.parse(File.read(csv_path_array[0]), :headers => true, :quote_char => '"')

curr_csv = CSV.parse(File.read(csv_file), :headers => true, :quote_char => '"')  
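
To see the effect of the quote character in isolation, here is a minimal sketch that parses the f2.csv sample from the question both ways. The heredoc and the variable names (data, h4_value, fields_with_single_quote) are only for illustration and are not part of the original program.

```ruby
require 'csv'

# Sample data copied from f2.csv in the question.
data = <<~'CSV_TEXT'
  H2,H3,H4
  v1,v3,"[cell([""A"",""B""]),""X""+cell([""A"",""C""])+""W""].join(""_"")"
CSV_TEXT

# With the default quote character (a double quote), the quoted cell is read
# as one field and each doubled "" is unescaped to a single ".
table = CSV.parse(data, headers: true)
h4_value = table[0]['H4']
puts h4_value

# With quote_char set to a single quote, the double quotes are ordinary
# characters, so every comma inside the cell starts a new field.
fields_with_single_quote = CSV.parse_line(data.lines[1], quote_char: "'")
puts fields_with_single_quote.length
```

The data row has three columns, but the second parse yields more than three fields, which matches the shifted and truncated values in your actual output.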

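
Reading is only half of the question. When the merged rows are written by joining strings with ",", a cell that contains commas or quotes is not re-quoted, so it may not survive being read back. One possible approach, sketched below under my own assumptions, is to let CSV.generate do the quoting. The helper name merge_csv_texts is hypothetical, it takes CSV text rather than file paths to keep the example self-contained, and it orders columns by first appearance (H1,H3,H4,H2) rather than in the order shown in your expected output.

```ruby
require 'csv'

# Hypothetical helper (not from the question): merges already-read CSV texts
# and lets the CSV library quote every cell on output.
def merge_csv_texts(csv_texts)
  tables = csv_texts.map { |text| CSV.parse(text, headers: true) }
  # Union of all headers, in order of first appearance.
  headers = tables.flat_map(&:headers).uniq

  CSV.generate do |out|
    out << headers
    tables.each do |table|
      table.each do |row|
        # row[header] is nil when this file has no such column,
        # which CSV writes as an empty cell.
        out << headers.map { |header| row[header] }
      end
    end
  end
end

# Sample data copied from f1.csv and f2.csv in the question.
f1 = "H1,H3,H4\nv1,v2,v3\n"
f2 = <<~'CSV_TEXT'
  H2,H3,H4
  v1,v3,"[cell([""A"",""B""]),""X""+cell([""A"",""C""])+""W""].join(""_"")"
CSV_TEXT

merged = merge_csv_texts([f1, f2])
puts merged

# Read the merged text back and print the complex cell.
round_trip = CSV.parse(merged, headers: true)
puts round_trip[1]['H4']
```

With real files you would pass File.read(path) for each path. If you need a specific column order, reorder the headers array before writing.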