Splitting a complex file into a hash



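I am running a command-line program called Primer3. It takes an input file and writes its results to standard output. I am trying to write a Ruby script that reads that output and puts the entries into a hash.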

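The output it returns is shown below under "Data source". I want to split each line on the '=' sign so that the hash looks like this: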

{:SEQUENCE_ID => "example", :SEQUENCE_TEMPLATE => "GTAGTCAGTAGACNAT..etc", :SEQUENCE_TARGET => "37,21" etc }

I would also like the keys to be lowercase, i.e.:

 {:sequence_id => "example", :sequence_template => "GTAGTCAGTAGACNAT..etc", :sequence_target => "37,21" etc }

Here is my current script:

#!/usr/bin/ruby
puts 'Primer 3 hash'
primer3 = {}
while line = gets do
  name, height = line.split(/=/)
  primer3[name] = height.to_i
end
puts primer3

It returns the following:

Primer 3 hash
{"SEQUENCE_ID"=>0, "SEQUENCE_TEMPLATE"=>0, "SEQUENCE_TARGET"=>37, "PRIMER_TASK"=>0,     "PRIMER_PICK_LEFT_PRIMER"=>1, "PRIMER_PICK_INTERNAL_OLIGO"=>1,  "PRIMER_PICK_RIGHT_PRIMER"=>1, "PRIMER_OPT_SIZE"=>18, "PRIMER_MIN_SIZE"=>15, "PRIMER_MAX_SIZE"=>21, "PRIMER_MAX_NS_ACCEPTED"=>1, "PRIMER_PRODUCT_SIZE_RANGE"=>75, "P3_FILE_FLAG"=>1, "SEQUENCE_INTERNAL_EXCLUDED_REGION"=>37, "PRIMER_EXPLAIN_FLAG"=>1, "PRIMER_THERMODYNAMIC_PARAMETERS_PATH"=>0, "PRIMER_LEFT_EXPLAIN"=>0, "PRIMER_RIGHT_EXPLAIN"=>0, "PRIMER_INTERNAL_EXPLAIN"=>0, "PRIMER_PAIR_EXPLAIN"=>0, "PRIMER_LEFT_NUM_RETURNED"=>0, "PRIMER_RIGHT_NUM_RETURNED"=>0, "PRIMER_INTERNAL_NUM_RETURNED"=>0, "PRIMER_PAIR_NUM_RETURNED"=>0, ""=>0}
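The zeros in that output come from the call to to_i on each value. As a small sketch (the sample strings are copied from the data source, with the trailing newline that gets leaves on each line), String#to_i reads leading digits and stops at the first character that is not a digit, so a value with no leading digits becomes 0:

```ruby
# String#to_i reads leading digits and stops at the first non-digit character;
# a string with no leading digits converts to 0.
samples = ["example\n", "37,21\n", "75-100\n", "18\n"]
converted = samples.map { |s| s.to_i }
p converted  # prints [0, 37, 75, 18]
```

This is why "37,21" is stored as 37 and "75-100" as 75: keeping the values as strings avoids losing that information.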

Data source

SEQUENCE_ID=example
SEQUENCE_TEMPLATE=GTAGTCAGTAGACNATGACNACTGACGATGCAGACNACACACACACACACAGCACACAGGTATTAGTGGGCCATTCGATCCCGACCCAAATCGATAGCTACGATGACG
SEQUENCE_TARGET=37,21
PRIMER_TASK=pick_detection_primers
PRIMER_PICK_LEFT_PRIMER=1
PRIMER_PICK_INTERNAL_OLIGO=1
PRIMER_PICK_RIGHT_PRIMER=1
PRIMER_OPT_SIZE=18
PRIMER_MIN_SIZE=15
PRIMER_MAX_SIZE=21
PRIMER_MAX_NS_ACCEPTED=1
PRIMER_PRODUCT_SIZE_RANGE=75-100
P3_FILE_FLAG=1
SEQUENCE_INTERNAL_EXCLUDED_REGION=37,21
PRIMER_EXPLAIN_FLAG=1
PRIMER_THERMODYNAMIC_PARAMETERS_PATH=/usr/local/Cellar/primer3/2.3.4/bin/primer3_config/
PRIMER_LEFT_EXPLAIN=considered 65, too many Ns 17, low tm 48, ok 0
PRIMER_RIGHT_EXPLAIN=considered 228, low tm 159, high tm 12, high hairpin stability 22, ok 35
PRIMER_INTERNAL_EXPLAIN=considered 0, ok 0
PRIMER_PAIR_EXPLAIN=considered 0, ok 0
PRIMER_LEFT_NUM_RETURNED=0
PRIMER_RIGHT_NUM_RETURNED=0
PRIMER_INTERNAL_NUM_RETURNED=0
PRIMER_PAIR_NUM_RETURNED=0
=
$ primer3_core < example2 | ruby /Users/sean/Dropbox/bin/rb/read_primer3.rb

#!/usr/bin/ruby
puts 'Primer 3 hash'
primer3 = {}
while line = gets do
  key, value = line.split(/=/, 2)
  primer3[key.downcase.to_sym] = value.chomp
end
puts primer3
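The second argument to split in the script above limits the result to two fields, so only the first '=' acts as a separator. A minimal sketch of the difference, using a hypothetical line whose value contains '=' (SOME_KEY is made up for illustration and does not appear in the Primer3 output):

```ruby
# Hypothetical line whose value contains '='; not taken from the Primer3 output.
line = "SOME_KEY=a=b\n"

no_limit   = line.split(/=/)     # every '=' is treated as a separator
with_limit = line.split(/=/, 2)  # split at the first '=' only

p no_limit    # ["SOME_KEY", "a", "b\n"]
p with_limit  # ["SOME_KEY", "a=b\n"]

# Same key/value handling as the script above.
key, value = with_limit
entry = { key.downcase.to_sym => value.chomp }
p entry
```

Without the limit, destructuring into key and value would silently drop everything after the second '='.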

For fun, here are two purely functional solutions. Both assume you have already read the data from the file, e.g.

my_data = ARGF.read # read the file passed on the command line

This one feels a bit gross, but it is a (long) one-liner :)

hash = Hash[ my_data.lines.map{ |line|
  line.chomp.split('=',2).map.with_index{ |s,i| i==0 ? s.downcase.to_sym : s }
} ]

This one is two lines, but feels cleaner than using with_index:

keys,values = my_data.lines.map{ |line| line.chomp.split('=',2) }.transpose
hash = Hash[ keys.map(&:downcase).map(&:to_sym).zip(values) ]

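Both of these are probably less efficient than the answer you have already accepted, and they certainly need more memory; iterating over the lines and mutating the hash as you go is the best approach. These non-mutating variants are just a mental exercise.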
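As a further variant in the same spirit, and assuming Ruby 2.6 or later (where to_h accepts a block), the pairs can be built and converted in a single pass. This is only a sketch; the two-line my_data string below is a shortened stand-in for the real Primer3 output:

```ruby
# Two sample lines in the same KEY=value shape as the Primer3 output.
my_data = "SEQUENCE_ID=example\nSEQUENCE_TARGET=37,21\n"

# Array#to_h with a block (Ruby 2.6+) expects each block call to return
# a [key, value] pair and collects the pairs into a hash.
hash = my_data.lines.to_h do |line|
  key, value = line.chomp.split('=', 2)
  [key.downcase.to_sym, value]
end

p hash
```

Like the two versions above, this still reads the whole input into memory first.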


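Your final answer should use ARGF so that the input can come from filenames given on the command line or from STDIN. I would write it like this: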

#!/usr/bin/ruby
module Primer3
  def self.parse( file )
    {}.tap do |primer3|
      # Process one line at a time, without reading it all into memory first
      file.each_line do |line|  
        key, value = line.chomp.split('=', 2)
        primer3[key.downcase.to_sym] = value
      end
    end
  end
end
Primer3.parse( ARGF ) if __FILE__==$0

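This way you can run the file from the command line, with or without STDIN, or you can require the file and use the module function it defines from other code.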
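As a rough illustration of the second use, here is a sketch that calls Primer3.parse on a StringIO instead of ARGF. The module is repeated only so that the snippet runs on its own, and the two-line sample input is an assumption made for the example:

```ruby
require 'stringio'

# Copy of the module above so that this sketch runs on its own; in practice
# you would load it with require or require_relative instead.
module Primer3
  def self.parse(file)
    {}.tap do |primer3|
      file.each_line do |line|
        key, value = line.chomp.split('=', 2)
        primer3[key.downcase.to_sym] = value
      end
    end
  end
end

# StringIO responds to each_line, so it can stand in for ARGF or an open File.
sample = StringIO.new("SEQUENCE_ID=example\nPRIMER_OPT_SIZE=18\n")
result = Primer3.parse(sample)

puts result[:sequence_id]      # example
puts result[:primer_opt_size]  # 18 (still a String, not an Integer)
```

Because parse only needs an object that responds to each_line, the same function works with ARGF, a File, or a StringIO in tests.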

OK, I'm almost there. The only problem is that a \n is left on the end of every value.

puts 'Primer 3 hash'
primer3 = {}
while line = gets do
  key, value = line.split(/=/)
  puts key
  puts value
  primer3[key.downcase] = value
end
puts primer3
{"sequence_id"=>"example\n",  "sequence_template"=>"GTAGTCAGTAGACNATGACNACTGACGATGCAGACNACACACACACACACAGCACACAGGTATTAGTGGGCCATTCGATCCCGACCCAAATCGATAGCTACGATGACG\n", "sequence_target"=>"37,21\n", "primer_task"=>"pick_detection_primers\n", "primer_pick_left_primer"=>"1\n", "primer_pick_internal_oligo"=>"1\n", "primer_pick_right_primer"=>"1\n", "primer_opt_size"=>"18\n", "primer_min_size"=>"15\n", "primer_max_size"=>"21\n", "primer_max_ns_accepted"=>"1\n", "primer_product_size_range"=>"75-100\n", "p3_file_flag"=>"1\n", "sequence_internal_excluded_region"=>"37,21\n", "primer_explain_flag"=>"1\n", "primer_thermodynamic_parameters_path"=>"/usr/local/Cellar/primer3/2.3.4/bin/primer3_config/\n", "primer_left_explain"=>"considered 65, too many Ns 17, low tm 48, ok 0\n", "primer_right_explain"=>"considered 228, low tm 159, high tm 12, high hairpin stability 22, ok 35\n", "primer_internal_explain"=>"considered 0, ok 0\n", "primer_pair_explain"=>"considered 0, ok 0\n", "primer_left_num_returned"=>"0\n", "primer_right_num_returned"=>"0\n", "primer_internal_num_returned"=>"0\n", "primer_pair_num_returned"=>"0\n", ""=>"\n"}
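The trailing newline is there because gets returns each line with its line terminator still attached. A minimal sketch of stripping it with String#chomp, using one line copied from the data source:

```ruby
# One line as gets would return it, newline included.
line = "SEQUENCE_TARGET=37,21\n"

key, value = line.split(/=/, 2)
p value        # "37,21\n"
p value.chomp  # "37,21"

# Chomping the whole line before splitting gives the same clean value.
key2, value2 = line.chomp.split(/=/, 2)
p value2       # "37,21"
```

Either order works; chomping the line first also keeps the key clean on lines that contain no '=' at all.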
