Using SLIM/HAML etc. in a Ruby script



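I'm currently writing a script that analyses some genetic data and then produces output in a coloured Word document. The script works, but one of its methods is badly written: the one that creates the Word document.

That method creates a standalone HTML file, which is then saved with a "docx" extension; this lets me give different parts of the document different styles.

Below is the bare minimum needed to get this working. It includes some sample input data, which would be created in a different method and stored in a hash before the final step, as well as the necessary methods.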

require 'bio'

# Read a FASTA file into a hash of { definition line => amino acid sequence }.
def make_hash(input_file)
  input_read = Hash.new
  biofastafile = Bio::FlatFile.open(Bio::FastaFormat, input_file)
  biofastafile.each_entry do |entry|
    input_read[entry.definition] = entry.aaseq
  end
  return input_read
end

# Write every sequence to an HTML file, styling the id, signal peptide and motifs.
def to_doc(hash, output, motif)
  output_file = File.new(output, "w")
  output_file.puts "<!DOCTYPE html><html><head><style> .id{font-weight: bold;} .signalp{color:#000099; font-weight: bold;} .motif{color:#FF3300; font-weight: bold;} h3 {word-wrap: break-word;} p {word-wrap: break-word; font-family:Courier New, Courier, Mono;}</style></head><body>"
  hash.each do |id, seq|
    sequence = seq.to_s.gsub('["', '').gsub('"]', '')
    id.scan(/(\w+)(.*)/) do |id_start, id_end|
      output_file.puts "<p><span class=\"id\"> >#{id_start}</span><span>#{id_end}</span><br>"
      output_file.puts "<span class=\"signalp\">"
      sequence.scan(/(\w+)-(\w+)/) do |signalp, seq_end|
        # \0 re-inserts the matched motif inside the highlighting span
        output_file.puts signalp + "</span>" + seq_end.gsub(/#{motif}/, '<span class="motif">\0</span>')
        output_file.puts "</p>"
      end
    end
  end
  output_file.puts "</body></html>"
  output_file.close
end

hash = make_hash("./sample.txt")
to_doc(hash, "output.docx", "WL|KK|RR|KR|R..R|R....R")

Here is some sample data. In reality, when analysing the genetic data of a species, this can consist of many hundreds of thousands of sequences:

>isotig00001_f4_14 - Signal P Cleavage Site => 11:12
MMHLLCIVLLL-KWWLLL
>isotig00001_f4_15 - Signal P Cleavage Site => 10:11
MHLLCIVLLL-KWWLLL
>isotig00003_f6_8 - Signal P Cleavage Site => 11:12
MMHLLCIVLLL-KWWLLL
>isotig00003_f6_9 - Signal P Cleavage Site => 10:11
MHLLCIVLLL-KWWLLL
>isotig00004_f6_8 - Signal P Cleavage Site => 11:12
MMHLLCIVLLL-KWWLLL
>isotig00004_f6_9 - Signal P Cleavage Site => 10:11
MHLLCIVLLL-KWWLLL
>isotig00009_f2_3 - Signal P Cleavage Site => 22:23
MLKCFSIIMGLILLLEIGGGCA-IYFYRAQIQAQFQKSLTDVTITDYRENADFQDLIDALQSGLSCCGVNSYEDWDNNIYFNCSGPANNPEALWCAFLLLYTGSSKRSSQHPVRLWSSFPRTTKYFPHKDLHHWLCGYVYNVD
>isotig00009_f3_9 - Signal P Cleavage Site => 16:17
MKTGIIIFISTVVVLP-ITLKPCGVPFSCCIPDQASGVANTQCGYGVRSPEQQNTFHTKIYTTGCADMFTMWINRYLYYIAGIAGVIVLVELFGFCFAHSLINDIKRQKARWAHR
>isotig00009_f6_13 - Signal P Cleavage Site => 11:12
MMHLLCIVLLL-KWWLLL
>isotig00009_f6_14 - Signal P Cleavage Site => 10:11
MHLLCIVLLL-KWWLLL

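Each read consists of two parts: the seq id (the line starting with >) and the sequence. These are split up and stored in a hash by the make_hash method. This example: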

>isotig00001_f4_14 - Signal P Cleavage Site => 11:12
MMHLLCIVLLL-KWWLLL 

consists of the following parts:

>isotig00001_f4_14  (the first part of the id - class="id")
Signal P Cleavage Site => 11:12 (the second part of the id - normal writing)
(new line)
MMHLLCIVLLL (first part of the sequence - class="signalp")
KW WL LL  (the second part of the sequence - the motif WL will be class="motif")

In HTML, this would produce:

<p>
  <span class="id"> >isotig00001_f4_14</span><span>Signal P Cleavage Site => 11:12</span>
<br>
  <span class="signalp">MMHLLCIVLLL</span><span>KW</span><span class="motif">WL</span><span>LL</span>
</p>

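Basically, I'd like to rewrite the to_doc method using a proper HTML templating library (e.g. SLIM/HAML/NOKOGIRI/ERB). I have tried to get this working.

For some reason, a loop within a loop doesn't work, and creating global variables to store these values doesn't work either.

The script above works: just save the sample data as "sample.txt" and run the script.

I would be very grateful for any help.

Here's a starting point: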

require 'haml'
haml_doc = <<EOT
%html
  %head
    :css
      .id {font-weight: bold;}
      .signalp {color:#000099; font-weight: bold;}
      .motif {color:#FF3300; font-weight: bold;}
      h3 {word-wrap: break-word;}
      p {word-wrap: break-word; font-family:Courier New, Courier, Mono;}
  %body
EOT
engine = Haml::Engine.new(haml_doc)
puts engine.render

Which, when run, outputs:

<html>
  <head>
    <style>
      .id {font-weight: bold;}
      .signalp {color:#000099; font-weight: bold;}
      .motif {color:#FF3300; font-weight: bold;}
      h3 {word-wrap: break-word;}
      p {word-wrap: break-word; font-family:Courier New, Courier, Mono;}
    </style>
  </head>
  <body></body>
</html>

From there you can easily write to a file using:

File.write(output, engine.render)

instead of using puts to output it to the console.

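To use this, you need to flesh out haml_doc with additional Haml that loops over your input data, and massage that data into an array or hash you can iterate over cleanly, without embedding all sorts of scan calls and conditional logic. Views should be primarily for outputting content, not for manipulating data.

Right above the engine = Haml... line, you'd read your input data, massage it, and store it in an instance variable that Haml can iterate over. You have the basic idea in your original code, but instead of trying to output HTML, create an object or sub-hash that you can pass to Haml.

Normally all of this would be split into separate files for the model, view and controller, as in Rails or a large Sinatra app, but this isn't really a big application, so you can put it all in one file. Keep your logic clean and it'll be fine.

Without sample input data and the expected output it's hard to do much more, but this should give you a starting point.


Based on the data sample, here's something to give you a rough idea. I won't polish it because, after all, you have to do some of it yourself, but it's a reasonable start. The first part is me mocking up something reasonable to stand in for the Bio library you reference in your code, which I've never seen. You don't need this part, but you might want to skim it: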

module Bio
  FastaFormat = 1
SAMPLE_DATA = <<-EOT
>isotig00001_f4_14 - Signal P Cleavage Site => 11:12
MMHLLCIVLLL-KWWLLL
>isotig00001_f4_15 - Signal P Cleavage Site => 10:11
MHLLCIVLLL-KWWLLL
>isotig00003_f6_8 - Signal P Cleavage Site => 11:12
MMHLLCIVLLL-KWWLLL
>isotig00003_f6_9 - Signal P Cleavage Site => 10:11
MHLLCIVLLL-KWWLLL
>isotig00004_f6_8 - Signal P Cleavage Site => 11:12
MMHLLCIVLLL-KWWLLL
>isotig00004_f6_9 - Signal P Cleavage Site => 10:11
MHLLCIVLLL-KWWLLL
>isotig00009_f2_3 - Signal P Cleavage Site => 22:23
MLKCFSIIMGLILLLEIGGGCA-IYFYRAQIQAQFQKSLTDVTITDYRENADFQDLIDALQSGLSCCGVNSYEDWDNNIYFNCSGPANNPEALWCAFLLLYTGSSKRSSQHPVRLWSSFPRTTKYFPHKDLHHWLCGYVYNVD
>isotig00009_f3_9 - Signal P Cleavage Site => 16:17
MKTGIIIFISTVVVLP-ITLKPCGVPFSCCIPDQASGVANTQCGYGVRSPEQQNTFHTKIYTTGCADMFTMWINRYLYYIAGIAGVIVLVELFGFCFAHSLINDIKRQKARWAHR
>isotig00009_f6_13 - Signal P Cleavage Site => 11:12
MMHLLCIVLLL-KWWLLL
>isotig00009_f6_14 - Signal P Cleavage Site => 10:11
MHLLCIVLLL-KWWLLL
EOT
  class FlatFile
    class Entry
      attr_reader :definition, :aaseq
      def initialize(definition, aaseq)
        @definition = definition
        @aaseq = aaseq
      end
    end
    def initialize
    end
    def self.open(filetype, filename)
      SAMPLE_DATA.split("\n").each_slice(2).map{ |seq_id, sequence| Entry.new(seq_id, sequence) }
    end
    def each_entry
      @sample_data.each do |_entry|
        yield _entry
      end
    end
  end
end

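This is where the fun begins. I modified your make_hash routine to parse the strings. Instead of a hash, it returns an array of hashes. Each sub-hash is ready to use; in other words, the data is parsed and ready for output: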

include Bio
def make_array_of_hashes(input_file)
  Bio::FlatFile.open(
    Bio::FastaFormat,
    input_file
  ).map { |entry|
    id_start, id_end = entry.definition.split('-').map(&:strip)
    signalp, seq_end = entry.aaseq.split('-')
    motif = seq_end.scan(/(?:WL|KK|RR|KR|R..R|R....R)/)
    {
      :id_start => id_start,
      :id_end => id_end,
      :signalp => signalp,
      :motif => motif
    }
  }
end
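
The hashes above keep only the matched motifs, so the text between them is dropped. If the full sequence should appear with the motifs highlighted, as in the question's expected HTML, one option is to pre-split the sequence into ordered segments before it reaches the view. This is only a sketch: segment_sequence and the MOTIF constant are hypothetical names, and the pattern is the one passed to to_doc in the question.

```ruby
# Hypothetical constant: the motif pattern from the question's to_doc call.
MOTIF = /WL|KK|RR|KR|R..R|R....R/

# Hypothetical helper: split a sequence into ordered segments, each flagged
# as motif or plain text, so a template only has to loop and print.
def segment_sequence(seq, motif = MOTIF)
  segments = []
  pos = 0
  seq.scan(motif) do
    m = Regexp.last_match
    # Plain text between the previous match and this one
    segments << { text: seq[pos...m.begin(0)], motif: false } if m.begin(0) > pos
    segments << { text: m[0], motif: true }
    pos = m.end(0)
  end
  # Whatever is left after the last motif
  segments << { text: seq[pos..-1], motif: false } if pos < seq.length
  segments
end

segment_sequence('KWWLLL').map { |s| s[:text] } # => ["KW", "WL", "LL"]
```

In a template, each segment could then be emitted as a span, adding the motif class only when the segment's :motif flag is true.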

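Here's a simple way to define a HAML document in the body of a script. I only output; there's no logic in the template other than loops. Everything else is handled before the view is processed: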

haml_doc = <<EOT
!!!
%html
  %head
    :css
      .id {font-weight: bold;}
      .signalp {color:#000099; font-weight: bold;}
      .motif {color:#FF3300; font-weight: bold;}
      h3 {word-wrap: break-word;}
      p {word-wrap: break-word; font-family:Courier New, Courier, Mono;}
  %body
  - data.each do |d|
    %p
      %span.id= d[:id_start]
      %span= d[:id_end]
      %br/
      %span.signalp= d[:signalp]
      - d[:motif].each do |m|
        %span= m
EOT

Here's how to use it:

require 'haml'
data = make_array_of_hashes('sample.txt')
engine = Haml::Engine.new(haml_doc)
puts engine.render(Object.new, :data => data)

Which, when run, outputs:

<!DOCTYPE html>
<html>
  <head>
    <style>
      .id {font-weight: bold;}
      .signalp {color:#000099; font-weight: bold;}
      .motif {color:#FF3300; font-weight: bold;}
      h3 {word-wrap: break-word;}
      p {word-wrap: break-word; font-family:Courier New, Courier, Mono;}
    </style>
  </head>
  <body></body>
  <p>
    <span class='id'>>isotig00001_f4_14</span>
    <span>Signal P Cleavage Site => 11:12</span>
    <br>
    <span class='signalp'>MMHLLCIVLLL</span>
    <span>WL</span>
  </p>
  <p>
    <span class='id'>>isotig00001_f4_15</span>
    <span>Signal P Cleavage Site => 10:11</span>
    <br>
    <span class='signalp'>MHLLCIVLLL</span>
    <span>WL</span>
  </p>
  <p>
    <span class='id'>>isotig00003_f6_8</span>
    <span>Signal P Cleavage Site => 11:12</span>
    <br>
    <span class='signalp'>MMHLLCIVLLL</span>
    <span>WL</span>
  </p>
  <p>
    <span class='id'>>isotig00003_f6_9</span>
    <span>Signal P Cleavage Site => 10:11</span>
    <br>
    <span class='signalp'>MHLLCIVLLL</span>
    <span>WL</span>
  </p>
  <p>
    <span class='id'>>isotig00004_f6_8</span>
    <span>Signal P Cleavage Site => 11:12</span>
    <br>
    <span class='signalp'>MMHLLCIVLLL</span>
    <span>WL</span>
  </p>
  <p>
    <span class='id'>>isotig00004_f6_9</span>
    <span>Signal P Cleavage Site => 10:11</span>
    <br>
    <span class='signalp'>MHLLCIVLLL</span>
    <span>WL</span>
  </p>
  <p>
    <span class='id'>>isotig00009_f2_3</span>
    <span>Signal P Cleavage Site => 22:23</span>
    <br>
    <span class='signalp'>MLKCFSIIMGLILLLEIGGGCA</span>
    <span>KR</span>
    <span>WL</span>
  </p>
  <p>
    <span class='id'>>isotig00009_f3_9</span>
    <span>Signal P Cleavage Site => 16:17</span>
    <br>
    <span class='signalp'>MKTGIIIFISTVVVLP</span>
    <span>KR</span>
  </p>
  <p>
    <span class='id'>>isotig00009_f6_13</span>
    <span>Signal P Cleavage Site => 11:12</span>
    <br>
    <span class='signalp'>MMHLLCIVLLL</span>
    <span>WL</span>
  </p>
  <p>
    <span class='id'>>isotig00009_f6_14</span>
    <span>Signal P Cleavage Site => 10:11</span>
    <br>
    <span class='signalp'>MHLLCIVLLL</span>
    <span>WL</span>
  </p>
</html>
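
The question also mentions ERB, and the same separation of parsing from output carries over to it. Since ERB ships with Ruby, a rough equivalent can be sketched without extra gems. The names ERB_DOC and render_report are hypothetical, the sample record is typed in by hand in the shape make_array_of_hashes returns, and the trim_mode keyword assumes Ruby 2.6 or newer.

```ruby
require 'erb'

# Hypothetical template: loops and output only, mirroring the Haml version.
ERB_DOC = <<~'EOT'
  <!DOCTYPE html>
  <html>
    <head>
      <style>
        .id {font-weight: bold;}
        .signalp {color:#000099; font-weight: bold;}
        .motif {color:#FF3300; font-weight: bold;}
      </style>
    </head>
    <body>
  <% data.each do |d| -%>
      <p>
        <span class="id"><%= ERB::Util.html_escape(d[:id_start]) %></span>
        <span><%= ERB::Util.html_escape(d[:id_end]) %></span>
        <br>
        <span class="signalp"><%= d[:signalp] %></span>
  <% d[:motif].each do |m| -%>
        <span><%= m %></span>
  <% end -%>
      </p>
  <% end -%>
    </body>
  </html>
EOT

# Hypothetical helper: render the array of hashes to an HTML string.
def render_report(data)
  ERB.new(ERB_DOC, trim_mode: '-').result_with_hash(data: data)
end

# Hand-written sample record, same keys as the hashes built earlier.
sample = [
  { id_start: '>isotig00001_f4_14', id_end: 'Signal P Cleavage Site => 11:12',
    signalp: 'MMHLLCIVLLL', motif: ['WL'] }
]
puts render_report(sample)
```

The id fields are passed through ERB::Util.html_escape because they contain > characters; the returned string could be handed to File.write in the same way as the Haml output.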
