Ruby/Rails: parsing text into objects



I'm trying to create an object from each repeating set in the text below (an .srt subtitle file):

1
00:02:12,446 --> 00:02:14,406
The Hovitos are near.

2
00:02:15,740 --> 00:02:18,076
The poison is still fresh,
three days.

3
00:02:18,076 --> 00:02:19,744
They're following us.

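For example, I could take each group of three or four lines and assign them to the attributes of a new object. So for the first set, I could have Sentence.create(number: 1, time_marker: '00:02:12', content: "The Hovitos are near.")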

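Beyond starting with script.each_line, what general structure would put me on the right track? I'm having a hard time with this, and any help would be fantastic!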

Edit

Below is the messy, unfinished code I have so far. It does work (I think). Would you take a completely different route? I have no experience with this.

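number = nil
time_marker = nil
content = []
script = script.strip
script.each_line do |line|
  line = line.strip
  if line =~ /^\d+$/            # a line of digits only: the entry number
    number = line.to_i
  elsif line =~ /-->/           # the timing line
    time_marker = line[0..7]    # keep only the start time, "HH:MM:SS"
  elsif line =~ /^\b\D/         # a line starting with a non-digit word character: subtitle text
    content << line
  else                          # anything else (a blank line) closes the entry
    if content.size > 1
      content = content.join("\n")
    else
      content = content[0]
    end
    Sentence.create(movie: @movie, number: number,
      time_marker: time_marker, content: content)
    content = []
  end
end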

Here's one way to do it:

File.read('subtitles.srt').split(/^\s*$/).each do |entry| # Read in the entire text and split on empty lines
  sentence = entry.strip.split("\n")
  number = sentence[0] # First element after empty line is 'number'
  time_marker = sentence[1][0..7] # Second element is 'time_marker'
  content = sentence[2..-1].join("\n") # Everything after that is 'content'
end
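The loop above computes number, time_marker and content for each entry but does not keep them. As a minimal sketch of one way to collect the results, assuming the goal is a list of attribute hashes that could later be passed to Sentence.create: the names parse_srt and SAMPLE_SRT are hypothetical and not part of the answer above, and filter_map assumes Ruby 2.7 or newer.

```ruby
# Hypothetical helper: turn SRT text into an array of attribute hashes.
def parse_srt(text)
  text.split(/^\s*$/).filter_map do |entry|
    lines = entry.strip.split("\n").map(&:strip)
    next if lines.size < 3 # skip blank or incomplete chunks

    {
      number: lines[0].to_i,
      time_marker: lines[1][0..7], # "HH:MM:SS" part of the start time
      content: lines[2..-1].join("\n")
    }
  end
end

# Sample input taken from the question (first two entries).
SAMPLE_SRT = <<~SRT
  1
  00:02:12,446 --> 00:02:14,406
  The Hovitos are near.

  2
  00:02:15,740 --> 00:02:18,076
  The poison is still fresh,
  three days.
SRT

parse_srt(SAMPLE_SRT).each { |attrs| p attrs }
```

In a Rails app, each hash could presumably be handed to the model, for example Sentence.create(attrs.merge(movie: @movie)), where Sentence and @movie come from the question.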

Assuming the subtitles are in the following variable:

subtitles = %q{1
00:02:12,446 --> 00:02:14,406
The Hovitos are near.

2
00:02:15,740 --> 00:02:18,076
The poison is still fresh,
three days.

3
00:02:18,076 --> 00:02:19,744
They're following us.}

Then you can do this:

def split_subs subtitles
  grouped, splitted = [], []
  # Append a trailing "\n" element so the last group is flushed too
  subtitles.split("\n").push("\n").each do |sub|
    if sub.strip.empty?
      splitted.push({
        number: grouped[0],
        time_marker: grouped[1].split(",").first,
        content: grouped[2..-1].join(" ")
      })
      grouped = []
    else
      grouped.push sub.strip
    end
  end
  splitted
end
puts split_subs(subtitles)
# output:
# $ ruby 23025546.rb
# {:number=>"1", :time_marker=>"00:02:12", :content=>"The Hovitos are near."}
# {:number=>"2", :time_marker=>"00:02:15", :content=>"The poison is still fresh, three days."}
# {:number=>"3", :time_marker=>"00:02:18", :content=>"They're following us."}
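The line-by-line route the question started with can also work, provided the final entry is flushed at the end of the input, since no blank line follows it. A rough sketch under these assumptions: Cue is a hypothetical Struct standing in for the Sentence model so the code runs without Rails, and cues_from is a made-up name.

```ruby
# Hypothetical stand-in for the Sentence model, so the sketch runs without Rails.
Cue = Struct.new(:number, :time_marker, :content, keyword_init: true)

# Walks the text line by line; a blank line (or the end of input) closes an entry.
def cues_from(script)
  cues = []
  buffer = []

  # Builds a Cue from the buffered lines, then empties the buffer.
  flush = lambda do
    if buffer.size >= 3
      cues << Cue.new(number: buffer[0].to_i,
                      time_marker: buffer[1][0..7],
                      content: buffer[2..-1].join("\n"))
    end
    buffer = []
  end

  script.each_line do |line|
    line = line.strip
    if line.empty?
      flush.call
    else
      buffer << line
    end
  end
  flush.call # the last entry has no blank line after it
  cues
end

script = "1\n00:02:12,446 --> 00:02:14,406\nThe Hovitos are near.\n\n" \
         "3\n00:02:18,076 --> 00:02:19,744\nThey're following us."
cues_from(script).each { |cue| puts "#{cue.number} #{cue.time_marker} #{cue.content}" }
```

Treating the first buffered line as the number and the second as the timing line, rather than matching each line against a regex, means subtitle text that happens to start with a digit still ends up in content.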
