How to batch Enumerable items in Ruby



While learning about Ruby's Enumerable, I wrote something similar to the following:

FileReader.read(very_big_file)
.lazy
.flat_map {|line| get_array_of_similar_words } # array.size is ~10
.each_slice(100) # wait for 100 items
.map{|array| process_100_items}

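Even though each flat_map call emits an array of ~10 items, I expected the each_slice call to batch the items in groups of 100, that is, to wait until 100 items are available before passing them to the final .map call. That is not what happens.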

How do I get functionality similar to the buffer function in reactive programming?

To see how lazy affects the calculations, let's look at an example. First, construct a file:

str =<<~_
Now is the
time for all
good Ruby coders
to come to
the aid of
their bowling
team
_
fname = 't' 
File.write(fname, str)
#=> 82

and specify a slice size:

slice_size = 4

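Now I will read the file one line at a time, split each line into words, remove duplicate words within the line, and append those words to an array. As soon as the array contains at least 4 words, I will take the first four and map them to the longest of the 4. The code to do that follows. To show how the calculations progress I will salt the code with puts statements. Note that IO::foreach without a block returns an enumerator.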

IO.foreach(fname).
lazy.
tap { |o| puts "o1 = #{o}" }.
flat_map { |line|
puts "line = #{line}"
puts "line.split.uniq = #{line.split.uniq} "
line.split.uniq }.
tap { |o| puts "o2 = #{o}" }.
each_slice(slice_size).
tap { |o| puts "o3 = #{o}" }.
map { |arr|
puts "arr = #{arr}, arr.max = #{arr.max_by(&:size)}"
arr.max_by(&:size) }.
tap { |o| puts "o3 = #{o}" }.
to_a
#=> ["time", "good", "coders", "bowling", "team"] 

The following is displayed:

o1 = #<Enumerator::Lazy:0x00005992b1ab6970>
o2 = #<Enumerator::Lazy:0x00005992b1ab6880>
o3 = #<Enumerator::Lazy:0x00005992b1ab6678>
o3 = #<Enumerator::Lazy:0x00005992b1ab6420>
line = Now is the
line.split.uniq = ["Now", "is", "the"] 
line = time for all
line.split.uniq = ["time", "for", "all"] 
arr = ["Now", "is", "the", "time"], arr.max = time
line = good Ruby coders
line.split.uniq = ["good", "Ruby", "coders"] 
arr = ["for", "all", "good", "Ruby"], arr.max = good
line = to come to
line.split.uniq = ["to", "come"] 
line = the aid of
line.split.uniq = ["the", "aid", "of"] 
arr = ["coders", "to", "come", "the"], arr.max = coders
line = their bowling
line.split.uniq = ["their", "bowling"] 
arr = ["aid", "of", "their", "bowling"], arr.max = bowling
line = team
line.split.uniq = ["team"] 
arr = ["team"], arr.max = team
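The interleaving shown above can be checked without a file. The following is a minimal sketch, assuming an in-memory array of lines in place of the file; the names lines and events are made up for illustration. It records the order in which lines are read and batches are processed.

```ruby
# Hypothetical stand-in for the file: three lines held in memory.
lines = ["Now is the", "time for all", "good Ruby coders"]
events = []  # records [:read, line] and [:batch, words] in the order they occur

result = lines.lazy
  .flat_map { |line| events << [:read, line]; line.split }
  .each_slice(4)
  .map { |batch| events << [:batch, batch]; batch.size }
  .to_a

p result              # sizes of the batches
p events.map(&:first) # order of reads and batches
```

With lazy, a batch is handed to map as soon as four words have accumulated, even when those words come from different lines, and the next line is read only after that.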

If the line lazy. is removed, the return value is the same but the following is displayed (the .to_a at the end is now superfluous):

o1 = #<Enumerator:0x00005992b1a438f8>
line = Now is the
line.split.uniq = ["Now", "is", "the"] 
line = time for all
line.split.uniq = ["time", "for", "all"] 
line = good Ruby coders
line.split.uniq = ["good", "Ruby", "coders"] 
line = to come to
line.split.uniq = ["to", "come"] 
line = the aid of
line.split.uniq = ["the", "aid", "of"] 
line = their bowling
line.split.uniq = ["their", "bowling"] 
line = team
line.split.uniq = ["team"] 
o2 = ["Now", "is", "the", "time", "for", "all", "good", "Ruby",
"coders", "to", "come", "the", "aid", "of", "their",
"bowling", "team"]
o3 = #<Enumerator:0x00005992b1a41a08>
arr = ["Now", "is", "the", "time"], arr.max = time
arr = ["for", "all", "good", "Ruby"], arr.max = good
arr = ["coders", "to", "come", "the"], arr.max = coders
arr = ["aid", "of", "their", "bowling"], arr.max = bowling
arr = ["team"], arr.max = team
o3 = ["time", "good", "coders", "bowling", "team"]
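One practical consequence of the difference is that the lazy pipeline pulls from the source only as much as the batches requested so far require. The sketch below assumes a made-up endless source built with Enumerator.new (the names source and lines_read are illustrative, not from the question) and asks for just two batches.

```ruby
lines_read = 0

# Hypothetical endless source standing in for a very big file.
source = Enumerator.new do |y|
  loop do
    lines_read += 1
    y << "a b c"
  end
end

batches = source.lazy
  .flat_map { |line| line.split }
  .each_slice(4)
  .first(2)  # stop as soon as two full batches exist

p batches
p lines_read  # number of lines actually pulled from the source
```

Without lazy, flat_map would try to consume the whole source before each_slice saw anything, so on an endless source like this one it would never return.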
