我想迭代一个给定的数组,例如:
["goat", "action", "tear", "impromptu", "tired", "europe"]
我想看看所有可能的对。
所需的输出是一个新数组,其中包含所有对,这些数组组合包含所有元音。此外,这些对应连接为输出数组的一个元素:
["action europe", "tear impromptu"]
我尝试了以下代码,但收到错误消息:
No implicit conversion of nil into string.
def all_vowel_pairs(words)
pairs = []
(0..words.length).each do |i| # iterate through words
(0..words.length).each do |j| # for every word, iterate through words again
pot_pair = words[i].to_s + words[j] # build string from pair
if check_for_vowels(pot_pair) # throw string to helper-method.
pairs << words[i] + " " + words[j] # if gets back true, concatenade and push to output array "pairs"
end
end
end
pairs
end
# helper-method to check for if a string has all vowels in it
def check_for_vowels(string)
vowels = "aeiou"
founds = []
string.each_char do |char|
if vowels.include?(char) && !founds.include?(char)
founds << char
end
end
if founds.length == 5
return true
end
false
end
以下代码旨在提供一种在字数较大时构造所需数组的有效方法。请注意,与其他答案不同,它不使用方法Array#组合。
解释部分的第一部分(如下)概述了算法采用的方法。然后填写详细信息。
法典
require 'set'
VOWELS = ["a", "e", "i", "o", "u"]
VOWELS_SET = VOWELS.to_set
def all_vowel_pairs(words)
h = words.each_with_object({}) {|w,h| (h[(w.chars & VOWELS).to_set] ||= []) << w}
h.each_with_object([]) do |(k,v),a|
vowels_needed = VOWELS_SET-k
h.each do |kk,vv|
next unless kk.superset?(vowels_needed)
v.each {|w1| vv.each {|w2| a << "%s %s" % [w1, w2] if w1 < w2}}
end
end
end
例
words = ["goat", "action", "tear", "impromptu", "tired", "europe", "hear"]
all_vowel_pairs(words)
#=> ["action europe", "hear impromptu", "impromptu tear"]
解释
对于给定的示例,步骤如下。
VOWELS_SET = VOWELS.to_set
#=> #<Set: {"a", "e", "i", "o", "u"}>
h = words.each_with_object({}) {|w,h| (h[(w.chars & VOWELS).to_set] ||= []) << w}
#=> {#<Set: {"o", "a"}>=>["goat"],
# #<Set: {"a", "i", "o"}>=>["action"],
# #<Set: {"e", "a"}>=>["tear", "hear"],
# #<Set: {"i", "o", "u"}>=>["impromptu"],
# #<Set: {"i", "e"}>=>["tired"],
# #<Set: {"e", "u", "o"}>=>["europe"]}
可以看出,h
的键是五个元音的子集。这些值是包含键给出的元音的words
(单词)元素数组,不包含其他元音。因此,这些值共同构成了words
的分区。当单词数量很大时,人们会期望h
有 31 个键 (2**5 - 1
)。
我们现在遍历h
的键值对。对于每一个,用键k
和值v
,确定缺失元音(vowels_needed
)的集合,然后我们循环遍历那些键值对[kk, vv]
h
,kk
是vowels_needed
的超集。然后将v
和vv
元素的所有组合添加到要返回的数组中(经过调整以避免重复计算每对单词)。
继续
enum = h.each_with_object([])
#=> #<Enumerator: {#<Set: {"o", "a"}>=>["goat"],
# #<Set: {"a", "i", "o"}>=>["action"],
# ...
# #<Set: {"e", "u", "o"}>=>["europe"]}:
# each_with_object([])>
第一个值由enum
生成并传递给块,块变量被分配值:
(k,v), a = enum.next
#=> [[#<Set: {"o", "a"}>, ["goat"]], []]
请参阅枚举器#下一个。
各个变量通过数组分解赋值:
k #=> #<Set: {"o", "a"}>
v #=> ["goat"]
a #=> []
现在执行块计算。
vowels_needed = VOWELS_SET-k
#=> #<Set: {"e", "i", "u"}>
h.each do |kk,vv|
next unless kk.superset?(vowels_needed)
v.each {|w1| vv.each {|w2| a << "%s %s" % [w1, w2] if w1 < w2}}
end
单词"goat"(v
)有元音"o"和"a",所以它只能与包含元音"e","i"和"u"(可能还有"o"和/或"a")的单词匹配。表达式
next unless kk.superset?(vowels_needed)
跳过那些不是vowels_needed
超集的h
键(kk
)。请参阅设置#超集?。
words
中的任何单词都不包含"e"、"i"和"u",因此数组a
保持不变。
下一个元素现在由enum
生成,传递给块,块变量被赋值:
(k,v), a = enum.next
#=> [[#<Set: {"a", "i", "o"}>, ["action"]], []]
k #=> #<Set: {"a", "i", "o"}>
v #=> ["action"]
a #=> []
区块计算开始:
vowels_needed = VOWELS_SET-k
#=> #<Set: {"e", "u"}>
我们看到h
只有一个键值对,其键是vowels_needed
的超集:
kk = %w|e u o|.to_set
#=> #<Set: {"e", "u", "o"}>
vv = ["europe"]
因此,我们执行:
v.each {|w1| vv.each {|w2| a << "%s %s" % [w1, w2] if w1 < w2}}
这将一个元素添加到a
:
a #=> ["action europe"]
该子句if w1 < w2
是为了确保以后在计算中不会"europe action"
添加到a
中。
如果v
(包含"a"、"i"和"u"的单词)和vv
(包含"e"、"u"和"o"的单词)改为:
v #=> ["action", "notification"]
vv #=> ["europe", "route"]
我们会在a
中添加"action europe"
、"action route"
和"notification route"
。(”europe notification”
将在稍后k #=> #<Set: {"e", "u", "o"}
时添加。
基准
我根据其他人建议使用@theTinMan的Fruity基准代码对我的方法进行了基准测试。唯一的区别是要测试的单词数组以及将我的方法添加到基准测试中,我将其命名为cary
。对于要考虑的单词数组,我从计算机上的英语单词文件中随机选择了 600 个单词:
words = IO.readlines('/usr/share/dict/words', chomp: true).sample(600)
words.first 10
#=> ["posadaship", "explosively", "expensilation", "conservatively", "plaiting",
# "unpillared", "intertwinement", "nonsolidified", "uraemic", "underspend"]
发现该数组包含 46,436 对包含所有五个元音的单词。
结果如下所示。
compare {
_viktor { viktor(words) }
_ttm1 { ttm1(words) }
_ttm2 { ttm2(words) }
_ttm3 { ttm3(words) }
_cary { cary(words) }
}
Running each test once. Test will take about 44 seconds.
_cary is faster than _ttm3 by 5x ± 0.1
_ttm3 is faster than _viktor by 50.0% ± 1.0%
_viktor is faster than _ttm2 by 30.000000000000004% ± 1.0%
_ttm2 is faster than _ttm1 by 2.4x ± 0.1
然后,我将cary
与ttm3
随机选择的1000个单词进行比较。发现该数组包含 125,068 对包含所有五个元音的单词。结果如下:
Running each test once. Test will take about 19 seconds.
_cary is faster than _ttm3 by 3x ± 1.0
为了感受基准测试的可变性,我又进行了两次最后一次比较,每次都有新的随机选择1000个单词。这给了我以下结果:
Running each test once. Test will take about 17 seconds.
_cary is faster than _ttm3 by 5x ± 1.0
Running each test once. Test will take about 18 seconds.
_cary is faster than _ttm3 by 4x ± 1.0
可以看出,样本之间存在相当大的差异。
你说成对,所以我假设它是两个元素的组合。我使用#combination
方法对数组中的每个两个元素进行了组合。然后我只#select
那些包含所有元音的对,一旦它们连接起来。最后,我确保加入这些对:
["goat", "action", "tear", "impromptu", "tired", "europe"]
.combination(2)
.select { |c| c.join('') =~ /b(?=w*?a)(?=w*?e)(?=w*?i)(?=w*?o)(?=w*?u)[a-zA-Z]+b/ }
.map{ |w| w.join(' ') }
#=> ["action europe", "tear impromptu"]
正则表达式来自"什么是正则表达式来匹配包含所有元音的单词?
与Viktor类似,我会使用一个简单的测试来查看单词中存在哪些元音,并在剥离重复项并对其进行排序后与"aeiou"
进行比较:
def ttm1(ary)
ary.combination(2).select { |a|
a.join.scan(/[aeiou]/).uniq.sort.join == 'aeiou'
}.map { |a| a.join(' ') }
end
ttm1(words) # => ["action europe", "tear impromptu"]
将其分解,以便您可以看到正在发生的事情。
["goat", "action", "tear", "impromptu", "tired", "europe"] # => ["goat", "action", "tear", "impromptu", "tired", "europe"]
.combination(2)
.select { |a| a # => ["goat", "action"], ["goat", "tear"], ["goat", "impromptu"], ["goat", "tired"], ["goat", "europe"], ["action", "tear"], ["action", "impromptu"], ["action", "tired"], ["action", "europe"], ["tear", "impromptu"], ["tear", "tired"], ["tear", "europe"], ["impromptu", "tired"], ["impromptu", "europe"], ["tired", "europe"]
.join # => "goataction", "goattear", "goatimpromptu", "goattired", "goateurope", "actiontear", "actionimpromptu", "actiontired", "actioneurope", "tearimpromptu", "teartired", "teareurope", "impromptutired", "impromptueurope", "tiredeurope"
.scan(/[aeiou]/) # => ["o", "a", "a", "i", "o"], ["o", "a", "e", "a"], ["o", "a", "i", "o", "u"], ["o", "a", "i", "e"], ["o", "a", "e", "u", "o", "e"], ["a", "i", "o", "e", "a"], ["a", "i", "o", "i", "o", "u"], ["a", "i", "o", "i", "e"], ["a", "i", "o", "e", "u", "o", "e"], ["e", "a", "i", "o", "u"], ["e", "a", "i", "e"], ["e", "a", "e", "u", "o", "e"], ["i", "o", "u", "i", "e"], ["i", "o", "u", "e", "u", "o", "e"], ["i", "e", "e", "u", "o", "e"]
.uniq # => ["o", "a", "i"], ["o", "a", "e"], ["o", "a", "i", "u"], ["o", "a", "i", "e"], ["o", "a", "e", "u"], ["a", "i", "o", "e"], ["a", "i", "o", "u"], ["a", "i", "o", "e"], ["a", "i", "o", "e", "u"], ["e", "a", "i", "o", "u"], ["e", "a", "i"], ["e", "a", "u", "o"], ["i", "o", "u", "e"], ["i", "o", "u", "e"], ["i", "e", "u", "o"]
.sort # => ["a", "i", "o"], ["a", "e", "o"], ["a", "i", "o", "u"], ["a", "e", "i", "o"], ["a", "e", "o", "u"], ["a", "e", "i", "o"], ["a", "i", "o", "u"], ["a", "e", "i", "o"], ["a", "e", "i", "o", "u"], ["a", "e", "i", "o", "u"], ["a", "e", "i"], ["a", "e", "o", "u"], ["e", "i", "o", "u"], ["e", "i", "o", "u"], ["e", "i", "o", "u"]
.join == 'aeiou' # => false, false, false, false, false, false, false, false, true, true, false, false, false, false, false
} # => [["action", "europe"], ["tear", "impromptu"]]
查看代码,它正在跳过箍,以查找所有元音是否存在。每次检查时,它都必须通过许多方法来确定是否找到了所有元音;换句话说,它不会短路并失败,直到最后才好。
此代码将:
def ttm2(ary)
ary.combination(2).select { |a|
str = a.join
str[/a/] && str[/e/] && str[/i/] && str[/o/] && str[/u/]
}.map { |a| a.join(' ') }
end
ttm2(words) # => ["action europe", "tear impromptu"]
但我不喜欢以这种方式使用正则表达式引擎,因为它比直接查找慢,这会导致:
def ttm3(ary)
ary.combination(2).select { |a|
str = a.join
str['a'] && str['e'] && str['i'] && str['o'] && str['u']
}.map { |a| a.join(' ') }
end
这是基准测试:
require 'fruity'
words = ["goat", "action", "tear", "impromptu", "tired", "europe"]
def viktor(ary)
ary.combination(2)
.select { |c| c.join('') =~ /b(?=w*?a)(?=w*?e)(?=w*?i)(?=w*?o)(?=w*?u)[a-zA-Z]+b/ }
.map{ |w| w.join(' ') }
end
viktor(words) # => ["action europe", "tear impromptu"]
def ttm1(ary)
ary.combination(2).select { |a|
a.join.scan(/[aeiou]/).uniq.sort.join == 'aeiou'
}.map { |a| a.join(' ') }
end
ttm1(words) # => ["action europe", "tear impromptu"]
def ttm2(ary)
ary.combination(2).select { |a|
str = a.join
str[/a/] && str[/e/] && str[/i/] && str[/o/] && str[/u/]
}.map { |a| a.join(' ') }
end
ttm2(words) # => ["action europe", "tear impromptu"]
def ttm3(ary)
ary.combination(2).select { |a|
str = a.join
str['a'] && str['e'] && str['i'] && str['o'] && str['u']
}.map { |a| a.join(' ') }
end
ttm3(words) # => ["action europe", "tear impromptu"]
compare {
_viktor { viktor(words) }
_ttm1 { ttm1(words) }
_ttm2 { ttm2(words) }
_ttm3 { ttm3(words) }
}
结果如下:
# >> Running each test 256 times. Test will take about 1 second.
# >> _ttm3 is similar to _viktor
# >> _viktor is similar to _ttm2
# >> _ttm2 is faster than _ttm1 by 2x ± 0.1
现在,因为这看起来很像家庭作业,所以重要的是要了解学校知道堆栈溢出,并且他们会寻找寻求帮助的学生,所以你可能不想重用这段代码,尤其是不是逐字逐句。
您的代码包含两个错误,其中一个错误导致错误消息。
(0..words.length)
从 0 到 6 的循环。 但是words[6]
不存在(数组从零开始),所以你得到 nil。用(0..words.length-1)
(两次)替换应该可以解决这个问题。
您将获得两次正确的结果,一次是"action europe"
,一次是"europe action"
。这是由于循环太多造成的,每次组合都要循环两次。替换从(0..words.length-1)
到(i..words.length-1)
的第二个循环。
这种繁琐的索引簿记很无聊,经常导致错误。这就是为什么Ruby程序员通常更喜欢更轻松的方法(如其他答案中的combination
),完全避免索引。