如何查找包含所有元音的同一数组的两个元素

我想迭代一个给定的数组，例如：

["goat", "action", "tear", "impromptu", "tired", "europe"]

我想看看所有可能的对。

所需的输出是一个新数组，其中包含所有对，这些数组组合包含所有元音。此外，这些对应连接为输出数组的一个元素：

["action europe", "tear impromptu"]

我尝试了以下代码，但收到错误消息：

No implicit conversion of nil into string.

def all_vowel_pairs(words)
pairs = []
(0..words.length).each do |i|                       # iterate through words
(0..words.length).each do |j|                   # for every word, iterate through words again
pot_pair = words[i].to_s + words[j]         # build string from pair
if check_for_vowels(pot_pair)               # throw string to helper-method.
pairs << words[i] + " " + words[j]      # if gets back true, concatenade and push to output array "pairs"
end
end
end
pairs
end
# helper-method to check for if a string has all vowels in it
def check_for_vowels(string)
vowels = "aeiou"
founds = []
string.each_char do |char|
if vowels.include?(char) && !founds.include?(char)
founds << char
end
end
if founds.length == 5
return true
end
false
end

以下代码旨在提供一种在字数较大时构造所需数组的有效方法。请注意，与其他答案不同，它不使用方法Array#组合。

解释部分的第一部分(如下)概述了算法采用的方法。然后填写详细信息。

法典

require 'set'
VOWELS = ["a", "e", "i", "o", "u"]
VOWELS_SET = VOWELS.to_set
def all_vowel_pairs(words)
h = words.each_with_object({}) {|w,h| (h[(w.chars & VOWELS).to_set] ||= []) << w}
h.each_with_object([]) do |(k,v),a|
vowels_needed = VOWELS_SET-k
h.each do |kk,vv|
next unless kk.superset?(vowels_needed)
v.each {|w1| vv.each {|w2| a << "%s %s" % [w1, w2] if w1 < w2}}
end
end
end

例

words = ["goat", "action", "tear", "impromptu", "tired", "europe", "hear"]
all_vowel_pairs(words)
#=> ["action europe", "hear impromptu", "impromptu tear"]

解释

对于给定的示例，步骤如下。

VOWELS_SET = VOWELS.to_set
#=> #<Set: {"a", "e", "i", "o", "u"}> 
h = words.each_with_object({}) {|w,h| (h[(w.chars & VOWELS).to_set] ||= []) << w}
#=> {#<Set: {"o", "a"}>=>["goat"],
#    #<Set: {"a", "i", "o"}>=>["action"],
#    #<Set: {"e", "a"}>=>["tear", "hear"],
#    #<Set: {"i", "o", "u"}>=>["impromptu"],
#    #<Set: {"i", "e"}>=>["tired"],
#    #<Set: {"e", "u", "o"}>=>["europe"]}

可以看出，h的键是五个元音的子集。这些值是包含键给出的元音的words(单词)元素数组，不包含其他元音。因此，这些值共同构成了words的分区。当单词数量很大时，人们会期望h有 31 个键 (2**5 - 1)。

我们现在遍历h的键值对。对于每一个，用键k和值v，确定缺失元音(vowels_needed)的集合，然后我们循环遍历那些键值对[kk, vv]h，kk是vowels_needed的超集。然后将v和vv元素的所有组合添加到要返回的数组中(经过调整以避免重复计算每对单词)。

继续

enum = h.each_with_object([])
#=> #<Enumerator: {#<Set: {"o", "a"}>=>["goat"],
#                  #<Set: {"a", "i", "o"}>=>["action"],
#                  ...
#                  #<Set: {"e", "u", "o"}>=>["europe"]}: 
#     each_with_object([])>

第一个值由enum生成并传递给块，块变量被分配值：

(k,v), a = enum.next
#=> [[#<Set: {"o", "a"}>, ["goat"]], []]

请参阅枚举器#下一个。

各个变量通过数组分解赋值：

k #=> #<Set: {"o", "a"}> 
v #=> ["goat"] 
a #=> []

现在执行块计算。

vowels_needed = VOWELS_SET-k
#=> #<Set: {"e", "i", "u"}> 
h.each do |kk,vv|
next unless kk.superset?(vowels_needed)
v.each {|w1| vv.each {|w2| a << "%s %s" % [w1, w2] if w1 < w2}}
end

单词"goat"(v)有元音"o"和"a"，所以它只能与包含元音"e"，"i"和"u"(可能还有"o"和/或"a")的单词匹配。表达式

next unless kk.superset?(vowels_needed)

跳过那些不是vowels_needed超集的h键(kk)。请参阅设置#超集？。

words中的任何单词都不包含"e"、"i"和"u"，因此数组a保持不变。

下一个元素现在由enum生成，传递给块，块变量被赋值：

(k,v), a = enum.next
#=> [[#<Set: {"a", "i", "o"}>, ["action"]], []] 
k #=> #<Set: {"a", "i", "o"}> 
v #=> ["action"] 
a #=> []

区块计算开始：

vowels_needed = VOWELS_SET-k
#=> #<Set: {"e", "u"}>

我们看到h只有一个键值对，其键是vowels_needed的超集：

kk = %w|e u o|.to_set
#=> #<Set: {"e", "u", "o"}> 
vv = ["europe"]

因此，我们执行：

v.each {|w1| vv.each {|w2| a << "%s %s" % [w1, w2] if w1 < w2}}

这将一个元素添加到a：

a #=> ["action europe"]

该子句if w1 < w2是为了确保以后在计算中不会"europe action"添加到a中。

如果v(包含"a"、"i"和"u"的单词)和vv(包含"e"、"u"和"o"的单词)改为：

v  #=> ["action", "notification"]
vv #=> ["europe", "route"]

我们会在a中添加"action europe"、"action route"和"notification route"。(”europe notification”将在稍后k #=> #<Set: {"e", "u", "o"}时添加。

基准

我根据其他人建议使用@theTinMan的Fruity基准代码对我的方法进行了基准测试。唯一的区别是要测试的单词数组以及将我的方法添加到基准测试中，我将其命名为cary。对于要考虑的单词数组，我从计算机上的英语单词文件中随机选择了 600 个单词：

words = IO.readlines('/usr/share/dict/words', chomp: true).sample(600)
words.first 10
#=> ["posadaship", "explosively", "expensilation", "conservatively", "plaiting",
#    "unpillared", "intertwinement", "nonsolidified", "uraemic", "underspend"]

发现该数组包含 46,436 对包含所有五个元音的单词。

结果如下所示。

compare {
_viktor { viktor(words) }
_ttm1 { ttm1(words) }
_ttm2 { ttm2(words) }
_ttm3 { ttm3(words) }
_cary { cary(words) }
}

Running each test once. Test will take about 44 seconds.
_cary is faster than _ttm3 by 5x ± 0.1
_ttm3 is faster than _viktor by 50.0% ± 1.0%
_viktor is faster than _ttm2 by 30.000000000000004% ± 1.0%
_ttm2 is faster than _ttm1 by 2.4x ± 0.1

然后，我将cary与ttm3随机选择的1000个单词进行比较。发现该数组包含 125,068 对包含所有五个元音的单词。结果如下：

Running each test once. Test will take about 19 seconds.
_cary is faster than _ttm3 by 3x ± 1.0

为了感受基准测试的可变性，我又进行了两次最后一次比较，每次都有新的随机选择1000个单词。这给了我以下结果：

Running each test once. Test will take about 17 seconds.
_cary is faster than _ttm3 by 5x ± 1.0

Running each test once. Test will take about 18 seconds.
_cary is faster than _ttm3 by 4x ± 1.0

可以看出，样本之间存在相当大的差异。

你说成对，所以我假设它是两个元素的组合。我使用#combination方法对数组中的每个两个元素进行了组合。然后我只#select那些包含所有元音的对，一旦它们连接起来。最后，我确保加入这些对：

["goat", "action", "tear", "impromptu", "tired", "europe"]
.combination(2)
.select { |c| c.join('') =~ /b(?=w*?a)(?=w*?e)(?=w*?i)(?=w*?o)(?=w*?u)[a-zA-Z]+b/ }
.map{ |w| w.join(' ') }
#=> ["action europe", "tear impromptu"]

正则表达式来自"什么是正则表达式来匹配包含所有元音的单词？

与Viktor类似，我会使用一个简单的测试来查看单词中存在哪些元音，并在剥离重复项并对其进行排序后与"aeiou"进行比较：

def ttm1(ary)
ary.combination(2).select { |a| 
a.join.scan(/[aeiou]/).uniq.sort.join == 'aeiou' 
}.map { |a| a.join(' ') }
end
ttm1(words) # => ["action europe", "tear impromptu"]

将其分解，以便您可以看到正在发生的事情。

["goat", "action", "tear", "impromptu", "tired", "europe"]  # => ["goat", "action", "tear", "impromptu", "tired", "europe"]
.combination(2)                                 
.select { |a| a                                 # => ["goat", "action"],        ["goat", "tear"],     ["goat", "impromptu"],     ["goat", "tired"],    ["goat", "europe"],             ["action", "tear"],        ["action", "impromptu"],        ["action", "tired"],       ["action", "europe"],                ["tear", "impromptu"],     ["tear", "tired"],    ["tear", "europe"],             ["impromptu", "tired"],    ["impromptu", "europe"],             ["tired", "europe"]
.join                                       # => "goataction",              "goattear",           "goatimpromptu",           "goattired",          "goateurope",                   "actiontear",              "actionimpromptu",              "actiontired",             "actioneurope",                      "tearimpromptu",           "teartired",          "teareurope",                   "impromptutired",          "impromptueurope",                   "tiredeurope"
.scan(/[aeiou]/)                            # => ["o", "a", "a", "i", "o"], ["o", "a", "e", "a"], ["o", "a", "i", "o", "u"], ["o", "a", "i", "e"], ["o", "a", "e", "u", "o", "e"], ["a", "i", "o", "e", "a"], ["a", "i", "o", "i", "o", "u"], ["a", "i", "o", "i", "e"], ["a", "i", "o", "e", "u", "o", "e"], ["e", "a", "i", "o", "u"], ["e", "a", "i", "e"], ["e", "a", "e", "u", "o", "e"], ["i", "o", "u", "i", "e"], ["i", "o", "u", "e", "u", "o", "e"], ["i", "e", "e", "u", "o", "e"]
.uniq                                       # => ["o", "a", "i"],           ["o", "a", "e"],      ["o", "a", "i", "u"],      ["o", "a", "i", "e"], ["o", "a", "e", "u"],           ["a", "i", "o", "e"],      ["a", "i", "o", "u"],           ["a", "i", "o", "e"],      ["a", "i", "o", "e", "u"],           ["e", "a", "i", "o", "u"], ["e", "a", "i"],      ["e", "a", "u", "o"],           ["i", "o", "u", "e"],      ["i", "o", "u", "e"],                ["i", "e", "u", "o"]
.sort                                       # => ["a", "i", "o"],           ["a", "e", "o"],      ["a", "i", "o", "u"],      ["a", "e", "i", "o"], ["a", "e", "o", "u"],           ["a", "e", "i", "o"],      ["a", "i", "o", "u"],           ["a", "e", "i", "o"],      ["a", "e", "i", "o", "u"],           ["a", "e", "i", "o", "u"], ["a", "e", "i"],      ["a", "e", "o", "u"],           ["e", "i", "o", "u"],      ["e", "i", "o", "u"],                ["e", "i", "o", "u"]
.join == 'aeiou'                            # => false,                     false,                false,                     false,                false,                          false,                     false,                          false,                     true,                                true,                      false,                false,                          false,                     false,                               false
}                                               # => [["action", "europe"], ["tear", "impromptu"]]

查看代码，它正在跳过箍，以查找所有元音是否存在。每次检查时，它都必须通过许多方法来确定是否找到了所有元音;换句话说，它不会短路并失败，直到最后才好。

此代码将：

def ttm2(ary)
ary.combination(2).select { |a| 
str = a.join 
str[/a/] && str[/e/] && str[/i/] && str[/o/] && str[/u/]
}.map { |a| a.join(' ') }
end
ttm2(words) # => ["action europe", "tear impromptu"]

但我不喜欢以这种方式使用正则表达式引擎，因为它比直接查找慢，这会导致：

def ttm3(ary)
ary.combination(2).select { |a| 
str = a.join 
str['a'] && str['e'] && str['i'] && str['o'] && str['u']
}.map { |a| a.join(' ') }
end

这是基准测试：

require 'fruity'
words = ["goat", "action", "tear", "impromptu", "tired", "europe"]                                  
def viktor(ary)
ary.combination(2)                                                                             
.select { |c| c.join('') =~ /b(?=w*?a)(?=w*?e)(?=w*?i)(?=w*?o)(?=w*?u)[a-zA-Z]+b/ }  
.map{ |w| w.join(' ') }                                                                     
end
viktor(words) # => ["action europe", "tear impromptu"]
def ttm1(ary)
ary.combination(2).select { |a| 
a.join.scan(/[aeiou]/).uniq.sort.join == 'aeiou' 
}.map { |a| a.join(' ') }
end
ttm1(words) # => ["action europe", "tear impromptu"]
def ttm2(ary)
ary.combination(2).select { |a| 
str = a.join 
str[/a/] && str[/e/] && str[/i/] && str[/o/] && str[/u/]
}.map { |a| a.join(' ') }
end
ttm2(words) # => ["action europe", "tear impromptu"]
def ttm3(ary)
ary.combination(2).select { |a| 
str = a.join 
str['a'] && str['e'] && str['i'] && str['o'] && str['u']
}.map { |a| a.join(' ') }
end
ttm3(words) # => ["action europe", "tear impromptu"]
compare {
_viktor { viktor(words) }
_ttm1 { ttm1(words) }
_ttm2 { ttm2(words) }
_ttm3 { ttm3(words) }
}

结果如下：

# >> Running each test 256 times. Test will take about 1 second.
# >> _ttm3 is similar to _viktor
# >> _viktor is similar to _ttm2
# >> _ttm2 is faster than _ttm1 by 2x ± 0.1

现在，因为这看起来很像家庭作业，所以重要的是要了解学校知道堆栈溢出，并且他们会寻找寻求帮助的学生，所以你可能不想重用这段代码，尤其是不是逐字逐句。

您的代码包含两个错误，其中一个错误导致错误消息。

(0..words.length)从 0 到 6 的循环。但是words[6]不存在(数组从零开始)，所以你得到 nil。用(0..words.length-1)(两次)替换应该可以解决这个问题。

您将获得两次正确的结果，一次是"action europe"，一次是"europe action"。这是由于循环太多造成的，每次组合都要循环两次。替换从(0..words.length-1)到(i..words.length-1)的第二个循环。

这种繁琐的索引簿记很无聊，经常导致错误。这就是为什么Ruby程序员通常更喜欢更轻松的方法(如其他答案中的combination)，完全避免索引。

相关内容

最新更新

热门标签：