如何查找包含所有元音的同一数组的两个元素



我想迭代一个给定的数组,例如:

["goat", "action", "tear", "impromptu", "tired", "europe"]

我想看看所有可能的对。

所需的输出是一个新数组,其中包含所有对,这些数组组合包含所有元音。此外,这些对应连接为输出数组的一个元素:

["action europe", "tear impromptu"]

我尝试了以下代码,但收到错误消息:

No implicit conversion of nil into string.
def all_vowel_pairs(words)
pairs = []
(0..words.length).each do |i|                       # iterate through words
(0..words.length).each do |j|                   # for every word, iterate through words again
pot_pair = words[i].to_s + words[j]         # build string from pair
if check_for_vowels(pot_pair)               # throw string to helper-method.
pairs << words[i] + " " + words[j]      # if gets back true, concatenade and push to output array "pairs"
end
end
end
pairs
end
# helper-method to check for if a string has all vowels in it
def check_for_vowels(string)
vowels = "aeiou"
founds = []
string.each_char do |char|
if vowels.include?(char) && !founds.include?(char)
founds << char
end
end
if founds.length == 5
return true
end
false
end

以下代码旨在提供一种在字数较大时构造所需数组的有效方法。请注意,与其他答案不同,它不使用方法Array#组合。

解释部分的第一部分(如下)概述了算法采用的方法。然后填写详细信息。

法典

require 'set'
VOWELS = ["a", "e", "i", "o", "u"]
VOWELS_SET = VOWELS.to_set
def all_vowel_pairs(words)
h = words.each_with_object({}) {|w,h| (h[(w.chars & VOWELS).to_set] ||= []) << w}
h.each_with_object([]) do |(k,v),a|
vowels_needed = VOWELS_SET-k
h.each do |kk,vv|
next unless kk.superset?(vowels_needed)
v.each {|w1| vv.each {|w2| a << "%s %s" % [w1, w2] if w1 < w2}}
end
end
end

words = ["goat", "action", "tear", "impromptu", "tired", "europe", "hear"]
all_vowel_pairs(words)
#=> ["action europe", "hear impromptu", "impromptu tear"]

解释

对于给定的示例,步骤如下。

VOWELS_SET = VOWELS.to_set
#=> #<Set: {"a", "e", "i", "o", "u"}> 
h = words.each_with_object({}) {|w,h| (h[(w.chars & VOWELS).to_set] ||= []) << w}
#=> {#<Set: {"o", "a"}>=>["goat"],
#    #<Set: {"a", "i", "o"}>=>["action"],
#    #<Set: {"e", "a"}>=>["tear", "hear"],
#    #<Set: {"i", "o", "u"}>=>["impromptu"],
#    #<Set: {"i", "e"}>=>["tired"],
#    #<Set: {"e", "u", "o"}>=>["europe"]}

可以看出,h的键是五个元音的子集。这些值是包含键给出的元音的words(单词)元素数组,不包含其他元音。因此,这些值共同构成了words的分区。当单词数量很大时,人们会期望h有 31 个键 (2**5 - 1)。

我们现在遍历h的键值对。对于每一个,用键k和值v,确定缺失元音(vowels_needed)的集合,然后我们循环遍历那些键值对[kk, vv]hkkvowels_needed的超集。然后将vvv元素的所有组合添加到要返回的数组中(经过调整以避免重复计算每对单词)。

继续

enum = h.each_with_object([])
#=> #<Enumerator: {#<Set: {"o", "a"}>=>["goat"],
#                  #<Set: {"a", "i", "o"}>=>["action"],
#                  ...
#                  #<Set: {"e", "u", "o"}>=>["europe"]}: 
#     each_with_object([])> 

第一个值由enum生成并传递给块,块变量被分配值:

(k,v), a = enum.next
#=> [[#<Set: {"o", "a"}>, ["goat"]], []]

请参阅枚举器#下一个。

各个变量通过数组分解赋值:

k #=> #<Set: {"o", "a"}> 
v #=> ["goat"] 
a #=> [] 

现在执行块计算。

vowels_needed = VOWELS_SET-k
#=> #<Set: {"e", "i", "u"}> 
h.each do |kk,vv|
next unless kk.superset?(vowels_needed)
v.each {|w1| vv.each {|w2| a << "%s %s" % [w1, w2] if w1 < w2}}
end

单词"goat"(v)有元音"o"和"a",所以它只能与包含元音"e","i"和"u"(可能还有"o"和/或"a")的单词匹配。表达式

next unless kk.superset?(vowels_needed)

跳过那些不是vowels_needed超集的h键(kk)。请参阅设置#超集?。

words中的任何单词都不包含"e"、"i"和"u",因此数组a保持不变。

下一个元素现在由enum生成,传递给块,块变量被赋值:

(k,v), a = enum.next
#=> [[#<Set: {"a", "i", "o"}>, ["action"]], []] 
k #=> #<Set: {"a", "i", "o"}> 
v #=> ["action"] 
a #=> [] 

区块计算开始:

vowels_needed = VOWELS_SET-k
#=> #<Set: {"e", "u"}> 

我们看到h只有一个键值对,其键是vowels_needed的超集:

kk = %w|e u o|.to_set
#=> #<Set: {"e", "u", "o"}> 
vv = ["europe"]

因此,我们执行:

v.each {|w1| vv.each {|w2| a << "%s %s" % [w1, w2] if w1 < w2}}

这将一个元素添加到a

a #=> ["action europe"]

该子句if w1 < w2是为了确保以后在计算中不会"europe action"添加到a中。

如果v(包含"a"、"i"和"u"的单词)和vv(包含"e"、"u"和"o"的单词)改为:

v  #=> ["action", "notification"]
vv #=> ["europe", "route"]

我们会在a中添加"action europe""action route""notification route"。(”europe notification”将在稍后k #=> #<Set: {"e", "u", "o"}时添加。

基准

我根据其他人建议使用@theTinMan的Fruity基准代码对我的方法进行了基准测试。唯一的区别是要测试的单词数组以及将我的方法添加到基准测试中,我将其命名为cary。对于要考虑的单词数组,我从计算机上的英语单词文件中随机选择了 600 个单词:

words = IO.readlines('/usr/share/dict/words', chomp: true).sample(600)
words.first 10
#=> ["posadaship", "explosively", "expensilation", "conservatively", "plaiting",
#    "unpillared", "intertwinement", "nonsolidified", "uraemic", "underspend"]

发现该数组包含 46,436 对包含所有五个元音的单词。

结果如下所示。

compare {
_viktor { viktor(words) }
_ttm1 { ttm1(words) }
_ttm2 { ttm2(words) }
_ttm3 { ttm3(words) }
_cary { cary(words) }
}
Running each test once. Test will take about 44 seconds.
_cary is faster than _ttm3 by 5x ± 0.1
_ttm3 is faster than _viktor by 50.0% ± 1.0%
_viktor is faster than _ttm2 by 30.000000000000004% ± 1.0%
_ttm2 is faster than _ttm1 by 2.4x ± 0.1

然后,我将caryttm3随机选择的1000个单词进行比较。发现该数组包含 125,068 对包含所有五个元音的单词。结果如下:

Running each test once. Test will take about 19 seconds.
_cary is faster than _ttm3 by 3x ± 1.0

为了感受基准测试的可变性,我又进行了两次最后一次比较,每次都有新的随机选择1000个单词。这给了我以下结果:

Running each test once. Test will take about 17 seconds.
_cary is faster than _ttm3 by 5x ± 1.0
Running each test once. Test will take about 18 seconds.
_cary is faster than _ttm3 by 4x ± 1.0

可以看出,样本之间存在相当大的差异。

你说成对,所以我假设它是两个元素的组合。我使用#combination方法对数组中的每个两个元素进行了组合。然后我只#select那些包含所有元音的对,一旦它们连接起来。最后,我确保加入这些对:

["goat", "action", "tear", "impromptu", "tired", "europe"]
.combination(2)
.select { |c| c.join('') =~ /b(?=w*?a)(?=w*?e)(?=w*?i)(?=w*?o)(?=w*?u)[a-zA-Z]+b/ }
.map{ |w| w.join(' ') }
#=> ["action europe", "tear impromptu"]

正则表达式来自"什么是正则表达式来匹配包含所有元音的单词?

与Viktor类似,我会使用一个简单的测试来查看单词中存在哪些元音,并在剥离重复项并对其进行排序后与"aeiou"进行比较:

def ttm1(ary)
ary.combination(2).select { |a| 
a.join.scan(/[aeiou]/).uniq.sort.join == 'aeiou' 
}.map { |a| a.join(' ') }
end
ttm1(words) # => ["action europe", "tear impromptu"]

将其分解,以便您可以看到正在发生的事情。

["goat", "action", "tear", "impromptu", "tired", "europe"]  # => ["goat", "action", "tear", "impromptu", "tired", "europe"]
.combination(2)                                 
.select { |a| a                                 # => ["goat", "action"],        ["goat", "tear"],     ["goat", "impromptu"],     ["goat", "tired"],    ["goat", "europe"],             ["action", "tear"],        ["action", "impromptu"],        ["action", "tired"],       ["action", "europe"],                ["tear", "impromptu"],     ["tear", "tired"],    ["tear", "europe"],             ["impromptu", "tired"],    ["impromptu", "europe"],             ["tired", "europe"]
.join                                       # => "goataction",              "goattear",           "goatimpromptu",           "goattired",          "goateurope",                   "actiontear",              "actionimpromptu",              "actiontired",             "actioneurope",                      "tearimpromptu",           "teartired",          "teareurope",                   "impromptutired",          "impromptueurope",                   "tiredeurope"
.scan(/[aeiou]/)                            # => ["o", "a", "a", "i", "o"], ["o", "a", "e", "a"], ["o", "a", "i", "o", "u"], ["o", "a", "i", "e"], ["o", "a", "e", "u", "o", "e"], ["a", "i", "o", "e", "a"], ["a", "i", "o", "i", "o", "u"], ["a", "i", "o", "i", "e"], ["a", "i", "o", "e", "u", "o", "e"], ["e", "a", "i", "o", "u"], ["e", "a", "i", "e"], ["e", "a", "e", "u", "o", "e"], ["i", "o", "u", "i", "e"], ["i", "o", "u", "e", "u", "o", "e"], ["i", "e", "e", "u", "o", "e"]
.uniq                                       # => ["o", "a", "i"],           ["o", "a", "e"],      ["o", "a", "i", "u"],      ["o", "a", "i", "e"], ["o", "a", "e", "u"],           ["a", "i", "o", "e"],      ["a", "i", "o", "u"],           ["a", "i", "o", "e"],      ["a", "i", "o", "e", "u"],           ["e", "a", "i", "o", "u"], ["e", "a", "i"],      ["e", "a", "u", "o"],           ["i", "o", "u", "e"],      ["i", "o", "u", "e"],                ["i", "e", "u", "o"]
.sort                                       # => ["a", "i", "o"],           ["a", "e", "o"],      ["a", "i", "o", "u"],      ["a", "e", "i", "o"], ["a", "e", "o", "u"],           ["a", "e", "i", "o"],      ["a", "i", "o", "u"],           ["a", "e", "i", "o"],      ["a", "e", "i", "o", "u"],           ["a", "e", "i", "o", "u"], ["a", "e", "i"],      ["a", "e", "o", "u"],           ["e", "i", "o", "u"],      ["e", "i", "o", "u"],                ["e", "i", "o", "u"]
.join == 'aeiou'                            # => false,                     false,                false,                     false,                false,                          false,                     false,                          false,                     true,                                true,                      false,                false,                          false,                     false,                               false
}                                               # => [["action", "europe"], ["tear", "impromptu"]]

查看代码,它正在跳过箍,以查找所有元音是否存在。每次检查时,它都必须通过许多方法来确定是否找到了所有元音;换句话说,它不会短路并失败,直到最后才好。

此代码将:

def ttm2(ary)
ary.combination(2).select { |a| 
str = a.join 
str[/a/] && str[/e/] && str[/i/] && str[/o/] && str[/u/]
}.map { |a| a.join(' ') }
end
ttm2(words) # => ["action europe", "tear impromptu"]

但我不喜欢以这种方式使用正则表达式引擎,因为它比直接查找慢,这会导致:

def ttm3(ary)
ary.combination(2).select { |a| 
str = a.join 
str['a'] && str['e'] && str['i'] && str['o'] && str['u']
}.map { |a| a.join(' ') }
end

这是基准测试:

require 'fruity'
words = ["goat", "action", "tear", "impromptu", "tired", "europe"]                                  
def viktor(ary)
ary.combination(2)                                                                             
.select { |c| c.join('') =~ /b(?=w*?a)(?=w*?e)(?=w*?i)(?=w*?o)(?=w*?u)[a-zA-Z]+b/ }  
.map{ |w| w.join(' ') }                                                                     
end
viktor(words) # => ["action europe", "tear impromptu"]
def ttm1(ary)
ary.combination(2).select { |a| 
a.join.scan(/[aeiou]/).uniq.sort.join == 'aeiou' 
}.map { |a| a.join(' ') }
end
ttm1(words) # => ["action europe", "tear impromptu"]
def ttm2(ary)
ary.combination(2).select { |a| 
str = a.join 
str[/a/] && str[/e/] && str[/i/] && str[/o/] && str[/u/]
}.map { |a| a.join(' ') }
end
ttm2(words) # => ["action europe", "tear impromptu"]
def ttm3(ary)
ary.combination(2).select { |a| 
str = a.join 
str['a'] && str['e'] && str['i'] && str['o'] && str['u']
}.map { |a| a.join(' ') }
end
ttm3(words) # => ["action europe", "tear impromptu"]
compare {
_viktor { viktor(words) }
_ttm1 { ttm1(words) }
_ttm2 { ttm2(words) }
_ttm3 { ttm3(words) }
}

结果如下:

# >> Running each test 256 times. Test will take about 1 second.
# >> _ttm3 is similar to _viktor
# >> _viktor is similar to _ttm2
# >> _ttm2 is faster than _ttm1 by 2x ± 0.1

现在,因为这看起来很像家庭作业,所以重要的是要了解学校知道堆栈溢出,并且他们会寻找寻求帮助的学生,所以你可能不想重用这段代码,尤其是不是逐字逐句。

您的代码包含两个错误,其中一个错误导致错误消息。

(0..words.length)从 0 到 6 的循环。 但是words[6]不存在(数组从零开始),所以你得到 nil。用(0..words.length-1)(两次)替换应该可以解决这个问题。

您将获得两次正确的结果,一次是"action europe",一次是"europe action"。这是由于循环太多造成的,每次组合都要循环两次。替换从(0..words.length-1)(i..words.length-1)的第二个循环。

这种繁琐的索引簿记很无聊,经常导致错误。这就是为什么Ruby程序员通常更喜欢更轻松的方法(如其他答案中的combination),完全避免索引。

最新更新