So I got this method working, and it works fine when the dataset is small. However, once the dataset gets a bit larger...
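The purpose of this script is to find every possible combination of the sets (one item from each set) without repetition, so that I can store them in a database table.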
set 1: [701,744,410,646,723,434]
set 2: [701,744,410,646,723,435]
set 3: etc.
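I should also note that I need to keep each item tied to its original key. In other words, an item from type1 can't move to any other type. Hopefully that makes sense.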
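To make the requirement concrete, here is a toy version with two types and two items each (the data is made up for illustration). Each result picks exactly one item per type and keeps the type as the hash key, so an item never moves to another type. This is a minimal sketch of the same wrap-then-`product` idea the code below uses, not the author's code.

```ruby
# Toy input (made up for illustration): each key is a type,
# each value lists the items allowed for that type.
pieces = { 'type1' => [1, 2], 'type2' => [3, 4] }

# Wrap every item in a one-entry hash so it stays tied to its type.
keyed = pieces.map { |type, items| items.map { |item| { type => item } } }

# Cartesian product: one entry per type, then merge the entries into one hash.
groups = keyed.first.product(*keyed.drop(1)).map { |parts| parts.reduce({}, :merge) }

puts groups.size                 # => 4
groups.each { |group| p group }  # each group holds exactly one item per type
```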
Collecting pieces...
pieces[type1] = [701, 702, 703, 704, 705, 706, 707, 708, 709, 710, 711, 712, 713, 714, 715, 716, 717, 718, 719, 720, 721, 722]
pieces[type2] = [744, 745, 746, 747, 748, 749, 750, 751, 752, 753, 754, 755, 756, 757, 758, 759, 760, 761, 762, 763, 764, 765]
pieces[type3] = [410, 412, 413, 414, 415, 419, 422, 424, 426, 427, 429, 372, 374, 376, 378, 380, 382, 385, 395, 397, 399, 401]
pieces[type4] = [646, 647, 649, 651, 653, 655, 657, 671, 672, 673, 674, 679, 681, 684, 686, 688, 691, 695, 697, 698, 699, 700]
pieces[type5] = [723, 724, 725, 726, 727, 728, 729, 730, 731, 732, 733, 734, 735, 736, 737, 738, 739, 740, 741, 742, 743]
pieces[type6] = [434, 435, 438, 440, 443, 446, 447, 462, 464, 467, 469, 484, 485, 486, 487, 488, 489, 490, 491, 492, 494, 496]
Took 0.4265 seconds to collect.
Generating possibilities...
/Projects/my_project/lib/tasks/possibilities.rake:109: [BUG] Segmentation fault
ruby 1.9.3p286 (2012-10-12 revision 37165) [x86_64-darwin12.2.0]
Yes, a segfault.
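For scale, the size of a Cartesian product is the product of the list sizes, so it can be computed without generating anything. Using the list lengths from the log above (type5 has 21 items, the others 22):

```ruby
# Lengths of the six lists printed in the log (type1..type6).
sizes = [22, 22, 22, 22, 21, 22]

# Number of combinations in the full Cartesian product.
total = sizes.reduce(:*)
puts total           # => 108226272

# Combinations that share one particular item of the first list.
per_first_item = total / sizes.first
puts per_first_item  # => 4919376
```

Building all of roughly 108 million combinations at once, each an Array of six Hashes, needs several gigabytes of memory, which is one plausible reason the single `product` call falls over on this Ruby build.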
Here is the code I'm using to do it.
def permutations!(input)
  permutations_start = Time.now
  puts "Generating possibilities..."

  # Wrap each item in a one-entry hash so it stays tied to its type.
  input.each do |key, possibilities|
    possibilities.map! { |p| { key => p } }
  end

  digits = input.keys.map! { |key| input[key] }

  # This is the line that seems to want to cry.
  result = digits.shift.product(*digits)

  puts "# of generated possibilities: #{result.length}"
  puts "Took #{(Time.now - permutations_start).round(4)} seconds to generate.\n\n"
  return result
end

pieces = {}
pieces['type1'] = [701, 702, 703, 704, 705, 706, 707, 708, 709, 710, 711, 712, 713, 714, 715, 716, 717, 718, 719, 720, 721, 722]
pieces['type2'] = [744, 745, 746, 747, 748, 749, 750, 751, 752, 753, 754, 755, 756, 757, 758, 759, 760, 761, 762, 763, 764, 765]
pieces['type3'] = [410, 412, 413, 414, 415, 419, 422, 424, 426, 427, 429, 372, 374, 376, 378, 380, 382, 385, 395, 397, 399, 401]
pieces['type4'] = [646, 647, 649, 651, 653, 655, 657, 671, 672, 673, 674, 679, 681, 684, 686, 688, 691, 695, 697, 698, 699, 700]
pieces['type5'] = [723, 724, 725, 726, 727, 728, 729, 730, 731, 732, 733, 734, 735, 736, 737, 738, 739, 740, 741, 742, 743]
pieces['type6'] = [434, 435, 438, 440, 443, 446, 447, 462, 464, 467, 469, 484, 485, 486, 487, 488, 489, 490, 491, 492, 494, 496]
possibilities = permutations!(pieces)
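One way to avoid materializing the whole result is to enumerate the combinations lazily and handle each one as it is produced. The following is a sketch, not the code from the question: `each_group` is a hypothetical helper that walks the lists recursively and yields one merged hash at a time, in the same order `product` would produce them, so memory use grows with the number of types rather than the number of combinations.

```ruby
# Hypothetical helper (not from the question): lazily yields every combination
# as a hash {type => item}, taking exactly one item from each type.
def each_group(pieces, &block)
  return enum_for(:each_group, pieces) unless block

  types = pieces.keys

  # Recursive walk: choose an item for types[depth], then move on to the next type.
  walk = lambda do |depth, current|
    if depth == types.size
      block.call(current.dup)  # hand out a copy so callers may keep or modify it
    else
      type = types[depth]
      pieces[type].each do |item|
        current[type] = item
        walk.call(depth + 1, current)
      end
    end
  end

  walk.call(0, {})
end

pieces = { 'type1' => [701, 702], 'type2' => [744, 745], 'type3' => [410] }

each_group(pieces) { |group| p group }  # prints 4 hashes, one per line
puts each_group(pieces).first(2).size   # => 2; generation stops after two groups
```

Because it returns an Enumerator when called without a block, it can be combined with `each_slice`, `first`, `lazy`, and so on.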
Memory-wise, it looks OK now. The CPU is pegged just like before, though I expected that.
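Most of the time now goes into storing the records in the database. I was hoping to use activerecord-import or a bulk insert to speed it up, but I have to run some calculations on each group before saving it. So I set that up as a before_save hook, so that it gets handled in the model.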
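If the per-group calculation can be moved out of the `before_save` hook into a plain method, the groups can be computed in Ruby first and handed to a bulk insert in batches. A minimal sketch under that assumption: `compute_fields` is a hypothetical stand-in for whatever the hook actually calculates, `BATCH_SIZE` is an arbitrary choice, and the block stands in for the real bulk-insert call (activerecord-import or a multi-row INSERT). None of these names come from the question.

```ruby
BATCH_SIZE = 1_000  # assumption: tune to what the database handles well

# Hypothetical stand-in for the calculation currently done in before_save.
def compute_fields(group)
  group.merge('total' => group.values.reduce(:+))
end

# Takes any Enumerable of group hashes, computes the derived fields slice by
# slice, and passes each ready-to-insert batch of row hashes to the block.
def store_in_batches(groups, batch_size = BATCH_SIZE)
  groups.each_slice(batch_size) do |slice|
    rows = slice.map { |group| compute_fields(group) }
    yield rows  # e.g. one multi-row INSERT per batch instead of one per group
  end
end

# Usage with an in-memory sink instead of a database:
groups = [1, 2, 3].product([10, 20]).map { |a, b| { 'type1' => a, 'type2' => b } }
batches = []
store_in_batches(groups, 4) { |rows| batches << rows }
puts batches.map(&:size).inspect  # => [4, 2]
```

Keep in mind that bulk-insert paths generally bypass model callbacks, which is exactly why the calculation has to happen before the rows reach the insert.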
At the current rate, it would take a few months to get all the data into the database.
def generate(input)
  # Wrap each item in a one-entry hash so it stays tied to its type.
  input.each do |key, possibilities|
    possibilities.map! { |p| { key => p } }
  end
  digits = input.keys.map! { |key| input[key] }

  i = 1
  shifted = digits.shift
  # Generate and store the groups one item of the first type at a time,
  # instead of building the whole product in one go.
  shifted.each do |item|
    puts "Generating groups #{i} of #{shifted.length}..."
    permutations_start = Time.now
    results = [item].product(*digits)
    puts "# of generated groups in the set number - #{i}: #{results.length}"
    puts "Took #{(Time.now - permutations_start).round(4)} seconds to generate.\n\n"

    # Storing the groups: merge each group's one-entry hashes into a single hash.
    puts "Storing groups..."
    storing_start = Time.now
    results.each { |group| Group.create!(group.reduce({}, :update)) }
    puts "Took #{(Time.now - storing_start).round(4)} seconds to store.\n\n"
    i += 1
  end
end
Sample output:
Collecting pieces...
possibilities['type1'] = [701, 702, 703, 704, 705, 706, 707, 708, 709, 710, 711, 712, 713, 714, 715, 716, 717, 718, 719, 720, 721, 722]
possibilities['type2'] = [744, 745, 746, 747, 748, 749, 750, 751, 752, 753, 754, 755, 756, 757, 758, 759, 760, 761, 762, 763, 764, 765]
possibilities['type3'] = [410, 412, 413, 414, 415, 419, 422, 424, 426, 427, 429, 372, 374, 376, 378, 380, 382, 385, 395, 397, 399, 401]
possibilities['type4'] = [646, 647, 649, 651, 653, 655, 657, 671, 672, 673, 674, 679, 681, 684, 686, 688, 691, 695, 697, 698, 699, 700]
possibilities['type5'] = [723, 724, 725, 726, 727, 728, 729, 730, 731, 732, 733, 734, 735, 736, 737, 738, 739, 740, 741, 742, 743]
possibilities['type6'] = [434, 435, 438, 440, 443, 446, 447, 462, 464, 467, 469, 484, 485, 486, 487, 488, 489, 490, 491, 492, 494, 496]
Took 0.4248 seconds to collect.
Generating groups 1 of 22...
There were 4,919,376 groups in the set number 1.
Took 1.819 seconds to generate.
Storing Groups...
250 items took 11.7158 seconds
250 items took 11.5094 seconds
250 items took 11.6994 seconds
250 items took 11.5678 seconds
250 items took 11.5529 seconds
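The sample timings are enough to check the "few months" estimate. A back-of-the-envelope calculation, assuming the insert rate stays constant for the whole run (it may well degrade as the table grows):

```ruby
# Timings for 250-row batches, copied from the sample output above.
batch_seconds = [11.7158, 11.5094, 11.6994, 11.5678, 11.5529]
batch_size = 250

seconds_per_row = batch_seconds.reduce(:+) / batch_seconds.size / batch_size

# 22 items in the first list times 4,919,376 groups per item (from the log).
total_groups = 22 * 4_919_376

days = total_groups * seconds_per_row / 86_400
puts "#{(1 / seconds_per_row).round(1)} rows/s, about #{days.round} days"
# => 21.5 rows/s, about 58 days
```

At roughly 21.5 single-row inserts per second, storing all 108,226,272 groups takes about two months, while generating each set of 4,919,376 groups took under two seconds, so the per-row insert cost is what dominates.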