我在实习中遇到了一些问题。我需要从数组中获取所有序列：假设我有这个数组[1,2,3,4,5]，我需要生成所有连续的子数组，如[1]、[2]、[3]、[4]、[5]、[1,2]、[2,3]、[3,4]、[4,5]、[1,2,3]、[2,3,4]、[3,4,5]、[1,2,3,4]、[2,3,4,5]、[1,2,3,4,5]。

在此之后，我需要从生成的数组中删除一些子数组。我称之为"不允许"。

例如，如果不允许的序列[1,3]并且[3,5]我需要删除序列[1,2,3]([1,3]在其中)，[3,4,5]([3,5]是子数组)，[1,2,3,4]([1,3]iis是子数组)，[2,3,4,5]([3,5]是子数组)和[1,2,3,4,5]([1,3]和[3,5]是子数组)。

我做了一些工作的代码，问题是：大数组(n可以10^5)，代码很慢(如果你尝试使用n = 1000你会看到)。我的代码如下。不允许的是从[a[i], b[i]]生成的。

require 'set'
def not_allowed(a, b)
count = a.size - 1
(0..count).map { |i| [a[i], b[i]] }
end
def combinations(n)
combinations = []
elements = (1..n).to_a
elements_sequence = (0..elements.size - 1)
elements_sequence.each do |i|
elements_sequence.each do |j|
next if elements[j..i].size == 0
combinations << elements[j..i]
end
end
combinations
end
def adjustCombinations(n, a, b)
sequences = combinations(n)
not_alloweds = not_allowed(a, b).map { |not_allowed| not_allowed if not_allowed[0] != not_allowed[1] }.compact
final_not_alloweds = not_alloweds.map { |not_allowed|  sequences.map { |sequence| sequence if not_allowed.to_set.subset?(sequence.to_set) }.compact }.flatten(1)
(sequences - final_not_alloweds).count
end
adjustCombinations(5, [2,1,2], [2,3,5])

在此示例中，不允许使用[1,3]和[2,5]。当不允许时，像[1,1]、[2,2]，[3,3]这不是问题。

n是确定数组直到5。

更新1：我有9~10秒的限制来运行此代码

更新 2：对于not_alloweds [1,3] 和 [2,5]，我们需要删除序列 [1,2,3]、[1,2,3,4] [2,3,4,5]、[1,2,3,4,5](如果您查看not_alloweds是所列序列的子序列)

更新3：我需要可能的allowed_sequences数量。在以下情况下：

n = 5
a = [2,1,2]
b = [2,3,5]

预期结果为11

而不是创建所有组合，然后删除不需要的内容。只创建您实际需要的内容可能会更快。

例如，您可以向combinations方法添加一个块，以确定是否将组合添加到输出中。这样，您就不会生成以后需要减去的不必要的长数组。

require 'set'
def combinations(n)
combinations = []
elements = (1..n).to_a
elements.each_index do |i|
(elements.length - i).times do |j|
combination = elements[j, i + 1]
# add the combination if no block is given, or if it evaluates to truthy
combinations << combination if !block_given? || yield(combination)
end
end
combinations
end
def adjustCombinations(n, a, b)
not_allowed = [a, b].transpose
not_allowed.reject! { |n1, n2| n1 == n2 }
not_allowed.map!(&:to_set)
combinations(n) do |combination|
combination = combination.to_set
not_allowed.reject { |set| set.size > combination.size }
.none?  { |set| (set - combination).empty? }
end
end
adjustCombinations(5, [2,1,2], [2,3,5])
#=> [[1], [2], [3], [4], [5], [1, 2], [2, 3], [3, 4], [4, 5], [2, 3, 4], [3, 4, 5]]

如果只需要元素计数，则可以将combinations = []替换为combinations = 0，combinations << combination替换为combinations += 1。这样，您就不需要实例化整个组合数组，从而节省大量内存和较大的数字。

OP 定义了 2 元素"不允许"数组，[i,j]、j >= i。我发现提及"不被覆盖"(NTBC)范围更具描述性，i..j，j >= i。给定一个正整数n和一个 NTBC 范围的数组arr，目标是确定未覆盖任何 NTBC 范围的1..n所涵盖的范围数;也就是说，计算以下i..j范围：

arr.any? { |r| (i..j).cover? r } == false

在这里和下面，每当我提到一个范围i..j时，都会假设i和j满足条件1 <= i <= j <= n。

我将两个 NTBC 范围称为重叠i..j和p..q，如果i < p < q < j或p < i < q < j。下面，我为没有重叠的NTBC范围的情况提供了一个非常快速的解决方案。OP已经指出NTBC范围可以重叠，所以我的解决方案没有完整的答案。问题在于，重叠范围的存在破坏了我采用的组合算法所需的数学结构。尽管如此，我还是决定提出我的解决方案，希望有人可以找到一种方法将我的想法扩展到 NTBC 范围可能重叠的情况。

创建count_ranges方法

让我们首先创建一个方法，给定整数i和j，i <= j，计算p..q范围的数量，使得i <= p <= q <= j;也就是说，(i..j).cover?(p,q) #=> true哪个。例如

(20..40).cover?(12..18) #=> false
(20..40).cover?(14..28) #=> false
(20..40).cover?(23..37) #=> true
(20..40).cover?(29..47) #=> false
(20..40).cover?(43..51) #=> false

请参阅范围#封面？对于参数为范围的情况。方法如下。

def count_ranges(i,j)
return 0 if j < i
n = (j-i+1)
n**2 - (n-1)*n/2
end

例如：

count_ranges(1,4)   #=> 10

这些将是10范围：

1..1, 1..2, 1..3, 1..4,
2..2, 2..3, 2..4,
3..3, 3..4,
4..4

请注意：

count_ranges(3,6)   #=> 10

或者，更一般地说，

count_ranges(n,n+3) #=> 10

通过此方法执行计算的基本原理如下。给定范围i..j，i <= j，这个范围所覆盖的范围数计算如下。

n = j-i+1
n + n-1 + n-2 + ... + n-(n-1)
= n*n - (1 + 2 +...+ n-1)
= n*n - (n-1)*n/2

n等于从i开始到结束于k的范围数，其中i <= k <= j。n-1等于从i+1开始，到k、i+1 <= k <= j结束的范围数，依此类推。n-(n-1) #=> 1等于从i+(n-1) #=> i+(j-i+1-1) => j开始到结束于k的范围数，j <= k <= j。

(1 + 2 +...+ n-1)是算术序列的总和，等于序列中元素数(n-1)和序列中的平均值的乘积，等于第一个和最后一个元素(1+(n-1) #=> n)之和除以2。

此方法完成以下代码中的所有工作。需要注意的是，此方法的执行速度几乎与其两个参数的大小无关。

删除多余的 NTBC 范围

让我们arrNTBC 范围的数组。第一步是删除覆盖另一个 NTBC 范围的 NTBC 范围。如果：

r1 = u..v
r2 = x..w

和u <= x =< w <= v(r1.cover?(r2) #=> true)，r1是多余的，因此可以忽略，因为任何范围r被取消计数资格，因为如果r1不在场，r.cover?(r1) #=> true将被r2取消资格。

require 'set'
def remove_redundant_ranges(arr)
set = Set.new
a = arr.dup
while a.any?
r = a.shift
set << r unless set.any? { |t| r.cover?(t) } || a.any? { |t| r.cover?(t) }
end
set.to_a  
end

例如：

arr = [2..4, 7..14, 8..12, 14..14, 4..5, 1..5, 9..12, 2..4]
remove_redundant_ranges(arr) 
#=> [14..14, 4..5, 9..12, 2..4]

计数方式

def count_non_covering_ranges(n, arr)
(([0..0] + remove_redundant_ranges(arr).sort_by(&:begin)) << (n+1..n+1)).
each_cons(2).
sum { |r1,r2| net_count(r1, r2) }
end
def net_count(r1, r2)
r1b, r1e = endpoints(r1)
r2b, r2e = endpoints(r2)
count_ranges(r1b+1, r2e-1) - count_ranges(r1b+1, r1e-1)
end
def endpoints(r)
[r.begin, r.end]
end

例子

例 1

n = 9
arr = [2..4, 4..5, 7..9]

count_non_covering_ranges(n, arr)
#=> 20

1..9涵盖的20范围不包括arr中的任何NTBC范围如下。

1..1, 1..2, 1..3,
2..2, 2..3,
3..3, 3..4,
4..4,
5..5, 5..6, 5..7, 5..8,
6..6, 6..7, 6..8,
7..7, 7..8,
8..8, 8..9
9..9

例 2

n = 12
arr = [2..4, 4..7, 9..11]

count_non_covering_ranges(n, arr)
#=> 38

1..12涵盖的38范围不包括arr中的任何NTBC范围如下。

1..1, 1..2, 1..3,
2..2, 2..3,
3..3, 3..4, 3..5, 3..6,
4..4, 4..5, 4..6,
5..5, 5..6, 5..7, 5..8, 5..9,  5..10,
6..6, 6..7, 6..8, 6..9,  6..10,
7..7, 7..8, 7..9,  7..10,
8..8, 8..9,  8..10,
9..9,  9..10,
10..10, 10..11, 10..12,
11..11, 11..12,
12..12

例 3

n = 12
arr = [2..4, 6..8, 9..11]

count_non_covering_ranges(n, arr)
#=> 34

1..12涵盖的34范围不包括arr中的任何NTBC范围如下。

1..1, 1..2, 1..3,
2..2, 2..3,
3..3, 3..4, 3..5, 3..6, 3..7
4..4, 4..5, 4..6, 4..7
5..5, 5..6, 5..7,
6..6, 6..7,
7..7, 7..8, 7..9,  7..10,
8..8, 8..9,  8..10, 
9..9,  9..10, 
10..10, 10..11, 10..12,
11..11, 11..12,
12..12

例 4

n = 15
arr = [2..4, 4..5, 9..12, 14..14]

count_non_covering_ranges(n, arr)
#=> 44

1..15涵盖的44范围不包括arr中的任何NTBC范围如下。

1..1, 1..2, 1..3,
2..2, 2..3,
3..3,
4..4,
5..5, 5..6, 5..7, 5..8,   5..9,  5..10,  5..11,
6..6, 6..7, 6..8,   6..9,  6..10,  6..11,
7..7, 7..8,   7..9,  7..10,  7..11,
8..8,   8..9,  8..10,  8..11,
9..9,  9..10,  9..11,
10..10, 10..11,
11..11,
10..12, 10..13,
11..12, 11..13,
12..12, 12..13, 
13..13,
15..15

例 5

n = 15
arr = [2..4, 4..5, 9..12, 14..14]

require 'time'
n = 1_000_000_000_000
m = 1_000 # number of NTBC ranges
s = 1_000 # size of each NTBC range
q = n/m
arr = m.times.map do |i|
from = rand(i*q..(i+1)*q-1-s)
from..from+s-1
end
arr.first(3)
#=> [737321450..737322449, 1803846784..1803847783, 2536962375..2536963374] 
arr.last(2)
#=> [998900666529..998900667528, 999036273747..999036274746] 
t = Time.now
x = count_non_covering_ranges(n, arr)
#=> 585_410_016_606_600_423_738 
puts "#{(Time.now-t).round(2)} seconds"
0.24 seconds

解释

请参阅枚举#each_cons。

首先要注意的是，count_non_covering_ranges的执行速度几乎与其第一个参数的大小n和 NTBC 范围的大小无关，并且大致与其第二个参数的大小成正比，arrNTBC 范围的数量。

假设：

n = 14
arr = [2..5, 7..9, 10..13]

我们首先将其修改为以下内容：

r1 = 0..0
r2 = 2..5
r3 = 7..9
r4 = 10..13
r5 = n+1..n+1 #=> 15..15
a = [r1, r2, r3, r4, r5]

我们首先计算：

c1 =  count_ranges(r1.begin+1, r2.end-1)
#=> count_ranges(1, 4)   #=> 10

这是1..4所涵盖的范围数量，其中没有一个涵盖任何NTBC的范围。这些10范围是1..1、1..2、1..3、1..4、2..2、2..3、2..4、3..3、3..4和4..4。

接下来我们计算：

c2 =  count_ranges(r2.begin+1, r3.end-1)
#=> count_ranges(3, 8)   #=> 21
c3 =  count_ranges(r3.begin+1, r4.end-1)
#=> count_ranges(8, 12)  #=> 15
c4 =  count_ranges(r4.begin+1, r5.end-1)
#=> count_ranges(11, 14) #=> 10

现在对这些值求和：

c1 + c2 + c3 + c4
#=> 56

让我们将其与我们对总数的计算结果进行比较：

count_non_covering_ranges(n, arr)
#=> 49

差异是由于我们对7范围进行了重复计算：

c1和c2都计算范围3..3、3..4和4..4;
c2和c3都计算范围8..8; 和
c3和c4都计算范围11..11、11..12和12..12。

因此，我们必须从49中减去：

count_ranges(r2.begin+1, r2.end-1) + count_ranges(r3.begin+1, r3.end-1) +
count_ranges(r4.begin+1, r4.end-1)
#=> 3 + 1 + 3 => 7

为方便起见，我还减去以下内容：

count_ranges(r1.begin+1, r1.end-1) + count_ranges(r5.begin+1, r5.end-1)
#=> count_ranges(1, -1) + count_ranges(16, 14) => 0 + 0 => 0

这是我构造count_ranges时允许的，以便count_ranges(i, j)在j < i时返回零。

对于大数组(n = 1000)，代码非常慢。

n = 1000 需要 20 多秒才能运行，我有一个 i7 8750 CPU

n == 10^5 .. 运行需要 10 分钟以上

对于大型数组，我有 9 到 10 秒的限制..

使用按位运算符的解决方案：n=10^4 in 83s

这是我尽最大努力，包括测试和基准测试。它不够快，但我认为这里有两个很好的技术可以使用，特别是：

生成具有位移的"子数组"(<<sequential_sub_arrays
通过位屏蔽执行"遗漏"(omit?&)

这些不是我经常使用或经常使用的技术，所以我把它们作为其他人玩的垫脚石。

为了达到你的目标("9到10秒......对于大型数组)，您可能需要采用这些技术并使用更快的语言(如 C 或 rust)实现它们。

# Convert omissions like [1,3] into bitmasks like 0b00000101. See use in `omit?`.
def omissions_to_masks(omissions)
omissions.map { |o|
len = o.max
s = o.each_with_object('0' * len) { |e, a| a[e - 1] = '1' }
eval('0b' + s.reverse)
}
end
# Test 1: omissions_to_masks
expected = [
0b00000101, # [1,3]
0b00010100  # [3,5]
]
actual = omissions_to_masks([[1,3], [3,5]])
fail unless actual == expected
# bit_subarray is an Integer, whose bits represent an array of Integers.
# For example, 0b00111001 represents the array [1,4,5,6]
def omit?(bit_subarray, masks)
masks.any? { |o| bit_subarray & o == o }
end
def sequential_sub_arrays(n, output, omissions)
masks = omissions_to_masks(omissions)
1.upto(n) do |size|
proto = 2 ** size - 1
0.upto(n - size) do |shift|
value = proto << shift
unless omit?(value, masks)
output << value
end
end
end
end
# Test 2: sequential_sub_arrays
expected = [
0b00001,
0b00010,
0b00100,
0b01000,
0b10000,
0b00011,
0b00110,
0b01100,
0b11000,
0b01110,
]
actual = []
sequential_sub_arrays(5, actual, [[1,3], [3,5]])
fail unless expected == actual
# Benchmarking. I use a `NullArray` to discard results, 
# because n=10^4 would produce 60 GB of results and I 
# don't have enough memory.
class NullArray
def <<(x)
# noop
end
end
require 'benchmark'
Benchmark.bm do |x|
[5, 100, 1_000, 5_000, 10_000].each do |max|
x.report { sequential_sub_arrays(max, NullArray.new, [[1,3], [3,5]]) }
end
end
#        user     system      total        real
#    0.000034   0.000002   0.000036 (  0.000032)
#    0.002266   0.000007   0.002273 (  0.002274)
#    0.444146   0.001798   0.445944 (  0.446747)
#   14.362543   0.375005  14.737548 ( 14.765235)
#   83.235438   8.440965  91.676403 ( 92.070415)

尝试优化此 ruby 代码以覆盖数组排序/匹配

使用按位运算符的解决方案：n=10^4 in 83s

相关内容

最新更新

热门标签：