加快循环或计算原始三元组的不同想法


def pythag_triples(n):
i = 0
start = time.time()
for x in range(1, int(sqrt(n) + sqrt(n)) + 1, 2):
for m in range(x+2,int(sqrt(n) + sqrt(n)) + 1, 2):
if gcd(x, m) == 1:
# q = x*m
# l = (m**2 - x**2)/2
c = (m**2 + x**2)/2
# trips.append((q,l,c))
if c < n:
i += 1
end = time.time()
return i, end-start
print(pythag_triples(3141592653589793))

我试图计算原始毕达哥拉斯三元组,使用所有三元组都是通过使用奇数和互质数的 m、n 生成的。我已经知道该功能最多可以工作到 1000000,但是当对更大的数字执行此操作时,它花费的时间超过 24 小时。关于如何加快速度/而不是蛮力的任何想法。我正在尝试计算三元组。

而不是对xm进行双循环并反复检查它们是否是互质的,我们只迭代m(两者中较大的一个),并应用 Euler 的 totient 函数或其自定义版本来直接计算相对素数的x值的数量m。这为我们提供了一种更快的方法(速度仍有待更精确地量化):例如,n = 100_000_000为 43 毫秒,而不是 OP 代码的 30 秒(700 倍加速)。

当允许x取的最大值xmax小于m时,需要自定义版本(以满足不等式(m**2 + x**2)/2 <= n)。在这种情况下,并非所有m互质数都应该被计算在内,而只计算那些达到该界限的共质数。

def distinct_factors(n):
# a variant of the well-known factorization, but that
# yields only distinct factors, rather than all of them
# (including possible repeats)
last = None
i = 2
while i * i <= n:
if n % i:
i += 1
else:
n //= i
if i != last:
yield i
last = i
if n > 1 and n != last:
yield n
def products_of(p_list, upto):
for i, p in enumerate(p_list):
if p > upto:
break
yield -p
for q in products_of(p_list[i+1:], upto=upto // p):
yield -p * q
def phi(n, upto=None):
# Euler's totient or "phi" function
if upto is not None and upto < n:
# custom version: all co-primes of n up to the `upto` bound
cnt = upto
p_list = list(distinct_factors(n))
for q in products_of(p_list, upto):
cnt += upto // q if q > 0 else -(upto // -q)
return cnt
# standard formulation: all co-primes of n up to n-1
cnt = n
for p in distinct_factors(n):
cnt *= (1 - 1/p)
return int(cnt)

phi(n)是欧拉全函数或φ(n)函数。

phi(n, upto=x)是一个自定义变体,它只计算直到给定值的共素数x。为了理解它,让我们用一个例子:

>>> n = 3*3*3*5  # 135
>>> list(factors(n))
[3, 3, 3, 5]
>>> list(distinct_factors(n))
[3, 5]
# there are 72 integers between 1 and 135 that are co-primes of 135
>>> phi(n)
72
# ...but only 53 of them are no greater than 100:
# 100 - (100//3 + 100//5 - 100//(3*5)) 
>>> phi(n, upto=100)
53

当评估值xn的互素数时,我们应该计算所有数1 .. x减去n任何不同因子的倍数。但是,当简单地删除所有p_ix // p_i时,我们会重复计算两个因子的倍数,因此我们需要"将它们加回来"。然而,当这样做时,我们会重复计算(添加太多次)三个因素的倍数,所以我们也需要考虑这些因素,等等。在示例n = 135中,我们删除了x // 3x // 5,但是这会重复计算那些既是 3 又是 5 的因子(因子 15)的整数,因此我们需要将它们相加。对于更长的因素集,我们需要:

  • x为初始计数;
  • 去每个因子的倍数p;
  • "减去"(加)2个因子的任何乘积的倍数;
  • "un-un-减去"(减去)3 个因子的任何乘积的倍数;
  • 等。

最初的答案是通过迭代不同因子的所有组合来做到这一点,但这在这个答案中得到了products_of(p_list, upto)生成器的实质性优化,它给出了给定p_list不同因子的所有子集的乘积,其乘积不大于upto。该符号指示如何解释每个产品:正数或负数分别取决于子集大小是偶数还是奇数。

有了phi(n)phi(n, upto),我们现在可以编写以下内容:

def pyth_m_counts(n):
# yield tuples (m, count(x) where 0 < x < m and odd(x)
# and odd(m) and coprime(x, m) and m**2 + x**2 <= 2*n).
mmax = isqrt(2*n - 1)
for m in range(3, mmax + 1, 2):
# requirement: (m**2 + x**2) // 2 <= n
# and both m and x are odd
# (so (m**2 + x**2) // 2 == (m**2 + x**2) / 2)
xmax = isqrt(2*n - m**2)
cnt_m = phi(2*m, upto=xmax) if xmax < m else phi(2*m) // 2
if cnt_m > 0:
yield m, cnt_m

为什么表达式phi(2*m) // 2?由于x(和m)都必须是奇数,根据OP,我们需要删除所有偶数值。我们可以在不修改phi()的情况下做到这一点,方法是传递2*m(然后有 2 作为因子,因此将"杀死"x的所有偶数值),然后除以 2 以获得m的实际非互素数数。 对phi(2*m, upto=xmax)进行了类似的(但更微妙的)考虑 - 我们将把它留给读者练习......

示例运行:

>>> n = 300
>>> list(pyth_m_counts(n))
[(3, 1),
(5, 2),
(7, 3),
(9, 3),
(11, 5),
(13, 6),
(15, 4),
(17, 8),
(19, 8),
(21, 3),
(23, 4)]

这意味着,在 OP 的函数中,pythag_triples(300)将返回 1 个带有m==3的元组,2 个带有m==5的元组,依此类推。实际上,让我们修改该函数以验证这一点:

def mod_pythag_triples(n):
for x in range(1, int(sqrt(n) + sqrt(n)) + 1, 2):
for m in range(x+2, int(sqrt(n) + sqrt(n)) + 1, 2):
if gcd(x, m) == 1:
c = (m**2 + x**2) // 2
if c < n:
yield x, m

然后:

>>> n = 300
>>> list(pyth_m_counts(n)) == list(Counter(m for x, m in mod_pythag_triples(n)).items())
True

对于n的任何正值都相同。

现在关于实际计数函数:我们只需要对每个 m 的计数求和:

def pyth_triples_count(n):
cnt = 0
mmax = isqrt(2*n - 1)
for m in range(3, mmax + 1, 2):
# requirement: (m**2 + x**2) // 2 <= n
# and both m and x are odd (so (m**2 + x**2) // 2 == (m**2 + x**2) / 2)
xmax = isqrt(2*n - m**2)
cnt += phi(2*m, upto=xmax) if xmax < m else phi(2*m) // 2
return cnt

示例运行:

>>> pyth_triples_count(1_000_000)
159139
>>> pyth_triples_count(100_000_000)
15915492
>>> pyth_triples_count(1_000_000_000)
159154994
>>> big_n = 3_141_592_653_589_793
>>> pyth_triples_count(big_n)
500000000002845

速度:

%timeit pyth_triples_count(100_000_000)
42.8 ms ± 56.1 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit pyth_triples_count(1_000_000_000)
188 ms ± 571 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%%time
pyth_triples_count(big_n)
CPU times: user 1h 42min 33s, sys: 480 ms, total: 1h 42min 33s
Wall time: 1h 42min 33s

注意:在同一台机器上,OP 问题中的代码需要 30 秒才能n=100_000_000;这个版本对于该n快 700 倍。

另请参阅我的另一个答案以获得更快的解决方案。

这个新答案使big_n的总时间减少到4分6秒

对我最初答案的剖析揭示了这些事实:

  • 总时间: 1小时42分33秒
  • 分解数字所花费的时间:几乎 100% 的时间

相比之下,生成从3sqrt(2*N - 1)的所有素数只需要38.5秒(使用阿特金筛)。

因此,我决定尝试一个版本,在该版本中,我们生成所有数字m素数的已知乘积。也就是说,生成器产生数字本身以及所涉及的不同素因数。无需因式分解。

结果仍然是500_000_000_002_841,正如@Koder注意到的那样,偏差了4。我还不知道这个问题从何而来编辑:在修正xmax界后(isqrt(2*N - m**2)而不是isqrt(2*N - m**2 - 1),因为我们确实想包括假设等于N的三角形),我们现在得到正确的结果。

素数生成器的代码包含在末尾。基本上,我使用了Atkin的筛子,适应(没有花太多时间)Python。我很确定它可以加快速度(例如使用numpy甚至numba)。

要从素数生成整数(我们知道我们可以做到这一点,这要归功于算术的基本定理),我们只需要遍历所有可能的乘积prod(p_i**k_i)其中p_ii^th素数,k_i是任何非负整数。

最简单的公式是递归公式:

def gen_ints_from_primes(p_list, upto):
if p_list and upto >= p_list[0]:
p, *p_list = p_list
pk = 1
p_tup = tuple()
while pk <= upto:
for q, p_distinct in gen_ints_from_primes(p_list, upto=upto // pk):
yield pk * q, p_tup + p_distinct
pk *= p
p_tup = (p, )
else:
yield 1, tuple()

不幸的是,我们很快就遇到了内存限制(和递归限制)。所以这是一个非递归版本,除了素数列表本身之外,它不使用额外的内存。本质上,q的当前值(正在生成的整数)和列表中的索引是我们生成下一个整数所需的所有信息。当然,这些值是未排序的,但这并不重要,只要它们都被覆盖。

def rem_p(q, p, p_distinct):
q0 = q
while q % p == 0:
q //= p
if q != q0:
if p_distinct[-1] != p:
raise ValueError(f'rem({q}, {p}, ...{p_distinct[-4:]}): p expected at end of p_distinct if q % p == 0')
p_distinct = p_distinct[:-1]
return q, p_distinct
def add_p(q, p, p_distinct):
if len(p_distinct) == 0 or p_distinct[-1] != p:
p_distinct += (p, )
q *= p
return q, p_distinct
def gen_prod_primes(p, upto=None):
if upto is None:
upto = p[-1]
if upto >= p[-1]:
p = p + [upto + 1]  # sentinel

q = 1
i = 0
p_distinct = tuple()

while True:
while q * p[i] <= upto:
i += 1
while q * p[i] > upto:
yield q, p_distinct
if i <= 0:
return
q, p_distinct = rem_p(q, p[i], p_distinct)
i -= 1
q, p_distinct = add_p(q, p[i], p_distinct)

例-

>>> p_list = list(primes(20))
>>> p_list
[2, 3, 5, 7, 11, 13, 17, 19]
>>> sorted(gen_prod_primes(p_list, 20))
[(1, ()),
(2, (2,)),
(3, (3,)),
(4, (2,)),
(5, (5,)),
(6, (2, 3)),
(7, (7,)),
(8, (2,)),
(9, (3,)),
(10, (2, 5)),
(11, (11,)),
(12, (2, 3)),
(13, (13,)),
(14, (2, 7)),
(15, (3, 5)),
(16, (2,)),
(17, (17,)),
(18, (2, 3)),
(19, (19,)),
(20, (2, 5))]

如您所见,我们不需要分解任何数字,因为它们方便地与所涉及的不同素数一起出现。

要只得到奇数,只需从素数列表中删除2

>>> sorted(gen_prod_primes(p_list[1:]), 20)
[(1, ()),
(3, (3,)),
(5, (5,)),
(7, (7,)),
(9, (3,)),
(11, (11,)),
(13, (13,)),
(15, (3, 5)),
(17, (17,)),
(19, (19,))]

为了利用这种数字和因子表示,我们需要稍微修改一下原始答案中给出的函数:

def phi(n, upto=None, p_list=None):
# Euler's totient or "phi" function
if upto is None or upto > n:
upto = n
if p_list is None:
p_list = list(distinct_factors(n))
if upto < n:
# custom version: all co-primes of n up to the `upto` bound
cnt = upto
for q in products_of(p_list, upto):
cnt += upto // q if q > 0 else -(upto // -q)
return cnt
# standard formulation: all co-primes of n up to n-1
cnt = n
for p in p_list:
cnt = cnt * (p - 1) // p
return cnt

有了这一切,我们现在可以重写我们的计数函数:

def pt_count_m(N):
# yield tuples (m, count(x) where 0 < x < m and odd(x)
# and odd(m) and coprime(x, m) and m**2 + x**2 <= 2*N))
# in this version, m is generated from primes, and the values
# are iterated through unordered.
mmax = isqrt(2*N - 1)
p_list = list(primes(mmax))[1:]  # skip 2
for m, p_distinct in gen_prod_primes(p_list, upto=mmax):
if m < 3:
continue
# requirement: (m**2 + x**2) // 2 <= N
# note, both m and x are odd (so (m**2 + x**2) // 2 == (m**2 + x**2) / 2)
xmax = isqrt(2*N - m*m)
cnt_m = phi(m+1, upto=xmax, p_list=(2,) + tuple(p_distinct))
if cnt_m > 0:
yield m, cnt_m
def pt_count(N, progress=False):
mmax = isqrt(2*N - 1)
it = pt_count_m(N)
if progress:
it = tqdm(it, total=(mmax - 3 + 1) // 2)
return sum(cnt_m for m, cnt_m in it)

现在:

%timeit pt_count(100_000_000)
31.1 ms ± 38.9 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit pt_count(1_000_000_000)
104 ms ± 299 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# the speedup is still very moderate at that stage
# however:
%%time
big_n = 3_141_592_653_589_793
N = big_n
res = pt_count(N)
CPU times: user 4min 5s, sys: 662 ms, total: 4min 6s
Wall time: 4min 6s
>>> res
500000000002845

附录:阿特金的筛子

正如承诺的那样,这是我版本的阿特金筛子。它绝对可以加快速度。

def primes(limit):
# Generates prime numbers between 2 and n
# Atkin's sieve -- see http://en.wikipedia.org/wiki/Prime_number
sqrtLimit = isqrt(limit) + 1
# initialize the sieve
is_prime = [False, False, True, True, False] + [False for _ in range(5, limit + 1)]
# put in candidate primes:
# integers which have an odd number of
# representations by certain quadratic forms
for x in range(1, sqrtLimit):
x2 = x * x
for y in range(1, sqrtLimit):
y2 = y*y
n = 4 * x2 + y2
if n <= limit and (n % 12 == 1 or n % 12 == 5): is_prime[n] ^= True
n = 3 * x2 + y2
if n <= limit and (n % 12 == 7): is_prime[n] ^= True
n = 3*x2-y2
if n <= limit and x > y and n % 12 == 11: is_prime[n] ^= True
# eliminate composites by sieving
for n in range(5, sqrtLimit):
if is_prime[n]:
sqN = n**2
# n is prime, omit multiples of its square; this is sufficient because
# composites which managed to get on the list cannot be square-free
for i in range(1, int(limit/sqN) + 1):
k = i * sqN # k ∈ {n², 2n², 3n², ..., limit}
is_prime[k] = False
for i, truth in enumerate(is_prime):
if truth: yield i

多亏了皮埃尔,我找到了一个更快的解决方案。

这是我的新代码与皮埃尔的代码相结合,供任何想要它的人使用。

def sieve_factors(n):
s = [0] * (n+1)
s[1] = 1
for i in range(2, n+1, 2):
s[i] = 2
for i in range(3, n+1, 2):
if s[i] == 0:
s[i] = i
for j in range(i, n + 1, i):
if s[j] == 0:
s[j] = i
return s
Q = sieve_factors(2*(isqrt(2 * 3141592653589793) + 1))

def findfactors(n):
global Q
yield Q[n]
last = Q[n]
while n > 1:
if Q[n] != last and Q[n] != 1:
last = Q[n]
yield Q[n]
n //= Q[n]

def products_of(p_list, upto):
for i, p in enumerate(p_list):
if p > upto:
break
yield -p
for q in products_of(p_list[i+1:], upto=upto // p):
yield -p * q

def phi(n, upto=None):
if upto is not None and upto < n:
cnt = upto
p_list = list(findfactors(n))
for q in products_of(p_list, upto):
cnt += upto // q if q > 0 else -(upto // -q)
return cnt
cnt = n
for p in findfactors(n):
cnt *= (1 - 1/p)
return int(cnt)
def countprimtrips(n):
cnt = 0
for m in range(3, int(sqrt(2*n)) + 1, 2):
xmax = int(sqrt(2*n - m**2))
cnt += phi(2*m, upto=xmax) if xmax < m else phi(2*m) // 2
return cnt
print(countprimtrips(3141592653589793))

如上面的答案所述,大部分时间都花在因式分解上,所以我采用了他的代码,并添加了所有数字的筛子,直到 x-max,每个数字都是产生其最低素因数的索引。它会在 4 分 44 秒内找到答案。(284.6727148 秒)。谢谢皮埃尔的帮助。

最新更新