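I'm trying to optimize my code so that it runs well on a Hadoop cluster. Can anyone help me find ways to make it better? The input is a very large set of numbers, one per line. As the numbers are read in, I count them, sum them, and check whether each one is prime.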
#!/usr/bin/env python
import sys
import math

total_of_primes = 0
total = 0
count = 0
not_prime = 0
count_string = 'Count:'
total_string = 'Total:'
prime_string = 'Number of Primes:'

for line in sys.stdin:
    try:
        key = int(line)
    except ValueError:
        # skip lines that are not integers
        continue
    total = total + key
    count = count + 1
    if key < 2:
        # 0, 1 and negative numbers are not prime
        not_prime = not_prime + 1
    elif key == 2 or key == 3:
        # 2 and 3 are prime, so not_prime stays unchanged
        pass
    elif key % 2 == 0 or key % 3 == 0:
        not_prime = not_prime + 1
    else:
        # any remaining prime factor has the form 6k - 1 or 6k + 1
        for i in range(5, int(math.sqrt(key)) + 1, 6):
            if key % i == 0 or key % (i + 2) == 0:
                not_prime = not_prime + 1
                break

total_of_primes = count - not_prime
print('%s\t%s' % (count_string, count))
print('%s\t%s' % (total_string, total))
print('%s\t%s' % (prime_string, total_of_primes))
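I tried taking everything and turning it into a comprehension. Comprehensions are usually faster than an equivalent explicit for loop, because more of the iteration runs in the interpreter's C code. I also left out the tests for 2 and 3, since you can add them manually once the loop is done.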
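I can almost guarantee this has bugs, because I don't have your test data, and a comprehension this large (for me, anyway) really needs testing. The comprehension is technically a single expression, but I split it across lines for readability. Hopefully it at least gives you some ideas.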
import sys

# Parse once, skipping lines that are not integers. strip() removes the
# trailing newline; if you only expect positive numbers, you can omit
# ".lstrip('-+')".
numbers = [int(line) for line in sys.stdin
           if line.strip().lstrip('-+').isdigit()]

is_prime = [  # this will be a list of booleans, one per number
    n > 3 and        # 2 and 3 are added manually below; n < 2 is never prime
    n % 2 != 0 and   # the number is not even
    n % 3 != 0 and   # not divisible by 3
    # not divisible by any i or i + 2 from the range() object; any() stops
    # at the first divisor, so the scan ends once the number is composite
    not any(n % i == 0 or n % (i + 2) == 0
            # pow(x, 0.5) uses a built-in instead of an imported module
            for i in range(5, int(pow(n, 0.5)) + 1, 6))
    for n in numbers
]
# True counts as 1; add the 2s and 3s manually instead of testing them
total_of_primes = sum(is_prime) + numbers.count(2) + numbers.count(3)
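Since a comprehension this size is hard to check by eye, one option is to pull the primality test into a small function so it can be tested without the cluster. This is only a sketch: the names is_prime and summarize are mine, not from your script, and it assumes the same rules as above (one integer per line, anything else skipped).

```python
def is_prime(n):
    """Trial division over candidates of the form 6k - 1 and 6k + 1.

    Hypothetical helper name, used only for this illustration.
    """
    if n < 2:
        return False          # 0, 1 and negatives are not prime
    if n < 4:
        return True           # 2 and 3
    if n % 2 == 0 or n % 3 == 0:
        return False
    i = 5
    while i * i <= n:         # integer arithmetic, no float square root
        if n % i == 0 or n % (i + 2) == 0:
            return False
        i += 6
    return True


def summarize(lines):
    """Return (count, total, primes) for an iterable of text lines.

    Hypothetical helper name; lines that are not integers are ignored.
    """
    count = total = primes = 0
    for line in lines:
        try:
            n = int(line)
        except ValueError:
            continue
        count += 1
        total += n
        if is_prime(n):
            primes += 1
    return count, total, primes
```

In the real script you would call summarize(sys.stdin) and print the three values. With the logic in functions, you can compare is_prime against a brute-force check on a small range before running it on the full data set.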
If you can't cut the execution time down far enough, you could consider moving to a lower-level language (slower to write, faster to run) such as C.