循环与列表理解与其他方法的速度/效率比较



使用下面给出的示例代码,我希望更好地了解python速度是如何根据给定函数的结构而变化的。

我定义的示例函数如下:给定两个字符串,它们返回不同的位数。我们假设assert len(s1) == len(s2)总是真的。

第一个函数使用列表理解。

def h_dist1(s1,s2):
    return sum(dgt1 != dgt2 for dgt1, dgt2 in zip(s1, s2))

第二个函数使用经典的for循环。

def h_dist2(s1,s2):
    tot = 0
    for d1, d2 in zip(s1, s2):
        if d1 != d2:
            tot += 1
    return tot

第二个代码的复杂度显然是CCD_ 2,其中CCD_。

示例相关问题:是否有更好的方法来定义此特定函数?h_dist1的复杂性是什么?

一般问题:一般来说,定义与上例中给出的函数(即需要在字符串/数组等上循环)相似的函数的最佳方式是什么(就可读性、速度、效率、更Python而言)?而且,最重要的是,为什么特定的方式是最快速/最高效的

注意我找过类似的问题,但没有找到任何具体的问题,例如HYRY在这里说要加快代码的速度,应该使用1。循环和2中的局部变量。使用列表理解但我仍然不明白为什么。当然,欢迎提及其他问答。

不要太快地写出不起眼的for循环。如果您实际上不需要列表,比如在本例中,标准的for循环可能比使用列表理解更快。当然,它的内存开销更少。

下面是一个执行定时测试的程序;它可以很容易地修改以添加更多的测试。

#!/usr/bin/env python
''' Time various implementations of string diff function
    From http://stackoverflow.com/q/28581218/4014959
    Written by PM 2Ring 2015.02.18
'''
from itertools import imap, izip, starmap
from operator import ne
from timeit import Timer
from random import random, seed
def h_dist0(s1,s2):
    ''' For loop '''
    tot = 0
    for d1, d2 in zip(s1, s2):
        if d1 != d2:
            tot += 1
    return tot
def h_dist1(s1,s2):
    ''' List comprehension '''
    return sum([dgt1 != dgt2 for dgt1, dgt2 in zip(s1, s2)])
def h_dist2(s1,s2):
    ''' Generator expression '''
    return sum(dgt1 != dgt2 for dgt1, dgt2 in zip(s1, s2))
def h_dist3(s1,s2):
    ''' Generator expression with if '''
    return sum(1 for dgt1, dgt2 in zip(s1, s2) if dgt1 != dgt2)
def h_dist3a(s1,s2):
    ''' Generator expression with izip '''
    return sum(1 for dgt1, dgt2 in izip(s1, s2) if dgt1 != dgt2)
def h_dist4(s1,s2):
    ''' imap '''
    return sum(imap(ne, s1, s2))
def h_dist5(s1,s2):
    ''' starmap '''
    return sum(starmap(ne, izip(s1, s2)))
funcs = [
    h_dist0,
    h_dist1,
    h_dist2,
    h_dist3,
    h_dist3a,
    h_dist4,
    h_dist5,
]
# ------------------------------------
def check_full():
    print 'Testing all functions with strings of length', len(s1)
    for func in funcs:
        print '%s:%sn%dn' % (func.func_name, func.__doc__, func(s1, s2))
def check():
    print 'Testing all functions with strings of length', len(s1)
    print [func(s1, s2) for func in funcs], 'n'
def time_test(loops=10000, reps=3):
    ''' Print timing stats for all the functions '''
    slen = len(s1)
    print 'Length = %d, Loops = %d, Repetitions = %d' % (slen, loops, reps)
    for func in funcs:
        #Get function name and docstring
        fname = func.func_name
        fdoc = func.__doc__
        print 'n%s:%s' % (fname, fdoc)
        t = Timer('%s(s1, s2)' % fname, 'from __main__ import s1, s2, %s' % fname)
        results = t.repeat(reps, loops)
        results.sort()
        print results
    print 'n' + '- '*30 + 'n'
def make_strings(n, r=0.5):
    print 'r:', r
    s1 = 'a' * n
    s2 = ''.join(['b' if random() < r else 'a' for _ in xrange(n)])
    return s1, s2
# ------------------------------------
seed(37)
s1, s2 = make_strings(100)
#print '%sn%sn' % (s1, s2)
check()
time_test(10000)
s1, s2 = make_strings(100, 0.1)
check()
time_test(10000)
s1, s2 = make_strings(100, 0.9)
check()
time_test(10000)
s1, s2 = make_strings(10)
check()
time_test(50000)
s1, s2 = make_strings(1000)
check()
time_test(1000)

下面的结果来自于在Linux上运行Python 2.6.6的32位2GHz Pentium 4。

输出

r: 0.5
Testing all functions with strings of length 100
[45, 45, 45, 45, 45, 45, 45] 
Length = 100, Loops = 10000, Repetitions = 3
h_dist0: For loop 
[0.62271595001220703, 0.63597297668457031, 0.65991997718811035]
h_dist1: List comprehension 
[0.80136799812316895, 1.0849411487579346, 1.1687240600585938]
h_dist2: Generator expression 
[0.81829214096069336, 0.82315492630004883, 0.85774612426757812]
h_dist3: Generator expression with if 
[0.67409086227416992, 0.67418098449707031, 0.68189001083374023]
h_dist3a: Generator expression with izip 
[0.54596519470214844, 0.54696321487426758, 0.54910516738891602]
h_dist4: imap 
[0.4574120044708252, 0.45927596092224121, 0.46362900733947754]
h_dist5: starmap 
[0.38610100746154785, 0.38653087615966797, 0.39858913421630859]
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
r: 0.1
Testing all functions with strings of length 100
[13, 13, 13, 13, 13, 13, 13] 
Length = 100, Loops = 10000, Repetitions = 3
h_dist0: For loop 
[0.59487199783325195, 0.61918497085571289, 0.62035894393920898]
h_dist1: List comprehension 
[0.77733206748962402, 0.77883815765380859, 0.78676295280456543]
h_dist2: Generator expression 
[0.8313758373260498, 0.83669614791870117, 0.8419950008392334]
h_dist3: Generator expression with if 
[0.60900688171386719, 0.61443901062011719, 0.6202390193939209]
h_dist3a: Generator expression with izip 
[0.48425912857055664, 0.48703289031982422, 0.49215483665466309]
h_dist4: imap 
[0.45452284812927246, 0.46001195907592773, 0.4652099609375]
h_dist5: starmap 
[0.37329483032226562, 0.37666082382202148, 0.40111804008483887]
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
r: 0.9
Testing all functions with strings of length 100
[94, 94, 94, 94, 94, 94, 94] 
Length = 100, Loops = 10000, Repetitions = 3
h_dist0: For loop 
[0.69256496429443359, 0.69339799880981445, 0.70190787315368652]
h_dist1: List comprehension 
[0.80547499656677246, 0.81107187271118164, 0.81337189674377441]
h_dist2: Generator expression 
[0.82524299621582031, 0.82638883590698242, 0.82899308204650879]
h_dist3: Generator expression with if 
[0.80344915390014648, 0.8050081729888916, 0.80581092834472656]
h_dist3a: Generator expression with izip 
[0.63276004791259766, 0.63585305213928223, 0.64699077606201172]
h_dist4: imap 
[0.46122288703918457, 0.46677708625793457, 0.46921491622924805]
h_dist5: starmap 
[0.38288688659667969, 0.38731098175048828, 0.38867902755737305]
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
r: 0.5
Testing all functions with strings of length 10
[5, 5, 5, 5, 5, 5, 5] 
Length = 10, Loops = 50000, Repetitions = 3
h_dist0: For loop 
[0.55377697944641113, 0.55385804176330566, 0.56589198112487793]
h_dist1: List comprehension 
[0.69614696502685547, 0.71386599540710449, 0.71778011322021484]
h_dist2: Generator expression 
[0.74240994453430176, 0.77340388298034668, 0.77429509162902832]
h_dist3: Generator expression with if 
[0.66713404655456543, 0.66874384880065918, 0.67353487014770508]
h_dist3a: Generator expression with izip 
[0.59427285194396973, 0.59525203704833984, 0.60147690773010254]
h_dist4: imap 
[0.46971893310546875, 0.4749150276184082, 0.4831998348236084]
h_dist5: starmap 
[0.46615099906921387, 0.47054886817932129, 0.47225403785705566]
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
r: 0.5
Testing all functions with strings of length 1000
[506, 506, 506, 506, 506, 506, 506] 
Length = 1000, Loops = 1000, Repetitions = 3
h_dist0: For loop 
[0.59869503974914551, 0.60042905807495117, 0.60753512382507324]
h_dist1: List comprehension 
[0.68359518051147461, 0.70072579383850098, 0.7146599292755127]
h_dist2: Generator expression 
[0.7492527961730957, 0.75325894355773926, 0.75805497169494629]
h_dist3: Generator expression with if 
[0.59286904335021973, 0.59505105018615723, 0.59793591499328613]
h_dist3a: Generator expression with izip 
[0.49536395072937012, 0.49821090698242188, 0.54327893257141113]
h_dist4: imap 
[0.42384982109069824, 0.43060398101806641, 0.43535709381103516]
h_dist5: starmap 
[0.34122705459594727, 0.35040402412414551, 0.35851287841796875]
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 

尽可能多地删除Python循环,不要在内存中创建不必要的列表,遵循这些内容可以获得非常有效的解决方案。例如,zip在内存中创建了一个列表,因此我们可以使用itertools.izip来获得迭代器。所以,根据我的快速测试,sum(starmap(ne, izip(s1, s2)))是最快的:

>>> from itertools import imap, izip, starmap
>>> from operator import ne
>>> s1 = 'a'*10**5
>>> s2 = 'b'*10**5
>>> %timeit sum(starmap(ne, izip(s1, s2)))
100 loops, best of 3: 4.25 ms per loop

很少有其他解决方案:

>>> %timeit sum(imap(ne, s1, s2))
100 loops, best of 3: 5.08 ms per loop
>>> %timeit sum(dgt1 != dgt2 for dgt1, dgt2 in zip(s1, s2))
100 loops, best of 3: 11.3 ms per loop
>>> %timeit sum(1 for dgt1, dgt2 in zip(s1, s2) if dgt1 != dgt2)
100 loops, best of 3: 10.7 ms per loop
>>> %timeit sum(dgt1 != dgt2 for dgt1, dgt2 in izip(s1, s2))
100 loops, best of 3: 7.02 ms per loop
>>> %timeit sum(1 for dgt1, dgt2 in izip(s1, s2) if dgt1 != dgt2)
100 loops, best of 3: 6.17 ms per loop

但差异并不大,所以我个人会将izip与生成器表达式一起使用,而不会滥用Python:中True==1和False==0的事实

sum(1 for dgt1, dgt2 in izip(s1, s2) if dgt1 != dgt2)

最新更新