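I just got a strange result that I'm trying to understand. I have a dataset of about 325k rows (lists), each with about 90 items (strings, floats, and so on; the types don't really matter). Say I want to do some processing on every item. Then I can iterate over them with two "for"s: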
for eachRow in rows:
    for eachItem in eachRow:
        pass  # do something with eachItem
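On my system this code takes 41 seconds to run. But if I replace the nested loop with a series of index accesses (eachRow[0], eachRow[1], and so on all the way up to eachRow[89]), the execution time drops to 25 seconds: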
for eachRow in rows:
    eachRow[0]  # do something with this item
    eachRow[1]  # do something with this item
    # ... one line per index, eachRow[2] through eachRow[88] ...
    eachRow[89]  # do something with this item
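Of course, writing code like this is not a good idea; I was just looking for a way to speed up my data processing and stumbled on this odd approach by accident. Any comments?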
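If the unrolled form does turn out to be faster for a given workload, one way to avoid typing 90 lines by hand is to generate the function at run time. This is an assumption on my part, not something from the question or answers: `make_unrolled` is a hypothetical helper that builds the source text of an unrolled loop and compiles it with `exec`. It assumes every row has exactly `ncols` items (a shorter row raises `IndexError`, extra items are silently ignored) and is written for Python 3.

```python
def make_unrolled(ncols):
    """Hypothetical helper: return a function equivalent to writing
    action(eachRow[0]) ... action(eachRow[ncols - 1]) by hand in the row loop."""
    if ncols < 1:
        raise ValueError("ncols must be at least 1")
    # One call per column, indented to sit inside the row loop
    body = "\n".join("        action(eachRow[%d])" % i for i in range(ncols))
    src = "def unrolled(rows, action):\n    for eachRow in rows:\n" + body + "\n"
    namespace = {}
    exec(src, namespace)  # compile the generated source into `namespace`
    return namespace["unrolled"]


# Usage: the generated function visits every item in row order
rows = [[r * 10 + c for c in range(4)] for r in range(3)]
seen = []
make_unrolled(4)(rows, seen.append)
print(seen)  # [0, 1, 2, 3, 10, 11, 12, 13, 20, 21, 22, 23]
```

The generated code is exactly what you would type by hand, so any speed difference it shows comes from the unrolling itself, not from the generation step (which runs once).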
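Unrolling does seem to give a slight performance advantage, but it's negligible, so unless your do_something function really does almost nothing, you shouldn't see a difference. I find it hard to believe that equivalent approaches could differ by 60%, although I'm always willing to be surprised by some implementation detail I never thought of.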
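One implementation detail that can be checked directly is the bytecode. The sketch below (Python 3, standard `dis` module) counts `FOR_ITER` instructions, the opcode CPython runs on every pass through a `for` loop to fetch the next item. The nested version has two loops; the unrolled version keeps only the row loop and replaces the inner one with constant-index subscripts. `unrolled3` is a hypothetical three-column stand-in for the 90-line version, and opcode details vary between CPython releases, so treat this as an illustration of where per-item overhead comes from, not as a measurement.

```python
import dis


def nested(rows, action):
    for eachRow in rows:
        for eachItem in eachRow:
            action(eachItem)


def unrolled3(rows, action):
    # Hypothetical 3-column version of the hand-unrolled loop
    for eachRow in rows:
        action(eachRow[0])
        action(eachRow[1])
        action(eachRow[2])


def count_opcode(fn, opname):
    """Count the instructions named `opname` in fn's compiled bytecode."""
    return sum(1 for ins in dis.get_instructions(fn) if ins.opname == opname)


for fn in (nested, unrolled3):
    print(fn.__name__, "FOR_ITER instructions:", count_opcode(fn, "FOR_ITER"))
```

Both functions make the same `action` calls; the difference is only in how each item is fetched, which is why the gap shrinks as `action` itself gets more expensive.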
tl;dr summary, using 32500 rows instead of 325000 because I'm impatient:
action        method     seconds (10 runs)
do_nothing    easy       3.44702410698
do_nothing    indexed    3.99766016006
do_nothing    mapped     4.36127090454
do_nothing    unrolled   3.33416581154
do_something  easy       5.4152610302
do_something  indexed    5.95649385452
do_something  mapped     6.20316290855
do_something  unrolled   5.2877831459
do_more       easy       16.6573209763
do_more       indexed    16.8381450176
do_more       mapped     17.6184959412
do_more       unrolled   16.0713188648
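To put the table in relative terms, the short calculation below divides each timing by the `easy` timing for the same action. The numbers are copied from the table; nothing is re-measured. For example, `unrolled` comes out at about 0.967 of `easy` for `do_nothing` (roughly 3% faster), while `indexed` is about 1.16 (roughly 16% slower).

```python
# Seconds for 10 runs, copied from the table (CPython 2.7.3, 32500 rows)
timings = {
    "do_nothing":   {"easy": 3.44702410698, "indexed": 3.99766016006,
                     "mapped": 4.36127090454, "unrolled": 3.33416581154},
    "do_something": {"easy": 5.4152610302, "indexed": 5.95649385452,
                     "mapped": 6.20316290855, "unrolled": 5.2877831459},
    "do_more":      {"easy": 16.6573209763, "indexed": 16.8381450176,
                     "mapped": 17.6184959412, "unrolled": 16.0713188648},
}

# Ratio of each method's time to the plain nested loop ("easy") for that action
relative = {
    action: {name: t / row["easy"] for name, t in row.items()}
    for action, row in timings.items()
}

for action, row in relative.items():
    print(action, ", ".join("%s %.3f" % (name, r) for name, r in row.items()))
```

Every ratio in this table stays within a few tens of percent of 1, which is the "order unity" scale the answer was trying to establish.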
CPython 2.7.3, code:
from timeit import Timer
nrows = 32500
ncols = 90
a = [[1.0*i for i in range(ncols)] for j in range(nrows)]
def do_nothing(x):
    pass
def do_something(x):
    z = x+3
    return z
def do_more(x):
    z = x**3+x**0.5+4
    return z
def easy(rows, action):
    for eachRow in rows:
        for eachItem in eachRow:
            action(eachItem)
def mapped(rows, action):
    for eachRow in rows:
        map(action, eachRow)
def indexed(rows, action):
    for eachRow in rows:
        for i in xrange(len(eachRow)):
            action(eachRow[i])
def unrolled(rows, action):
    for eachRow in rows:
        action(eachRow[0])
        action(eachRow[1])
        action(eachRow[2])
        action(eachRow[3])
        action(eachRow[4])
        action(eachRow[5])
        action(eachRow[6])
        action(eachRow[7])
        action(eachRow[8])
        action(eachRow[9])
        action(eachRow[10])
        action(eachRow[11])
        action(eachRow[12])
        action(eachRow[13])
        action(eachRow[14])
        action(eachRow[15])
        action(eachRow[16])
        action(eachRow[17])
        action(eachRow[18])
        action(eachRow[19])
        action(eachRow[20])
        action(eachRow[21])
        action(eachRow[22])
        action(eachRow[23])
        action(eachRow[24])
        action(eachRow[25])
        action(eachRow[26])
        action(eachRow[27])
        action(eachRow[28])
        action(eachRow[29])
        action(eachRow[30])
        action(eachRow[31])
        action(eachRow[32])
        action(eachRow[33])
        action(eachRow[34])
        action(eachRow[35])
        action(eachRow[36])
        action(eachRow[37])
        action(eachRow[38])
        action(eachRow[39])
        action(eachRow[40])
        action(eachRow[41])
        action(eachRow[42])
        action(eachRow[43])
        action(eachRow[44])
        action(eachRow[45])
        action(eachRow[46])
        action(eachRow[47])
        action(eachRow[48])
        action(eachRow[49])
        action(eachRow[50])
        action(eachRow[51])
        action(eachRow[52])
        action(eachRow[53])
        action(eachRow[54])
        action(eachRow[55])
        action(eachRow[56])
        action(eachRow[57])
        action(eachRow[58])
        action(eachRow[59])
        action(eachRow[60])
        action(eachRow[61])
        action(eachRow[62])
        action(eachRow[63])
        action(eachRow[64])
        action(eachRow[65])
        action(eachRow[66])
        action(eachRow[67])
        action(eachRow[68])
        action(eachRow[69])
        action(eachRow[70])
        action(eachRow[71])
        action(eachRow[72])
        action(eachRow[73])
        action(eachRow[74])
        action(eachRow[75])
        action(eachRow[76])
        action(eachRow[77])
        action(eachRow[78])
        action(eachRow[79])
        action(eachRow[80])
        action(eachRow[81])
        action(eachRow[82])
        action(eachRow[83])
        action(eachRow[84])
        action(eachRow[85])
        action(eachRow[86])
        action(eachRow[87])
        action(eachRow[88])
        action(eachRow[89])
def timestuff():
    for action in 'do_nothing do_something do_more'.split():
        for name in 'easy indexed mapped unrolled'.split():
            t = Timer(setup="""
from __main__ import {} as fn
from __main__ import {} as action
from __main__ import a
""".format(name, action),
                      stmt="fn(a, action)").timeit(10)
            print action, name, t
if __name__ == '__main__':
    timestuff()
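(Note that I didn't bother to make the comparison completely fair, since I was only trying to gauge the likely size of the effect, i.e. whether the differences were of order unity or not.)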
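The benchmark above is Python 2. If you port it to Python 3, two details change the meaning of the code: `xrange` no longer exists (use `range`), and `map` returns a lazy iterator, so `map(action, eachRow)` on its own never calls `action` and `mapped` would time almost nothing. A minimal sketch of the affected functions, plus a check that each strategy calls `action` once per item; `record_calls` is a hypothetical helper for that check.

```python
def easy(rows, action):
    for eachRow in rows:
        for eachItem in eachRow:
            action(eachItem)


def mapped(rows, action):
    for eachRow in rows:
        # map() is lazy in Python 3; list() forces every call to happen
        list(map(action, eachRow))


def indexed(rows, action):
    for eachRow in rows:
        for i in range(len(eachRow)):  # range replaces Python 2's xrange
            action(eachRow[i])


def record_calls(fn, rows):
    """Hypothetical helper: run fn(rows, action) and return the items action saw."""
    seen = []
    fn(rows, seen.append)
    return seen


rows = [[1.0 * i for i in range(3)] for j in range(2)]

# A bare map() call builds an iterator but does no work
calls = []
map(calls.append, rows[0])
print(calls)  # []

for fn in (easy, mapped, indexed):
    print(fn.__name__, record_calls(fn, rows))
```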
Unlike the other responders who timed this, I see a big difference in the timings. First, my code:
import random
import string
import timeit
r = 1000
outer1 = [[[''.join([random.choice(string.ascii_letters) for j in range(10)])] for k in range(90)] for l in range(r)]
outer2 = [[[''.join([random.choice(string.ascii_letters) for j in range(10)])] for k in range(90)] for l in range(r)]
outer3 = [[[''.join([random.choice(string.ascii_letters) for j in range(10)])] for k in range(90)] for l in range(r)]
def x1(L):
    for outer in L:
        for inner in L:
            inner = inner[:-1]
def x2(L):
    for outer in L:
        for y in range(len(outer)):
            outer[y] = outer[y][:-1]
def x3(L):
    for x in range(len(L)):
        for y in range(len(L[x])):
            L[x][y] = L[x][y][:-1]
print "x1 =",timeit.Timer('x1(outer1)', "from __main__ import x1,outer1").timeit(10)
print "x2 =",timeit.Timer('x2(outer2)', "from __main__ import x2,outer2").timeit(10)
print "x3 =",timeit.Timer('x3(outer3)', "from __main__ import x3,outer3").timeit(10)
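Note that I ran each of these 10 times. Each list is populated with 3000 items, each of which contains 90 items, each of which holds a random 10-letter string.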
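Since `[:-1]` copies a list, it may help to count how much slicing each function performs on data shaped like this. The sketch below (Python 3) is hypothetical instrumentation: `SliceCounter` is a list subclass that counts slice lookups, and `build` mirrors the nesting of `outer1` with a fixed string instead of random letters. As written, `x1`'s inner loop iterates over `L` rather than `outer`, so it slices a whole row (copying 89 references) `len(L) ** 2` times, while `x2` and `x3` slice each one-element item once, `len(L) * 90` times. With `r = 1000`, as in the listing, that is 1,000,000 row slices versus 90,000 item slices.

```python
class SliceCounter(list):
    """Hypothetical helper: a list that counts how often any instance is sliced."""
    slices = 0

    def __getitem__(self, key):
        if isinstance(key, slice):
            SliceCounter.slices += 1
        return super().__getitem__(key)


def build(rows, cols):
    # Same nesting as outer1: rows -> cols items -> one-element list holding a string
    return SliceCounter(
        SliceCounter(SliceCounter(["abcdefghij"]) for _ in range(cols))
        for _ in range(rows)
    )


# The three functions from the answer, in Python 3 syntax
def x1(L):
    for outer in L:
        for inner in L:
            inner = inner[:-1]


def x2(L):
    for outer in L:
        for y in range(len(outer)):
            outer[y] = outer[y][:-1]


def x3(L):
    for x in range(len(L)):
        for y in range(len(L[x])):
            L[x][y] = L[x][y][:-1]


def count_slices(fn, rows, cols=90):
    SliceCounter.slices = 0
    fn(build(rows, cols))  # fresh data for every call
    return SliceCounter.slices


for fn in (x1, x2, x3):
    print(fn.__name__, count_slices(fn, rows=200))
```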
Representative results:
x1 = 8.0179214353
x2 = 0.118051644801
x3 = 0.150409681521
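The function that uses no indexing (x1) takes roughly 68 times as long (8.02 s vs 0.118 s) as the function that uses indexing only for the inner loop (x2). Oddly, the function that uses indexing only for the inner loop (x2) performs better than the one that uses indexing for both the outer and inner loops (x3).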
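One plausible contributor to x3 being slower than x2 (a guess, not something measured here) is that x3 re-evaluates `L[x]` repeatedly: once per row in `range(len(L[x]))`, and twice per item in `L[x][y] = L[x][y][:-1]` (once to read, once to find the list to store into). x2 gets each row directly from the `for` loop instead. The hypothetical `IndexCounter` subclass below counts lookups on the outer list only; for `rows` rows of `cols` items, x3 performs `rows * (1 + 2 * cols)` of them and x2 performs none.

```python
class IndexCounter(list):
    """Hypothetical helper: a list that counts calls to its own __getitem__."""

    def __init__(self, iterable=()):
        super().__init__(iterable)
        self.lookups = 0

    def __getitem__(self, key):
        self.lookups += 1
        return super().__getitem__(key)


def x2(L):
    for outer in L:  # list iteration does not go through __getitem__
        for y in range(len(outer)):
            outer[y] = outer[y][:-1]


def x3(L):
    for x in range(len(L)):
        for y in range(len(L[x])):
            L[x][y] = L[x][y][:-1]


def outer_lookups(fn, rows, cols):
    # Only the outer list is instrumented; rows and items are plain lists
    L = IndexCounter([["abcdefghij"] for _ in range(cols)] for _ in range(rows))
    fn(L)
    return L.lookups


print("x2", outer_lookups(x2, rows=3, cols=4))  # 0
print("x3", outer_lookups(x3, rows=3, cols=4))  # 3 * (1 + 2 * 4) = 27
```

Each of those lookups is a bytecode-level subscript on top of the work both functions share, which is consistent with x3 being somewhat, but not dramatically, slower than x2.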