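The code below is a measured hotspot distilled from some code I'm writing. I'm trying to figure out how to speed up this loop in Python 3.9.0. I measured the same loop to be >30x faster in VC++ 2019 using CCD_ 1.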
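As you can see, I tried several different approaches. The map() function appears to return an iterator, so I converted its result to a list in order to measure the full cost of the computation.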
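The laziness of map() can be seen directly in a small sketch. The function dec_first and the five-row input are hypothetical, chosen only for illustration. The function records each call, so you can see that nothing runs until list() consumes the iterator:

```python
# map() returns a lazy iterator: the function runs only when results are consumed.
calls = []

def dec_first(row):
    # Hypothetical helper: record the call, then return a new row with field 0 decremented.
    calls.append(row[0])
    return [row[0] - 1, row[1], row[2]]

rows = [[i, i, i] for i in range(5)]
lazy = map(dec_first, rows)
count_before_consuming = len(calls)  # 0: map() has not called dec_first yet

result = list(lazy)  # list() forces every element to be computed
print(count_before_consuming, len(calls))
```

This is why the Map benchmark wraps the call in list(). Timing map() alone would measure only the creation of the iterator.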
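I think this is a fairly natural way to represent my data. I could certainly make some representational or algorithmic improvements here. However, I'm somewhat surprised that iteration is this slow, and I'd like to see first whether it can be sped up as it stands.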
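One representational change of the kind mentioned above is a column-oriented layout: one list per field instead of one small list per record. The sketch below assumes that only field 0 is modified, and the names col0, col1 and col2 are hypothetical. Whether it is actually faster on a given machine should be measured rather than assumed:

```python
# Hypothetical column-oriented layout: one list per field, rather than
# 500000 three-element lists.
n = 500000
col0 = list(range(n))
col1 = list(range(n))
col2 = list(range(n))

# Decrementing field 0 touches only col0 and skips the inner-list lookup
# (v[0]) that the record-oriented loops perform for every element.
col0 = [v - 1 for v in col0]

print("Total: " + str(sum(col0)))
```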
Performance results from running python listIteration.py:
Iteration by index
66.66 ms
60.90 ms
62.74 ms
Total: 124998250000
Iteration by index -- just integers
55.22 ms
55.27 ms
80.84 ms
Total: 124998250000
Iteration by object
56.48 ms
60.30 ms
55.77 ms
Total: 124998250000
List comprehension
235.34 ms
328.15 ms
272.47 ms
Total: 124998250000
Map
310.81 ms
353.87 ms
300.27 ms
Total: 124998250000
Code:
import time

def makeList():
    data = []
    for i in range(500000):
        data.append([i, i, i])
    return data

def makeListOfInts():
    data = []
    for i in range(500000):
        data.append(i)
    return data

def dumpTime(delta):
    print("{:.2f}".format(1000.0*delta) + " ms")

NUM_TRIALS = 3

print("Iteration by index")
data = makeList()
for t in range(NUM_TRIALS):
    x1 = time.perf_counter()
    for j in range(len(data)):
        data[j][0] -= 1
    x2 = time.perf_counter()
    dumpTime(x2-x1)
total = sum([x[0] for x in data])
print("Total: "+ str(total))

print("Iteration by index -- just integers")
data = makeListOfInts()
for t in range(NUM_TRIALS):
    x1 = time.perf_counter()
    for j in range(len(data)):
        data[j] -= 1
    x2 = time.perf_counter()
    dumpTime(x2-x1)
total = sum(data)
print("Total: "+ str(total))

print("Iteration by object")
data = makeList()
for t in range(NUM_TRIALS):
    x1 = time.perf_counter()
    for v in data:
        v[0] -= 1
    x2 = time.perf_counter()
    dumpTime(x2-x1)
total = sum([x[0] for x in data])
print("Total: "+ str(total))

print("List comprehension")
data = makeList()
for t in range(NUM_TRIALS):
    x1 = time.perf_counter()
    data = [[x[0]-1, x[1], x[2]] for x in data]
    x2 = time.perf_counter()
    dumpTime(x2-x1)
total = sum([x[0] for x in data])
print("Total: "+ str(total))

print("Map")
data = makeList()
for t in range(NUM_TRIALS):
    x1 = time.perf_counter()
    # here we convert the map object to a list, because apparently
    # map() returns an iterator, and we want to measure the full cost
    # of the computation
    data = list(map(lambda x: [x[0]-1, x[1], x[2]], data))
    x2 = time.perf_counter()
    dumpTime(x2-x1)
total = sum([x[0] for x in data])
print("Total: "+ str(total))
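Python code will be slower than C++. There's no way around that unless you eliminate the Python-level iteration, or hand it off to a C backend, which is what numpy does.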
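To make this concrete, here is a small sketch (the five-row array is an arbitrary illustration) showing that an explicit Python loop and a single vectorized NumPy expression produce the same result. The difference is that the vectorized form makes one call, and NumPy runs the per-element loop in compiled code:

```python
import numpy as np

# Same shape of data as makeList(), but as a 2-D NumPy array (one row per item).
n = 5
rows = np.stack([np.arange(n)] * 3, axis=1)

# Python-level loop: one interpreter round-trip per element.
looped = rows.copy()
for j in range(len(looped)):
    looped[j, 0] -= 1

# Vectorized: a single expression; the element loop runs inside NumPy.
vectorized = rows.copy()
vectorized[:, 0] -= 1

print(np.array_equal(looped, vectorized))
```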
For example, you could do:
import numpy as np

def makeArray():
    data = np.vstack((np.arange(500000), np.arange(500000), np.arange(500000))).T
    return data

def makeArrayOfInts():
    data = np.arange(500000)
    return data
Then you don't need to iterate at all:
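# NUM_TRIALS, time and dumpTime are reused from the question's script.

# 2-D array: decrement column 0 in a single vectorized operation
data = makeArray()
for t in range(NUM_TRIALS):
    x1 = time.perf_counter()
    data[:, 0] = data[:, 0] - 1
    x2 = time.perf_counter()
    dumpTime(x2-x1)
total = sum(data[:, 0])
print("Total: "+ str(total))

# 1-D array: decrement every element in a single vectorized operation
data = makeArrayOfInts()
for t in range(NUM_TRIALS):
    x1 = time.perf_counter()
    data = data - 1
    x2 = time.perf_counter()
    dumpTime(x2-x1)
total = sum(data)
print("Total: "+ str(total))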
Both of these are very fast: each trial takes ~1 ms, compared with ~50 ms for iterating over the list.
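Two small refinements to the NumPy version are sketched below. First, augmented assignment (data -= 1, data[:, 0] -= 1) updates the existing buffer instead of allocating a new array on each trial. Second, data.sum() uses NumPy's own reduction rather than Python's built-in sum(), which walks the array one element at a time. The explicit dtype=np.int64 is an assumption made so that the arithmetic does not depend on the platform's default integer width:

```python
import numpy as np

NUM_TRIALS = 3

# 1-D case, with an explicit 64-bit dtype (an assumption, see above).
data = np.arange(500000, dtype=np.int64)
buffer_before = data
for t in range(NUM_TRIALS):
    data -= 1  # in-place: reuses the existing array, no new allocation
total = int(data.sum())  # NumPy reduction, runs in compiled code
print("Total: " + str(total))

# 2-D case: decrement column 0 in place through a view.
table = np.stack([np.arange(500000, dtype=np.int64)] * 3, axis=1)
for t in range(NUM_TRIALS):
    table[:, 0] -= 1
print("Total: " + str(int(table[:, 0].sum())))
```

Both totals should match the question's 124998250000.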