在 Python 中,列表中可被'n'整除的重复元素的最大索引



>我有一个排序的数字列表,如下所示:

a = [77,98,99,100,101,102,198,199,200,200,278,299,300,300,300]

我需要找到每个值的最大索引,该索引可被 100 整除。

输出应如下所示:4,10,15

我的代码:

a = [77,98,99,100,101,102,198,199,200,200,278,299,300,300,300]
idx = 1
for i in (a):
if i%100 == 0:
print idx
idx = idx+1

以上代码的输出:

4
9
10
13
14
15

如果人们好奇,我将字典理解技术与向后迭代技术进行了基准测试。字典理解的速度大约是其两倍。更改为OrderedDict导致大幅减速。比字典理解慢大约 15 倍。

def test1():
a = [77,98,99,100,101,102,198,199,200,200,278,299,300,300,300]
max_index = {}
for i, item in enumerate(a[::-1]):
if item not in max_index:
max_index[item] = len(a) - (i + 1)
return max_index
def test2():
a = [77,98,99,100,101,102,198,199,200,200,278,299,300,300,300]
return {item: index for index, item in enumerate(a, 1)}
def test3():
a = [77,98,99,100,101,102,198,199,200,200,278,299,300,300,300]
OrderedDict((item, index) for index, item in enumerate(a, 1))
if __name__ == "__main__":
import timeit
print(timeit.timeit("test1()", setup="from __main__ import test1"))
print(timeit.timeit("test2()", setup="from __main__ import test2"))
print(timeit.timeit("test3()", setup="from __main__ import test3; from collections import OrderedDict"))
3.40622282028
1.97545695305
26.347012043

使用简单的字典理解或以可分割项目作为键的OrderedDict,旧值将自动替换为最新值。

>>> {item: index for index, item in enumerate(lst, 1) if not item % 100}.values()
dict_values([4, 10, 15])
# if order matters
>>> from collections import OrderedDict    
>>> OrderedDict((item, index) for index, item in enumerate(lst, 1) if not item % 100).values()
odict_values([4, 10, 15])

另一种方法是循环访问反向列表并使用set来跟踪到目前为止看到的项目(lst[::-1]可能比小列表的reversed(lst)略快(。

>>> seen = set()
>>> [len(lst) - index for index, item in enumerate(reversed(lst))
if not item % 100 and item not in seen and not seen.add(item)][::-1]
[4, 10, 15]

您可以在此处看到上述等效代码的排序。

您可以使用itertools.groupby因为您的数据已排序:

>>> a = [77,98,99,100,101,102,198,199,200,200,278,299,300,300,300]
>>> from itertools import groupby
>>> [list(g)[-1][0] for k,g in groupby(enumerate(a), lambda t: (t[1] % 100, t[1])) if k[0] == 0]
[3, 9, 14]

虽然这有点神秘。

这是一种仅使用列表迭代器并累加到列表中的复杂方法:

>>> run, prev, idx = False, None, []
>>> for i, e in enumerate(a):
...     if not (e % 100 == 0):
...         if not run:
...             prev = e
...             continue
...         idx.append(i - 1)
...         run = False
...     else:
...         if prev != e and run:
...             idx.append(i - 1)
...         run = True
...     prev = e
...
>>> if run:
...     idx.append(i)
...
>>> idx
[3, 9, 14]

我认为这最好用字典方法来处理,例如 @AshwiniChaudhary 它更直接,更快:

>>> timeit.timeit("{item: index for index, item in enumerate(a, 1)}", "from __main__ import a")
1.842843743012054
>>> timeit.timeit("[list(g)[-1][0] for k,g in groupby(enumerate(a), lambda t: (t[1] % 100, t[1])) if k[0] == 0]", "from __main__ import a, groupby")
8.479677081981208

groupby方法非常慢,请注意,复杂的方法更快,并且与字典理解方法相去甚远:

>>> def complicated(a):
...     run, prev, idx = False, None, []
...     for i, e in enumerate(a):
...         if not (e % 100 == 0):
...             if not run:
...                 prev = e
...                 continue
...             idx.append(i - 1)
...             run = False
...         else:
...             if prev != e and run:
...                 idx.append(i - 1)
...             run = True
...         prev = e
...     if run:
...         idx.append(i)
...     return idx
...
>>> timeit.timeit("complicated(a)", "from __main__ import a, complicated")
2.6667005629860796

编辑

请注意,如果我们在字典理解.values()上调用list,性能差异会缩小:
>>> timeit.timeit("list({item: index for index, item in enumerate(a, 1)}.values())", "from __main__ import a")
2.3839886570058297
>>> timeit.timeit("complicated(a)", "from __main__ import a, complicated")
2.708565960987471

一开始似乎是一个好主意,有点曲折,不得不修补几个案例......

a = [0,77,98,99,100,101,102,198,199,200,200,278,299,300,300,300, 459, 700,700]
bz = [*zip(*((i, d//100) for i, d in enumerate(a) if d%100 == 0 and d != 0))]
[a for a, b, c in zip(*bz, bz[1][1:]) if c-b != 0] + [bz[0][-1]]    
Out[78]: [4, 10, 15, 18]  

枚举,压缩以创建将 100 的分子与索引匹配的bz

bz = [*zip(*((i, d//100) for i, d in enumerate(a) if d%100 == 0 and d != 0))]
print(*bz, sep='n')
(4, 9, 10, 13, 14, 15, 17, 18)
(1, 2, 2, 3, 3, 3, 7, 7)

然后再次压缩,zip(*bz, bz[1][1:])滞后分子元组,以允许滞后差值为每次运行的最后一个索引提供选择逻辑if c-b != 0,但最后一个索引

添加最后 100 场比赛,因为它始终是最后一次运行的结束+ [bz[0][-1]]

最新更新