I have a list of Counters:
from collections import Counter
counters = [
Counter({"coach": 1, "says": 1, "play": 1, "basketball": 1}),
Counter({"i": 2, "said": 1, "hate": 1, "basketball": 1}),
Counter({"he": 1, "said": 1, "play": 1, "basketball": 1}),
]
I can combine them with a loop like the one below, but I would like to avoid the loop.
all_ct = Counter()
for ct in counters:
    all_ct.update(ct)
Using reduce raises an error:
all_ct = Counter()
reduce(all_ct.update, counters)
>>> TypeError: update() takes from 1 to 2 positional arguments but 3 were given
Is there a way to combine the Counters into a single Counter without using a loop?
You can use the sum function.
all_ct = sum(counters, Counter())
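One caveat worth checking before swapping the loop for sum: Counter addition keeps only positive counts, while update keeps zero and negative ones. The sketch below uses made-up counters (not the ones from the question) to show the difference; with all-positive counts, as in the question, both approaches give the same result.

```python
from collections import Counter

# Hypothetical data with a negative count, purely for illustration.
parts = [Counter({"play": 2}), Counter({"play": -2, "hate": 1})]

# sum() relies on Counter.__add__, which discards counts that are not positive.
via_sum = sum(parts, Counter())

# update() adds counts as-is, so the zero entry for "play" survives.
via_update = Counter()
for part in parts:
    via_update.update(part)

print("play" in via_sum)     # False: the key was dropped
print("play" in via_update)  # True: the key is kept with a count of 0
```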
Note that Counter implements __add__ to merge counters, so you can use:
In [3]: from collections import Counter
...: counters = [
...: Counter({"coach": 1, "says": 1, "play": 1, "basketball": 1}),
...: Counter({"i": 2, "said": 1, "hate": 1, "basketball": 1}),
...: Counter({"he": 1, "said": 1, "play": 1, "basketball": 1}),
...: ]
In [4]: from operator import add
In [5]: from functools import reduce
In [6]: reduce(add, counters)
Out[6]:
Counter({'coach': 1,
'says': 1,
'play': 2,
'basketball': 3,
'i': 2,
'said': 2,
'hate': 1,
'he': 1})
Or, more simply:
In [7]: final = Counter()
In [8]: for c in counters:
...:     final += c
...:
In [9]: final
Out[9]:
Counter({'coach': 1,
'says': 1,
'play': 2,
'basketball': 3,
'i': 2,
'said': 2,
'hate': 1,
'he': 1})
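Note that the loop above is more efficient because it only ever uses a single dict. If you use reduce(add, counters), a new intermediate Counter object is created on every iteration.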
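As a quick check of that claim, the sketch below compares object identity for operator.add and operator.iadd on two small, made-up counters. It only demonstrates which call allocates a new Counter; it says nothing about timing.

```python
from collections import Counter
from operator import add, iadd

first = Counter({"play": 1})
second = Counter({"play": 1, "basketball": 1})

# add (a + b) returns a new Counter and leaves both operands untouched.
fresh = add(first, second)
print(fresh is first)   # False

# iadd (a += b) mutates the left operand and returns that same object.
acc = Counter()
same = iadd(acc, second)
print(same is acc)      # True
```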
To illustrate what I mean: even in the best case, where the keys always repeat, the reduce/sum approach does about twice the work:
In [1]: from collections import Counter
...: counters = [
...: Counter({"coach": 1, "says": 1, "play": 1, "basketball": 1}),
...: Counter({"i": 2, "said": 1, "hate": 1, "basketball": 1}),
...: Counter({"he": 1, "said": 1, "play": 1, "basketball": 1}),
...: ]
In [2]: counters *= 5_000
In [3]: from functools import reduce
In [4]: from operator import add
In [5]: %%timeit
...: data = counters.copy()
...: result = Counter()
...: for c in data:
...:     result += c
...:
21.2 ms ± 542 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [6]: %%timeit
...: data = counters.copy()
...: reduce(add, counters)
...:
...:
50.9 ms ± 1.73 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
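I believe that in the worst case (where each counter's keys are disjoint from those of every other counter), this degrades to quadratic performance.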
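One rough way to probe that belief is to time reduce(add, ...) on disjoint counters at two sizes. If the growth is quadratic, doubling the number of counters should roughly quadruple the time. The helper names and sizes below are made up for this sketch, and the measured numbers will vary by machine and Python version.

```python
from collections import Counter
from functools import reduce
from operator import add
from time import perf_counter


def disjoint_counters(n):
    """Build n single-key Counters whose keys never overlap (hypothetical data)."""
    return [Counter({f"word{i}": 1}) for i in range(n)]


def time_reduce_add(n):
    """Return (merged Counter, elapsed seconds) for one run; a rough measurement."""
    data = disjoint_counters(n)
    start = perf_counter()
    total = reduce(add, data)  # builds a new, larger Counter at every step
    return total, perf_counter() - start


small, t_small = time_reduce_add(500)
large, t_large = time_reduce_add(1_000)

print(f"500 counters:  {t_small:.4f} s")
print(f"1000 counters: {t_large:.4f} s")
```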
Finally, note that you can use reduce (but not sum) with in-place addition, which eliminates the performance problem:
In [6]: import operator
In [7]: operator.iadd?
Signature: operator.iadd(a, b, /)
Docstring: Same as a += b.
Type: builtin_function_or_method
In [8]: reduce(operator.iadd, counters, Counter())
Out[8]:
Counter({'coach': 5000,
'says': 5000,
'play': 10000,
'basketball': 15000,
'i': 10000,
'said': 10000,
'hate': 5000,
'he': 5000})
Note that the performance is now on par with the explicit loop:
In [9]: %%timeit
...: data = counters.copy()
...: reduce(operator.iadd, counters, Counter())
...:
...:
22 ms ± 224 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
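However, mixing functional constructs such as reduce with functions that have side effects is just... ugly. For impure functions, it is better to stick with imperative code.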
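If the goal is mainly to keep the loop out of the calling code, one option is to wrap the imperative version in a small function. This is only a sketch: combine_counters is a made-up name, and it uses update to match the loop in the question.

```python
from collections import Counter
from typing import Iterable


def combine_counters(counters: Iterable[Counter]) -> Counter:
    """Merge counters into a new Counter without modifying the inputs."""
    total = Counter()
    for counter in counters:
        total.update(counter)  # in-place update of the single accumulator
    return total


counters = [
    Counter({"coach": 1, "says": 1, "play": 1, "basketball": 1}),
    Counter({"i": 2, "said": 1, "hate": 1, "basketball": 1}),
    Counter({"he": 1, "said": 1, "play": 1, "basketball": 1}),
]
all_ct = combine_counters(counters)
print(all_ct["basketball"])  # 3
```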
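You need to replace update() with a form that reduce can use, that is, a function that returns the accumulator:
import functools

def static_update(x, y):
    # Counter.update returns None, so return the accumulator explicitly.
    x.update(y)
    return x

all_ct = Counter()
# Pass all_ct as the initial value; otherwise counters[0] is mutated in place.
functools.reduce(static_update, counters, all_ct)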