我正在尝试仅通过其第一个参数缓存函数expand
。出于缓存的目的,我不关心其他参数的值。
由于其他参数是字典,因此它们不可缓存,因此我定义了一个类来包含这些参数,其哈希始终返回 0,因此缓存函数应忽略它。
我在下面添加了一些削减代码。我使用的是 Python 版本 3.5.2。
class Node:
def __init__(self, value):
self.value = value
def expand(self, a1, a2):
return '{},{},{}'.format(self.value, a1, a2)
class ExpandArgs:
def __init__(self, a1, a2):
self.a1 = a1
self.a2 = a2
def __hash__(self):
# We don't care about the hash, but it's required for caching
return 0
@functools.lru_cache(maxsize=None) # hash of args is always 0, so it should be ignored, and the hash of node should be used as the cache key
def expand(node, args):
a1 = args.a1
a2 = args.a2
return node.expand(a1, a2)
e1 = ExpandArgs({}, {})
e2 = ExpandArgs({}, {})
print(hash(e1)) # 0
print(hash(e2)) # 0
node = Node(123)
print(expand.cache_info()) # CacheInfo(hits=0, misses=0, maxsize=None, currsize=0)
expand(node, e1)
print(expand.cache_info()) # CacheInfo(hits=0, misses=1, maxsize=None, currsize=1)
expand(node, e2)
print(expand.cache_info()) # CacheInfo(hits=0, misses=2, maxsize=None, currsize=2)
expand(node, e1)
print(expand.cache_info()) # CacheInfo(hits=1, misses=2, maxsize=None, currsize=2)
expand(node, e2)
print(expand.cache_info()) # CacheInfo(hits=2, misses=2, maxsize=None, currsize=2)
自hash(e1) == hash(e2)
以来,我希望对expand()
的第二次调用达到e1
的缓存值,但它没有命中。
为什么上面的代码没有收到 1 次缓存未命中和 3 次缓存命中?
事实证明,出于缓存的目的,使用 eq 而不是哈希来检查函数参数是否相等,因此当我更改类时它可以工作。
class ExpandArgs:
def __init__(self, context, forecast_transaction_node_map, date_range_func):
self.context = context
self.forecast_transaction_node_map = forecast_transaction_node_map
self.date_range_func = date_range_func
def __hash__(self):
# We don't care about the hash, but it's required for caching
return 0
def __eq__(self, other):
return isinstance(other, self.__class__)
只是想添加一些注释,因为我花了一些时间阅读functools
源代码,试图弄清楚为什么在这里使用 __eq__
。
事实证明,这是用作缓存的 Python 字典的基本功能(在 python 3.9 上测试(: 首先使用__hash__
,但在哈希匹配时使用__eq__
以确保对象实际上不同:
In [5]: class CompTuple(tuple):
...: """Tuple that prints whenever equality operator is
...: used"""
...:
...: __hash__ = tuple.__hash__
...:
...: def __eq__(self, other):
...: print("equality comparison")
...: return super().__eq__(other)
...:
In [6]: t1 = CompTuple( (1,2,3,) )
In [7]: t2 = CompTuple( (1,2,4,) ) # has different hash than t1
In [8]: t3 = CompTuple( (1,2,3,) ) # has same hash as t1
In [9]: d={}
In [10]: d[t1]=1
In [11]: d[t2]
--------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-11-c1f54cc7c51f> in <module>
----> 1 d[t2]
KeyError: (1, 2, 4)
In [12]: d[t3]
equality comparison # equality comparison because of same hash
Out[12]: 1
In [13]: d[t2]=2
In [14]: d[t3] # still only one equality comparison
equality comparison
Out[14]: 1
python 文档明确要求比较相等的对象具有相同的哈希值。
但是,呼叫__eq__
可能比呼叫__hash__
昂贵得多。这可能会产生违反直觉的效果,即当缓存已包含相同哈希的对象时,会使lru_cache
查找变得昂贵。
附言只是为了让事情更加混乱,字典查找中有一个快捷方式,可以在比较相同的对象(=具有相同id
的对象(时跳过对__eq__
的调用:
In [15]: d[t1] # no equality comparison, t1 *is* in the cache
Out[15]: 1