Python lru_cache false-negatives



I'm trying to cache the function expand keyed only on its first argument. For caching purposes I don't care about the values of the other arguments.

Since the other arguments are dictionaries, they aren't hashable, so I defined a class to hold them whose __hash__ always returns 0, expecting the cached function to effectively ignore it.

A stripped-down version of the code is below. I'm using Python 3.5.2.

import functools

class Node:
    def __init__(self, value):
        self.value = value
    def expand(self, a1, a2):
        return '{},{},{}'.format(self.value, a1, a2)

class ExpandArgs:
    def __init__(self, a1, a2):
        self.a1 = a1
        self.a2 = a2
    def __hash__(self):
        # We don't care about the hash, but it's required for caching
        return 0

# hash(args) is always 0, so the expectation was that it would be ignored
# and the hash of node alone would act as the cache key
@functools.lru_cache(maxsize=None)
def expand(node, args):
    a1 = args.a1
    a2 = args.a2
    return node.expand(a1, a2)

e1 = ExpandArgs({}, {})
e2 = ExpandArgs({}, {})
print(hash(e1))  # 0
print(hash(e2))  # 0
node = Node(123)
print(expand.cache_info())  # CacheInfo(hits=0, misses=0, maxsize=None, currsize=0)
expand(node, e1)
print(expand.cache_info())  # CacheInfo(hits=0, misses=1, maxsize=None, currsize=1)
expand(node, e2)
print(expand.cache_info())  # CacheInfo(hits=0, misses=2, maxsize=None, currsize=2)
expand(node, e1)
print(expand.cache_info())  # CacheInfo(hits=1, misses=2, maxsize=None, currsize=2)
expand(node, e2)
print(expand.cache_info())  # CacheInfo(hits=2, misses=2, maxsize=None, currsize=2)

Since hash(e1) == hash(e2), I expected the second call to expand() to hit the value cached for e1, but it doesn't.

Why doesn't the code above produce 1 cache miss and 3 cache hits?

It turns out that, for caching purposes, equality (__eq__), not just the hash, is used to check whether function arguments match, so it works once I change the class:

class ExpandArgs:
    def __init__(self, a1, a2):
        self.a1 = a1
        self.a2 = a2
    def __hash__(self):
        # We don't care about the hash, but it's required for caching
        return 0
    def __eq__(self, other):
        # Treat any two ExpandArgs instances as equal, so the cache key
        # is effectively just the node
        return isinstance(other, self.__class__)
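
With __eq__ defined, rerunning the same four calls from the stripped-down example gives the expected pattern (a quick sketch, assuming a fresh cache, e.g. after expand.cache_clear()):

e1 = ExpandArgs({}, {})
e2 = ExpandArgs({}, {})
node = Node(123)
expand(node, e1)
expand(node, e2)  # compares equal to the cached key, so this is a hit
expand(node, e1)
expand(node, e2)
print(expand.cache_info())  # CacheInfo(hits=3, misses=1, maxsize=None, currsize=1)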

Just want to add a few notes, since I spent some time reading the functools source code trying to figure out why __eq__ is used here.

It turns out this is basic behavior of the Python dict used as the cache (tested on Python 3.9): __hash__ is consulted first, but when hashes match, __eq__ is used to make sure the objects aren't actually different:

In [5]: class CompTuple(tuple):
   ...:     """Tuple that prints whenever the equality operator is used."""
   ...:
   ...:     __hash__ = tuple.__hash__
   ...:
   ...:     def __eq__(self, other):
   ...:         print("equality comparison")
   ...:         return super().__eq__(other)
   ...:
In [6]: t1 = CompTuple( (1,2,3,) )
In [7]: t2 = CompTuple( (1,2,4,) )  # has different hash than t1
In [8]: t3 = CompTuple( (1,2,3,) )  # has same hash as t1
In [9]: d={}
In [10]: d[t1]=1
In [11]: d[t2]
--------------------------------------------------------------
KeyError                     Traceback (most recent call last)
<ipython-input-11-c1f54cc7c51f> in <module>
----> 1 d[t2]
KeyError: (1, 2, 4)
In [12]: d[t3]
equality comparison  # equality comparison because of same hash
Out[12]: 1
In [13]: d[t2]=2
In [14]: d[t3]       # still only one equality comparison
equality comparison
Out[14]: 1

The Python documentation explicitly requires that objects which compare equal have the same hash value.
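
A built-in illustration of that rule (not from the original post): numeric values that compare equal also hash equal, even across types, which is what lets them act as the same dict key.

print(1 == 1.0 == True)                # True: they all compare equal
print(hash(1), hash(1.0), hash(True))  # 1 1 1: so their hashes must match
d = {1: 'int key'}
print(d[1.0], d[True])                 # both lookups find the same entry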

However, calling __eq__ can be much more expensive than calling __hash__. This can have the counterintuitive effect of making lru_cache lookups expensive precisely when the cache already contains objects with the same hash.
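
A minimal sketch of that effect (the SlowKey class and lookup function are illustrative, not from the original post): every key hashes to 0, so each lru_cache lookup has to probe the colliding entries with __eq__ before it can report a hit.

import functools

class SlowKey:
    """Every instance hashes to 0; equality is the 'expensive' part (counted here)."""
    comparisons = 0
    def __init__(self, value):
        self.value = value
    def __hash__(self):
        return 0
    def __eq__(self, other):
        SlowKey.comparisons += 1
        return isinstance(other, SlowKey) and self.value == other.value

@functools.lru_cache(maxsize=None)
def lookup(key):
    return key.value

for i in range(100):
    lookup(SlowKey(i))        # 100 misses, all landing in the same hash bucket

SlowKey.comparisons = 0
lookup(SlowKey(99))           # a single cache hit...
print(SlowKey.comparisons)    # ...but it took many __eq__ calls to find the entry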

P.S. Just to make things even more confusing, dict lookups have a shortcut that skips the __eq__ call when comparing identical objects (i.e. objects with the same id):

In [15]: d[t1]  # no equality comparison, t1 *is* in the cache
Out[15]: 1
