Cache object instances with lru_cache and __hash__

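I don't understand how functools.lru_cache works with object instances. I assume the class has to provide a __hash__ method, so any instances with the same hash value should hit the cache.

Here is my test: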

from functools import lru_cache

class Query:
    def __init__(self, id: str):
        self.id = id

    def __hash__(self):
        return hash(self.id)

@lru_cache()
def fetch_item(item):
    return 'data'

o1 = Query(33)
o2 = Query(33)
o3 = 33
assert hash(o1) == hash(o2) == hash(o3)

fetch_item(o1)  # <-- expecting miss
fetch_item(o1)  # <-- expecting hit
fetch_item(o2)  # <-- expecting hit BUT get a miss !
fetch_item(o3)  # <-- expecting hit BUT get a miss !
fetch_item(o3)  # <-- expecting hit

info = fetch_item.cache_info()
print(info)
assert info.hits == 4
assert info.misses == 1
assert info.currsize == 1

How can I cache calls for object instances that have the same hash value?

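Short answer: to get a cache hit on o2 when o1 is already in the cache, the class can define an __eq__() method that compares whether two Query objects have equal values.

For example: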

def __eq__(self, other):
    return isinstance(other, Query) and self.id == other.id
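As a minimal sketch, here is the question's test with that __eq__() added. The class and function names follow the question; only the Query-to-Query case is exercised here.

```python
from functools import lru_cache


class Query:
    def __init__(self, id):
        self.id = id

    def __hash__(self):
        return hash(self.id)

    def __eq__(self, other):
        # Equal only to other Query objects carrying the same id.
        return isinstance(other, Query) and self.id == other.id


@lru_cache()
def fetch_item(item):
    return 'data'


o1 = Query(33)
o2 = Query(33)

fetch_item(o1)  # miss: the cache is empty
fetch_item(o1)  # hit: same object
fetch_item(o2)  # hit: different object, but same hash and o1 == o2

info = fetch_item.cache_info()
print(info)
```

Both conditions are needed: the hash locates the candidate entry, and equality confirms that the key matches.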

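One more detail that belongs in the summary rather than buried further down: the behavior described here also applies to the functools.cache wrapper introduced in Python 3.9, because @cache is simply a shortcut for @lru_cache(maxsize=None).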

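Long answer (including o3):

The exact mechanism of dictionary lookups is well explained elsewhere, so I won't reproduce it all. Suffice it to say that, because the LRU cache is stored as a dict, objects of the class must compare as equal to be treated as already present in the cache. That is how dictionary keys are compared.

You can see this in a quick example with a regular dictionary and two versions of the class, one that defines __eq__() and one that doesn't: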

>>> o1 = Query_with_eq(33)
>>> o2 = Query_with_eq(33)
>>> {o1: 1, o2: 2}
{<__main__.Query_with_eq object at 0x6fffffea9430>: 2}

results in only one item in the dictionary, because the keys are equal, whereas

>>> o1 = Query_without_eq(33)
>>> o2 = Query_without_eq(33)
>>> {o1: 1, o2: 2}
{<__main__.Query_without_eq object at 0x6fffffea9cd0>: 1, <__main__.Query_without_eq object at 0x6fffffea9c70>: 2}

results in two items (unequal keys).
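The two class versions are not shown in the session above. A minimal sketch of what they might look like (the bodies are an assumption, modeled on the question's Query class) reproduces the same result:

```python
class Query_without_eq:
    def __init__(self, id):
        self.id = id

    def __hash__(self):
        return hash(self.id)


class Query_with_eq:
    def __init__(self, id):
        self.id = id

    def __hash__(self):
        return hash(self.id)

    def __eq__(self, other):
        return isinstance(other, Query_with_eq) and self.id == other.id


# Same hash and equal keys: the second entry overwrites the first.
with_eq = {Query_with_eq(33): 1, Query_with_eq(33): 2}

# Same hash but unequal keys (default identity comparison): two entries.
without_eq = {Query_without_eq(33): 1, Query_without_eq(33): 2}

print(len(with_eq), len(without_eq))
```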

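Why an int doesn't result in a cache hit when a Query object is already cached:

o3 is a regular int object. Even if its value did compare equal to Query(33) (assuming Query.__eq__() handles other types properly), lru_cache has an optimization that bypasses that comparison.

Normally, lru_cache creates a dictionary key (as a tuple) from the arguments to the wrapped function. If the cache was created with the typed=True argument, it also stores the type of each argument, so that values are only equal if they also have the same type.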
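A small sketch of the typed=True behavior, using a hypothetical two-argument function so that the single-argument shortcut for int and str does not come into play:

```python
from functools import lru_cache


@lru_cache()
def untyped_add(a, b):
    return a + b


@lru_cache(typed=True)
def typed_add(a, b):
    return a + b


# Without typed=True, (1, 2) and (1.0, 2.0) are equal keys with equal hashes.
untyped_add(1, 2)      # miss
untyped_add(1.0, 2.0)  # hit

# With typed=True the argument types are part of the key.
typed_add(1, 2)        # miss
typed_add(1.0, 2.0)    # miss

print(untyped_add.cache_info())
print(typed_add.cache_info())
```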

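The optimization is that if the wrapped function is called with only one argument, and that argument is of type int or str, the single argument is used directly as the dictionary key instead of being turned into a tuple. Therefore, (Query(33),) and 33 don't compare as equal, even though they effectively store the same value. (Note that I'm not saying that int objects aren't cached, only that they don't match an existing value of a non-int type. From your example, you can see that fetch_item(o3) gets a cache hit on the second call.)

You can get cache hits if the argument is of a type other than int. For example, 33.0 would match, again presuming that Query.__eq__() takes other types into account and returns True. For that you could do something like: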

def __eq__(self, other):
    if isinstance(other, Query):
        return self.id == other.id
    else:
        return self.id == other
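A minimal sketch of the difference, assuming the question's Query class with this more permissive __eq__():

```python
from functools import lru_cache


class Query:
    def __init__(self, id):
        self.id = id

    def __hash__(self):
        return hash(self.id)

    def __eq__(self, other):
        if isinstance(other, Query):
            return self.id == other.id
        return self.id == other


@lru_cache()
def fetch_item(item):
    return 'data'


fetch_item(Query(33))  # miss: first call
fetch_item(33.0)       # hit: float takes the normal key path and equals Query(33)
after_float = fetch_item.cache_info()

fetch_item(33)         # miss: a lone int is used as a bare key
after_int = fetch_item.cache_info()

print(after_float)
print(after_int)
```

The int call misses even though Query(33) == 33 is True, because the bare int key is never compared against the tuple-like key stored for the Query object.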

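Even though lru_cache() expects its arguments to be hashable, equal hash values alone are not enough for a cache hit, which is why you are getting those misses.

The function _make_key uses _HashedSeq to make sure all the items it holds are hashable, but later on, in _lru_cache_wrapper, the lookup is an ordinary dict lookup, which also compares keys for equality rather than relying on the hash value alone.

(_HashedSeq is skipped if there is only one argument and it is of type int or str.)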

class _HashedSeq(list):
    """ This class guarantees that hash() will be called no more than once
        per element.  This is important because the lru_cache() will hash
        the key multiple times on a cache miss.
    """
    __slots__ = 'hashvalue'

    def __init__(self, tup, hash=hash):
        self[:] = tup
        self.hashvalue = hash(tup)

    def __hash__(self):
        return self.hashvalue

fetch_item(o1)  # Looks for (o1,) in cache dictionary, but misses and stores (o1,)
fetch_item(o1)  # Finds (o1,) in cache dictionary
fetch_item(o2)  # Looks for (o2,) in cache dictionary, but misses and stores (o2,)
fetch_item(o3)  # Looks for 33 (a bare int key, no _HashedSeq), but misses and stores 33
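The keys in this trace can be inspected directly. The sketch below calls functools._make_key, a private and undocumented helper of the pure-Python implementation, so treat it as an implementation detail that may change between Python versions:

```python
import functools


class Query:
    def __init__(self, id):
        self.id = id

    def __hash__(self):
        return hash(self.id)


# Arguments are (args, kwds, typed), matching how the wrapper calls it.
key_for_int = functools._make_key((33,), {}, False)
key_for_query = functools._make_key((Query(33),), {}, False)

print(type(key_for_int).__name__)    # a lone int is returned as-is
print(type(key_for_query).__name__)  # other arguments are wrapped in _HashedSeq

# The two keys are different kinds of object and never compare equal.
print(key_for_query == key_for_int)
```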

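Unfortunately, there is no documented way to provide a custom make_key function, so one way to achieve this is to monkey-patch the _make_key function (within a context manager). Note that this only has an effect when the pure-Python implementation of lru_cache is in use. CPython normally uses a C-accelerated wrapper that does not call functools._make_key, in which case the patch is ignored.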

import functools
from contextlib import contextmanager

# Caveat: the patch below is only picked up by the pure-Python lru_cache.
# With CPython's C-accelerated wrapper it is ignored and the final
# assertions fail.

def make_key(*args, **kwargs):
    # Called as make_key(args, kwds, typed): args[0] is the wrapped call's
    # positional-argument tuple, so args[0][0] is its first argument.
    return hash(args[0][0])

def fetch_item(item):
    return 'data'

@contextmanager
def lru_cached_fetch_item():
    try:
        _make_key_og = functools._make_key
        functools._make_key = make_key
        yield functools.lru_cache()(fetch_item)
    finally:
        # Always restore the original helper.
        functools._make_key = _make_key_og

class Query:
    def __init__(self, id: int):
        self.id = id

    def __hash__(self):
        return hash(self.id)

o1 = Query(33)
o2 = Query(33)
o3 = 33
assert hash(o1) == hash(o2) == hash(o3)

with lru_cached_fetch_item() as func:
    func(o1)  # miss
    func(o1)  # hit
    func(o2)  # hit (same hash as o1)
    func(o3)  # hit (same hash as o1)
    func(o3)  # hit
    info = func.cache_info()
    print(info)  # pure-Python path: CacheInfo(hits=4, misses=1, maxsize=128, currsize=1)
    assert info.hits == 4
    assert info.misses == 1
    assert info.currsize == 1
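An alternative that avoids patching functools is a small hand-written memoizing decorator keyed on a caller-supplied function. The sketch below is illustrative only: cache_by_key is a hypothetical helper, not part of the standard library. It has no size limit or LRU eviction, and keying on hash() alone means any two arguments whose hashes collide share one cached result.

```python
from functools import wraps


def cache_by_key(key_func):
    """Hypothetical helper: memoize a one-argument function on key_func(arg)."""
    def decorator(func):
        cache = {}

        @wraps(func)
        def wrapper(item):
            key = key_func(item)
            if key in cache:
                wrapper.hits += 1
            else:
                wrapper.misses += 1
                cache[key] = func(item)
            return cache[key]

        wrapper.hits = 0
        wrapper.misses = 0
        return wrapper
    return decorator


class Query:
    def __init__(self, id):
        self.id = id

    def __hash__(self):
        return hash(self.id)


@cache_by_key(hash)
def fetch_item(item):
    return 'data'


o1, o2, o3 = Query(33), Query(33), 33
for item in (o1, o1, o2, o3, o3):
    fetch_item(item)

print(fetch_item.hits, fetch_item.misses)
```

Because the key is computed explicitly, this behaves the same regardless of which lru_cache implementation the interpreter uses.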
