pickle updates the class definitions of contained objects, but dill does not



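dill updates the class definition of the dilled/undilled object itself, but not the class definitions of the objects contained in the dilled/undilled object.

pickle updates the class definitions in both cases.

Why doesn't dill follow the same behavior as pickle?

pickle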

import os
import pickle
import tempfile
from dataclasses import dataclass, field

def pickle_save(x):
    with tempfile.NamedTemporaryFile(delete=False) as f:
        pickle.dump(x, f)
    return f

def pickle_load(f):
    with open(f.name, "rb") as f:
        x = pickle.load(f)
    os.unlink(f.name)
    return x

@dataclass
class B:
    attribute: str = "old"
    def method_1(self):
        print(f"old class: {self.attribute=}")

@dataclass
class A:
    attribute_1: str = "old"
    instances_of_B: list[B] = field(default_factory=list)
    def method_1(self):
        print(f"old class: {self.attribute_1=}, {self.instances_of_B=}")
    def add_b_instance(self):
        self.instances_of_B.append(B())

old_a = A()
old_a.add_b_instance()
old_a.method_1()
old_a.instances_of_B[0].method_1()
print(f"{old_a = }")
temp_file = pickle_save(old_a)
# old_a has been saved to file
# Next we update our class definitions
# then load old_a from file,
# and see whether the added methods exist
@dataclass
class A:
    attribute_1: str = "new"
    attribute_2: str = "new attribute 2"
    instances_of_B: list[B] = field(default_factory=list)
    def method_1(self):
        print(f"new class: {self.attribute_1=}, {self.instances_of_B=}")
    def method_2(self):
        print("this method from A did not exist before")
        print(f"this attribute did not exist before: {self.attribute_2=}")

@dataclass
class B:
    attribute: str = "new"
    def method_1(self):
        print(f"new class: {self.attribute=}")
    def method_2(self):
        print("this method from B did not exist before")

new_a = pickle_load(temp_file)
print(f"{new_a=}")
new_a.method_1()
new_a.method_2()
new_a.instances_of_B[0].method_1()
new_a.instances_of_B[0].method_2()

After loading, the new method_2 is available on both the pickled A instance and the contained B instance:

old class: self.attribute_1='old', self.instances_of_B=[B(attribute='old')]
old class: self.attribute='old'
old_a = A(attribute_1='old', instances_of_B=[B(attribute='old')])
new_a=A(attribute_1='old', attribute_2='new attribute 2', instances_of_B=[B(attribute='old')])
new class: self.attribute_1='old', self.instances_of_B=[B(attribute='old')]
this method from A did not exist before
this attribute did not exist before: self.attribute_2='new attribute 2'
new class: self.attribute='old'
this method from B did not exist before

dill

import dill as pickle

After loading, the new method_2 is available only on the pickled A instance, not on the contained B instance:

old class: self.attribute_1='old', self.instances_of_B=[B(attribute='old')]
old class: self.attribute='old'
old_a = A(attribute_1='old', instances_of_B=[B(attribute='old')])
new_a=A(attribute_1='old', attribute_2='new attribute 2', instances_of_B=[B(attribute='old')])
new class: self.attribute_1='old', self.instances_of_B=[B(attribute='old')]
this method from A did not exist before
this attribute did not exist before: self.attribute_2='new attribute 2'
old class: self.attribute='old'
Traceback (most recent call last):
  File "c:question_dill_pickle.py", line 78, in <module>
    new_a.instances_of_B[0].method_2()
AttributeError: 'B' object has no attribute 'method_2'

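I'm the dill author. dill doesn't follow pickle's behavior here because pickle serializes classes by reference (i.e., it has no choice but to use whatever class definition exists in the current context), while dill stores the class definition along with the pickled instance, so you get to choose the behavior. The default is to use the stored class, so you get back what you saved (which, in the more common case, is what is wanted). However, if you want to ignore the stored class and use the updated definition, you can use the ignore=True keyword in load (or change it globally in dill.settings).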
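As a rough illustration of what "by reference" means for the standard-library pickle, the sketch below shows that the payload records the class name and the instance state but not the method bodies, so unpickling picks up whatever definition currently sits under that name. `Greeter` is a made-up class for illustration, and the sketch assumes it is run directly as a script so that the class can be found by name.

```python
import pickle


class Greeter:
    def __init__(self, name):
        self.name = name

    def greet(self):
        return f"old hello, {self.name}"


# The payload stores the class by name plus the instance state,
# not the source or bytecode of its methods.
payload = pickle.dumps(Greeter("Ada"))
print(b"Greeter" in payload)    # True
print(b"old hello" in payload)  # False


# Rebind the same name to an updated definition, as the question does.
class Greeter:
    def __init__(self, name):
        self.name = name

    def greet(self):
        return f"new hello, {self.name}"

    def shout(self):
        return f"HELLO, {self.name.upper()}"


# Unpickling looks the class up by name and finds the updated definition.
restored = pickle.loads(payload)
print(restored.greet())  # new hello, Ada
print(restored.shout())  # HELLO, ADA
```

Because only the name travels with the data, pickle has nothing else to fall back on; storing the definition itself is what lets dill offer a choice.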

From the docs:

If *ignore=False* then objects whose class is defined in the module
*__main__* are updated to reference the existing class in *__main__*,
otherwise they are left to refer to the reconstructed type, which may
be different.
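One way to picture "updated to reference the existing class" is reassigning an object's `__class__`. The sketch below is plain Python, not dill's code: `rebind_to_current` is a hypothetical helper, and the stale instance only stands in for an object that still refers to an older class definition. The quoted docs do not say whether contained objects are covered, so treating each entry of a list such as `new_a.instances_of_B` this way after loading is an assumption about a possible workaround rather than a documented dill feature.

```python
class B:
    def method_1(self):
        return "old"


# Stand-in for a contained object that still refers to the old class.
stale_b = B()


class B:
    def method_1(self):
        return "new"

    def method_2(self):
        return "added later"


def rebind_to_current(obj, namespace):
    """Hypothetical helper: point obj at the same-named class in namespace."""
    current = namespace.get(type(obj).__name__)
    if isinstance(current, type):
        obj.__class__ = current
    return obj


print(hasattr(stale_b, "method_2"))  # False
rebind_to_current(stale_b, {"B": B})
print(stale_b.method_1())  # new
print(stale_b.method_2())  # added later
```

Reassigning `__class__` only works when the old and new classes have compatible layouts (for example, neither adds `__slots__`), and attributes introduced by the new definition are not filled in on the existing instance unless they have class-level defaults.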
