Python object serialization: problems with pickle and hickle

For a few days now I have been busy with my machine learning project. I have a Python script that prepares the data used for model training, and a second script that consumes it. The first script builds a list of arrays that I want to dump to disk, and the second script unpickles it.

I have tried pickle several times, but every time the script tries to pickle I get a MemoryError:

Traceback (most recent call last):
File "Prepare_Input.py", line 354, in <module>
pickle.dump(Total_Velocity_Change, file)
MemoryError

Sometimes the script is also forcibly terminated with a Killed message.

I also tried hickle. The script then runs for a very long time; overnight, hickle dumped a huge file of almost 10 GB (checked with du -sh myfile.hkl). I am sure the array can be at most about 1.5 GB in size. I can also print the array to the console without problems. With hickle I had to kill the process to stop the script.

I have also tried all the answers here, but unfortunately none of them worked for me.

Does anyone know how I can safely dump the data to disk so it can be loaded later?

With dill I get the following error:

Traceback (most recent call last):
File "Prepare_Input.py", line 356, in <module>
dill.dump(Total_Velocity_Change, fp)
File "/home/akil/Desktop/tmd/venv/lib/python3.7/site-packages/dill/_dill.py", line 259, in dump
Pickler(file, protocol, **_kwds).dump(obj)
File "/home/akil/Desktop/tmd/venv/lib/python3.7/site-packages/dill/_dill.py", line 445, in dump
StockPickler.dump(self, obj)
File "/home/akil/anaconda3/lib/python3.7/pickle.py", line 437, in dump
self.save(obj)
File "/home/akil/anaconda3/lib/python3.7/pickle.py", line 504, in save
f(self, obj) # Call unbound method with explicit self
File "/home/akil/anaconda3/lib/python3.7/pickle.py", line 819, in save_list
self._batch_appends(obj)
File "/home/akil/anaconda3/lib/python3.7/pickle.py", line 843, in _batch_appends
save(x)
File "/home/akil/anaconda3/lib/python3.7/pickle.py", line 504, in save
f(self, obj) # Call unbound method with explicit self
File "/home/akil/anaconda3/lib/python3.7/pickle.py", line 819, in save_list
self._batch_appends(obj)
File "/home/akil/anaconda3/lib/python3.7/pickle.py", line 843, in _batch_appends
save(x)
File "/home/akil/anaconda3/lib/python3.7/pickle.py", line 504, in save
f(self, obj) # Call unbound method with explicit self
File "/home/akil/anaconda3/lib/python3.7/pickle.py", line 819, in save_list
self._batch_appends(obj)
File "/home/akil/anaconda3/lib/python3.7/pickle.py", line 843, in _batch_appends
save(x)
File "/home/akil/anaconda3/lib/python3.7/pickle.py", line 549, in save
self.save_reduce(obj=obj, *rv)
File "/home/akil/anaconda3/lib/python3.7/pickle.py", line 638, in save_reduce
save(args)
File "/home/akil/anaconda3/lib/python3.7/pickle.py", line 504, in save
f(self, obj) # Call unbound method with explicit self
File "/home/akil/anaconda3/lib/python3.7/pickle.py", line 774, in save_tuple
save(element)
File "/home/akil/anaconda3/lib/python3.7/pickle.py", line 504, in save
f(self, obj) # Call unbound method with explicit self
File "/home/akil/anaconda3/lib/python3.7/pickle.py", line 735, in save_bytes
self.memoize(obj)
File "/home/akil/anaconda3/lib/python3.7/pickle.py", line 461, in memoize
self.memo[id(obj)] = idx, obj
MemoryError

If you want to dump a huge list of arrays, you may want to look at dask or klepto. dask can break the list into lists of sub-arrays, while klepto can split the list into a dict of sub-arrays (with the keys indicating the order of the sub-arrays).

>>> import klepto as kl
>>> import numpy as np
>>> big = np.random.randn(10,100)  # could be a huge array
>>> ar = kl.archives.dir_archive('foo', dict(enumerate(big)), cached=False)
>>> list(ar.keys())
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> 

Then each entry is serialized to disk, one entry per file (in output.pkl):

$ ls foo/K_0/
input.pkl   output.pkl
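If you would rather stay with the standard library, another option (a minimal sketch, not part of the original answer; dump_incremental and load_incremental are hypothetical helper names) is to stream the list to disk one item at a time, so pickle never has to build a single memo table covering the whole structure:

```python
import os
import pickle
import tempfile

def dump_incremental(items, path):
    """Pickle each item as its own record, so each pickle.dump call
    uses a fresh, small memo table instead of one for the whole list."""
    with open(path, "wb") as f:
        for item in items:
            pickle.dump(item, f, protocol=pickle.HIGHEST_PROTOCOL)

def load_incremental(path):
    """Yield the items back in the order they were dumped."""
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:  # end of file: no more records
                break

# Stand-in for a list of large arrays (illustrative data only).
data = [[float(i)] * 5 for i in range(3)]
path = os.path.join(tempfile.gettempdir(), "chunks.pkl")
dump_incremental(data, path)
restored = list(load_incremental(path))
```

Because every pickle.dump call creates its own Pickler with its own memo, peak memory during serialization stays roughly proportional to one item rather than to the whole list, and loading can likewise process one item at a time.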
