For a few days now I have been busy with my machine learning project. I have a Python script that is supposed to transform the data used for model training, which a second script then consumes. In the first script there is a list of arrays that I want to dump to disk, and the second script unpickles it.
I have tried pickle several times, but every time the script tries to pickle I get a MemoryError:
Traceback (most recent call last):
  File "Prepare_Input.py", line 354, in <module>
    pickle.dump(Total_Velocity_Change, file)
MemoryError
Sometimes the script is forced to stop running with a Killed message.
However, I have also tried hickle; with it the script can run for a long time, and overnight hickle dumped a huge file of nearly 10 GB (checked with du -sh myfile.hkl). I am certain the arrays cannot add up to more than 1.5 GB. I can also dump the arrays to the console (with print). With hickle I had to kill the process to stop the script from running.
I have also tried all the answers here, but unfortunately none of them worked for me.
Does anyone know how I can safely dump the file to disk so that it can be loaded later?
With dill I get the following error:
Traceback (most recent call last):
  File "Prepare_Input.py", line 356, in <module>
    dill.dump(Total_Velocity_Change, fp)
  File "/home/akil/Desktop/tmd/venv/lib/python3.7/site-packages/dill/_dill.py", line 259, in dump
    Pickler(file, protocol, **_kwds).dump(obj)
  File "/home/akil/Desktop/tmd/venv/lib/python3.7/site-packages/dill/_dill.py", line 445, in dump
    StockPickler.dump(self, obj)
  File "/home/akil/anaconda3/lib/python3.7/pickle.py", line 437, in dump
    self.save(obj)
  File "/home/akil/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/akil/anaconda3/lib/python3.7/pickle.py", line 819, in save_list
    self._batch_appends(obj)
  File "/home/akil/anaconda3/lib/python3.7/pickle.py", line 843, in _batch_appends
    save(x)
  File "/home/akil/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/akil/anaconda3/lib/python3.7/pickle.py", line 819, in save_list
    self._batch_appends(obj)
  File "/home/akil/anaconda3/lib/python3.7/pickle.py", line 843, in _batch_appends
    save(x)
  File "/home/akil/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/akil/anaconda3/lib/python3.7/pickle.py", line 819, in save_list
    self._batch_appends(obj)
  File "/home/akil/anaconda3/lib/python3.7/pickle.py", line 843, in _batch_appends
    save(x)
  File "/home/akil/anaconda3/lib/python3.7/pickle.py", line 549, in save
    self.save_reduce(obj=obj, *rv)
  File "/home/akil/anaconda3/lib/python3.7/pickle.py", line 638, in save_reduce
    save(args)
  File "/home/akil/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/akil/anaconda3/lib/python3.7/pickle.py", line 774, in save_tuple
    save(element)
  File "/home/akil/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/akil/anaconda3/lib/python3.7/pickle.py", line 735, in save_bytes
    self.memoize(obj)
  File "/home/akil/anaconda3/lib/python3.7/pickle.py", line 461, in memoize
    self.memo[id(obj)] = idx, obj
MemoryError
If you want to dump a huge list of arrays, you may want to look at dask or klepto. dask can break the list up into lists of sub-arrays, while klepto can split the list into a dict of sub-arrays (with the keys indicating the order of the sub-arrays).
>>> import klepto as kl
>>> import numpy as np
>>> big = np.random.randn(10,100) # could be a huge array
>>> ar = kl.archives.dir_archive('foo', dict(enumerate(big)), cached=False)
>>> list(ar.keys())
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>>
Then each entry is serialized to disk, one file per entry (in output.pkl):
$ ls foo/K_0/
input.pkl output.pkl
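The same order-keyed split can also be sketched with plain numpy, one .npy file per sub-array; the chunks directory and file names below are illustrative, not part of klepto or dask:

```python
import os
import numpy as np

# Split one big array into sub-arrays and save each one as its own
# .npy file, so no single dump has to hold the whole serialized
# payload in memory at once.
big = np.random.randn(10, 100)  # could be a huge array
os.makedirs('chunks', exist_ok=True)
for i, sub in enumerate(big):
    np.save(os.path.join('chunks', f'{i}.npy'), sub)

# Load the pieces back in index order and reassemble.
loaded = np.stack([np.load(os.path.join('chunks', f'{i}.npy'))
                   for i in range(len(big))])
assert np.allclose(loaded, big)
```

With klepto, reading the archive back should work the same way: constructing a dir_archive with the same name re-attaches to the files already on disk, so entries are loaded lazily instead of all at once.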