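OpenAI's baselines use the following code to return `LazyFrames` instead of concatenated numpy arrays in order to save memory. The idea is to take advantage of the fact that a numpy array can be held in several lists at the same time, since a list stores references rather than the objects themselves. However, the `LazyFrames` implementation goes on to cache the concatenated numpy array in `self._out`. In that case, once every `LazyFrames` object has been forced at least once, each one will always hold a concatenated numpy array, which does not seem to save any memory at all. So what is the point of `LazyFrames`? Or am I misunderstanding something?
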
from collections import deque

import gym
import numpy as np
from gym import spaces


class FrameStack(gym.Wrapper):
    def __init__(self, env, k):
        """Stack k last frames.

        Returns lazy array, which is much more memory efficient.

        See Also
        --------
        baselines.common.atari_wrappers.LazyFrames
        """
        gym.Wrapper.__init__(self, env)
        self.k = k
        # Ring buffer holding references to the k most recent frames.
        self.frames = deque([], maxlen=k)
        shp = env.observation_space.shape
        self.observation_space = spaces.Box(low=0, high=255, shape=(shp[:-1] + (shp[-1] * k,)), dtype=env.observation_space.dtype)

    def reset(self):
        ob = self.env.reset()
        for _ in range(self.k):
            self.frames.append(ob)
        return self._get_ob()

    def step(self, action):
        ob, reward, done, info = self.env.step(action)
        self.frames.append(ob)
        return self._get_ob(), reward, done, info

    def _get_ob(self):
        assert len(self.frames) == self.k
        # list(...) copies the references only, not the frame data.
        return LazyFrames(list(self.frames))

class LazyFrames(object):
    def __init__(self, frames):
        """This object ensures that common frames between the observations are only stored once.
        It exists purely to optimize memory usage which can be huge for DQN's 1M frames replay
        buffers.

        This object should only be converted to numpy array before being passed to the model.

        You'd not believe how complex the previous solution was."""
        self._frames = frames
        self._out = None

    def _force(self):
        # Concatenate on first use, then drop the frame list so the data is not held twice.
        if self._out is None:
            self._out = np.concatenate(self._frames, axis=-1)
            self._frames = None
        return self._out

    def __array__(self, dtype=None):
        out = self._force()
        if dtype is not None:
            out = out.astype(dtype)
        return out

    def __len__(self):
        return len(self._force())

    def __getitem__(self, i):
        return self._force()[i]

    def count(self):
        frames = self._force()
        return frames.shape[frames.ndim - 1]

    def frame(self, i):
        return self._force()[..., i]

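I came here trying to understand how this saves memory as well! You mention that a list stores references to the underlying data, while a concatenated numpy array stores a copy of that data, and I think you are right about that.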
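
To make the reference-versus-copy point concrete, here is a minimal sketch. The frame shape (84x84x1), the stack size of 4, and all variable names are assumptions chosen for illustration; they do not come from the baselines code.

```python
import numpy as np

# Five hypothetical single-channel frames (shape chosen only for illustration).
frames = [np.zeros((84, 84, 1), dtype=np.uint8) for _ in range(5)]

# Two consecutive stacked observations with k=4 overlap in three frames.
obs_t = frames[0:4]
obs_t1 = frames[1:5]

# The lists hold references, so the overlapping frames are the very same objects.
shared = sum(a is b for a, b in zip(obs_t[1:], obs_t1[:3]))

# np.concatenate copies the data into a fresh buffer every time it is called.
stacked_t = np.concatenate(obs_t, axis=-1)
stacked_t1 = np.concatenate(obs_t1, axis=-1)
copies_share_memory = np.shares_memory(stacked_t, stacked_t1)

print(shared, copies_share_memory)
```

The two lists share three frame objects, while the two concatenated arrays share no memory at all, which is why storing the lists is cheaper than storing the concatenated results.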
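To answer your question: you are right! When `_force` is called, it fills `self._out` with a numpy array, which increases memory use. But until `_force` is called (and it is called inside every API method of `LazyFrames`), `self._out` is `None`. So the idea is not to call `_force` (and therefore not to call any `LazyFrames` method) until you actually need the underlying data, hence the warning in its docstring: "This object should only be converted to numpy array before being passed to the model".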
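
As a rough illustration of the difference, the sketch below fills a small buffer with lazy observations and measures the array data it keeps alive before and after forcing. The `LazyFrames` class here is a trimmed-down copy written for this demo (its `__array__` also accepts a `copy` argument, which is my addition for newer NumPy versions), and `stored_bytes`, the frame shape, and the buffer size are hypothetical choices, not part of baselines.

```python
from collections import deque

import numpy as np


class LazyFrames:
    """Trimmed-down copy of the class in the question, just enough for this demo."""

    def __init__(self, frames):
        self._frames = frames
        self._out = None

    def _force(self):
        if self._out is None:
            self._out = np.concatenate(self._frames, axis=-1)
            self._frames = None
        return self._out

    def __array__(self, dtype=None, copy=None):
        # `copy` is accepted for compatibility with newer NumPy and otherwise ignored.
        out = self._force()
        return out if dtype is None else out.astype(dtype)


def stored_bytes(buffer):
    """Hypothetical helper: bytes of distinct arrays kept alive by the buffer."""
    seen = {}
    for lf in buffer:
        arrays = [lf._out] if lf._out is not None else lf._frames
        for a in arrays:
            seen[id(a)] = a.nbytes
    return sum(seen.values())


k, n_steps = 4, 100
first = np.zeros((84, 84, 1), dtype=np.uint8)
stack = deque([first] * k, maxlen=k)

buffer = []
for t in range(n_steps):
    stack.append(np.full((84, 84, 1), t % 256, dtype=np.uint8))
    buffer.append(LazyFrames(list(stack)))

lazy_bytes = stored_bytes(buffer)        # nothing forced yet

batch = np.asarray(buffer[10])           # convert only the sampled observation
after_one = stored_bytes(buffer)

for lf in buffer:                        # what happens if everything is forced
    np.asarray(lf)
forced_bytes = stored_bytes(buffer)

print(lazy_bytes, after_one, forced_bytes)
```

With these settings the fully forced buffer holds close to `k` times as much data as the lazy one, while converting a single sampled observation adds only one stacked array. That is the sense in which the saving depends on forcing objects only right before they go to the model.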
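Note that when the array is stored in `self._out`, `self._frames` is cleared as well, so that the object does not hold duplicate information (which would defeat its whole purpose of storing only what is needed).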
Also, in the same file you can find `ScaledFloatFrame`, which carries the following docstring:
Scales the observations by 255 after converting to float. This will undo the memory optimization of LazyFrames, so don't use it with huge replay buffers.
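
A small sketch of why that warning applies. I am assuming the wrapper does something equivalent to converting the observation to `float32` and dividing by 255; the exact implementation is not quoted here, and the shapes and names are illustrative only.

```python
import numpy as np

# Four hypothetical uint8 frames, as a frame stack would hold them.
frames = [np.full((84, 84, 1), 255, dtype=np.uint8) for _ in range(4)]

# Scaling needs the actual pixel values, so the stack must be materialized first.
stacked = np.concatenate(frames, axis=-1)

# Assumed equivalent of the scaling step: float conversion, then division by 255.
scaled = stacked.astype(np.float32) / 255.0

print(stacked.nbytes, scaled.nbytes)
```

The scaled observation no longer shares anything with the original frames and takes four bytes per pixel instead of one, so a replay buffer filled with such observations loses the sharing and grows further on top of that.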