I have been trying to solve Pong Atari with DQN, using OpenAI Gym for the Pong environment. I made a custom ObservationWrapper, but I can't figure out what is wrong with my overridden reset() method.
Error:
Traceback (most recent call last):
  File "C:\Users\berna\Documents\Pytorch Experiment\Torching the Dead Grass\DeepQLearning\training.py", line 123, in <module>
    agent = Agent(env, buffer)
  File "C:\Users\berna\Documents\Pytorch Experiment\Torching the Dead Grass\DeepQLearning\training.py", line 56, in __init__
    self._reset()
  File "C:\Users\berna\Documents\Pytorch Experiment\Torching the Dead Grass\DeepQLearning\training.py", line 59, in _reset
    self.state = env.reset()
  File "C:\Users\berna\AppData\Local\Programs\Python\Python310\lib\site-packages\gym\core.py", line 379, in reset
    obs, info = self.env.reset(**kwargs)
  File "C:\Users\berna\Documents\Pytorch Experiment\Torching the Dead Grass\DeepQLearning\wrappers.py", line 106, in reset
    return self.observation(self.env.reset())
  File "C:\Users\berna\AppData\Local\Programs\Python\Python310\lib\site-packages\gym\core.py", line 379, in reset
    obs, info = self.env.reset(**kwargs)
  File "C:\Users\berna\AppData\Local\Programs\Python\Python310\lib\site-packages\gym\core.py", line 379, in reset
    obs, info = self.env.reset(**kwargs)
ValueError: too many values to unpack (expected 2)

Process finished with exit code 1
Code:
Agent:
class Agent:
    def __init__(self, env, exp_buffer):
        self.env = env
        self.exp_buffer = exp_buffer
        self._reset()

    def _reset(self):
        self.state = env.reset()
        self.total_reward = 0.0
Wrapper:
class BufferWrapper(gym.ObservationWrapper):
    def __init__(self, env, n_steps, dtype=np.float32):
        super(BufferWrapper, self).__init__(env)
        self.dtype = dtype
        old_space = env.observation_space
        self.observation_space = gym.spaces.Box(old_space.low.repeat(n_steps, axis=0),
                                                old_space.high.repeat(n_steps, axis=0),
                                                dtype=dtype)

    def reset(self):
        self.buffer = np.zeros_like(self.observation_space.low, dtype=self.dtype)
        return self.observation(self.env.reset())

    def observation(self, observation):
        self.buffer[:-1] = self.buffer[1:]
        self.buffer[-1] = observation
        return self.buffer
Can someone help me understand why I am getting this error?
You have to make two changes to your code:

- In the reset method you must return not only the observation, as you did, but also the info value (return_info): https://gymnasium.farama.org/api/env/#gymnasium.Env.reset
- Also, the reset method should accept seed and options. By taking **kwargs as a parameter, you are covered.

Your code should be:
class BufferWrapper(gym.ObservationWrapper):
    def __init__(self, env, n_steps, dtype=np.float32):
        super(BufferWrapper, self).__init__(env)
        self.dtype = dtype
        old_space = env.observation_space
        self.observation_space = gym.spaces.Box(old_space.low.repeat(n_steps, axis=0),
                                                old_space.high.repeat(n_steps, axis=0),
                                                dtype=dtype)

    def reset(self, **kwargs):
        self.buffer = np.zeros_like(self.observation_space.low, dtype=self.dtype)
        obs, info = self.env.reset(**kwargs)
        return self.observation(obs), info

    def observation(self, observation):
        self.buffer[:-1] = self.buffer[1:]
        self.buffer[-1] = observation
        return self.buffer
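Since reset() now returns a two-value tuple, the call site in your Agent also has to unpack it. A minimal sketch of the corresponding change to _reset (also using self.env instead of the global env):

def _reset(self):
    # reset() now returns (obs, info); keep only the observation
    self.state, _ = self.env.reset()
    self.total_reward = 0.0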
Also, note that if you have a wrapper acting on the step method, you must update it as well so that it returns the terminated and truncated values.
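For example, a minimal sketch of such a step wrapper under the new five-value step API; the ClipRewardWrapper name and the clipping logic are just an illustration, not part of your code:

import gym

class ClipRewardWrapper(gym.Wrapper):
    def step(self, action):
        # New API: five values instead of (obs, reward, done, info)
        obs, reward, terminated, truncated, info = self.env.step(action)
        reward = max(-1.0, min(1.0, reward))  # illustrative reward clipping
        return obs, reward, terminated, truncated, info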