This is my replay/training function implementation. I made it a DDQN in which model2 lags model by one batch during replay/training. Setting self.ddqn = False turns it into a regular DQN. Is this implemented correctly? I used this paper as a reference:
http://papers.nips.cc/paper/3964-double-q-learning.pdf
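As I understand the paper, the update I am trying to reproduce is the Double Q-learning rule, with A as the online network (model) and B as the lagged copy (model2); the mapping of A/B onto model/model2 is my reading of the code, not something stated in the paper:

$$Q^A(s,a) \leftarrow Q^A(s,a) + \alpha \left[ r + \gamma\, Q^B\big(s',\, \arg\max_{a'} Q^A(s',a')\big) - Q^A(s,a) \right]$$

i.e. the next action is selected with the online network and its value is taken from the lagged one.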
DDQN code:
def replay(self, batch_size):
    if self.ddqn:
        self.model2.load_state_dict(self.model.state_dict())  # copies model weights to model2
    minibatch = random.sample(self.memory, batch_size)
    for state, action, reward, next_state, done in minibatch:
        state = torch.Tensor(state)
        next_state = torch.Tensor(next_state)
        if self.cuda:
            state = torch.Tensor(state).cuda()
            next_state = torch.Tensor(next_state).cuda()
        Q_current = self.model(state)
        Q_target = Q_current.clone()  # TODO: test copy.deepcopy() and Tensor.copy_()
        Q_next = (1 - done) * self.model(next_state).cpu().detach().numpy()
        next_action = np.argmax(Q_next)  # next action chosen with the online network
        if self.ddqn:
            # DDQN: evaluate the chosen action with the lagged network instead
            Q_next = (1 - done) * self.model2(next_state).cpu().detach().numpy()
        Q_target[action] = Q_current[action] + self.alpha * (reward + self.gamma * Q_next[next_action] - Q_current[action])
        self.optim.zero_grad()
        loss = self.loss(Q_current, Q_target)
        loss.backward()
        self.optim.step()
    if self.epsilon > self.epsilon_min:
        self.epsilon = max(self.epsilon * self.epsilon_decay, self.epsilon_min)
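For reference, here is a minimal self-contained check of the weight-copy step; the two-layer network is just a stand-in, not my actual architecture:

import torch
import torch.nn as nn

# stand-in networks; the real self.model / self.model2 are assumed to share one architecture
model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 2))
model2 = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 2))

model2.load_state_dict(model.state_dict())  # model2 now mirrors model's current weights

x = torch.randn(1, 4)
print(torch.allclose(model(x), model2(x)))  # True right after the copy; model2 then lags while model trains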
I would suggest moving the next_action line below and using an if/else:
if self.ddqn:
    Q_next = (1 - done) * self.model2(next_state).cpu().detach().numpy()
else:
    Q_next = (1 - done) * self.model(next_state).cpu().detach().numpy()
next_action = np.argmax(Q_next)
The rest looks fine to me.