Size mismatch when passing a batch of states to the network

I'm a beginner in ML, so this question or the overall design may sound silly; sorry about that. I'm open to any suggestions.

I have a simple network with three linear layers, one of which is the output layer.

self.fc1 = nn.Linear(in_features=2, out_features=12)
self.fc2 = nn.Linear(in_features=12, out_features=16)
self.out = nn.Linear(in_features=16, out_features=4)

My state consists of two values, the coordinates x and y. That is why the input layer has two features.

In main.py, I sample experiences from the ReplayMemory class, extract them, and pass them to the get_current function:

experiences = memory.sample(batch_size)
states, actions, rewards, next_states = qvalues.extract_tensors(experiences)
current_q_values = qvalues.QValues.get_current(policy_net, states, actions)

Since a single state consists of two values, the states tensor has length batchsize x 2, while the actions tensor has length batchsize. (Maybe that is the problem?)

When I pass "states" to my network in the get_current function to get the predicted Q-values for the states, I get this error:

size mismatch, m1: [1x16], m2: [2x12]

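It looks like the network is treating the states tensor as if it were a single state. I don't want that. In the tutorial I followed, they pass a states tensor that is a stack of multiple states, and it works without any problem. What am I doing wrong? :)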
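The shapes in the error message are consistent with the states having been joined end to end rather than stacked. The sketch below assumes each stored state is a 1-D tensor `[x, y]` and a batch size of 8 (neither is stated in the question; 8 is simply the value that would produce the 16 in `m1: [1x16]`). It only compares what `torch.cat` and `torch.stack` do to such a list.

```python
import torch

batch_size = 8  # assumption: 8 states x 2 values would give the 16 in the error

# Hypothetical stored states: each one is a 1-D tensor [x, y]
states = [torch.tensor([float(i), float(i)]) for i in range(batch_size)]

flat = torch.cat(states)       # joins along the existing dim 0
stacked = torch.stack(states)  # adds a new leading batch dimension

print(flat.shape)     # torch.Size([16])
print(stacked.shape)  # torch.Size([8, 2])
```

A layer declared as `nn.Linear(in_features=2, ...)` expects the last dimension of its input to be 2, which holds for `stacked` but not for `flat`.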

This is how I store experiences:

memory.push(dqn.Experience(state, action, next_state, reward))

This is my extract_tensors function:

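def extract_tensors(experiences):
    # Convert batch of Experiences to Experience of batches
    batch = dqn.Experience(*zip(*experiences))
    # The positional indices d[0]..d[3] must match the field order of
    # dqn.Experience (the push call passes next_state before reward).
    # torch.cat joins tensors along an existing dimension; it does not
    # add a batch dimension.
    state_batch = torch.cat(tuple(d[0] for d in experiences))
    action_batch = torch.cat(tuple(d[1] for d in experiences))
    reward_batch = torch.cat(tuple(d[2] for d in experiences))
    nextState_batch = torch.cat(tuple(d[3] for d in experiences))
    print(action_batch)
    return (state_batch, action_batch, reward_batch, nextState_batch)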
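One possible variant of this function, as a rough sketch rather than the tutorial's code: it uses `torch.stack` for the states so the batch keeps a `[batch_size, 2]` shape. The `Experience` namedtuple here is a hypothetical stand-in for `dqn.Experience`, with its field order taken from the `memory.push` call; storing actions and rewards as tensors of shape `[1]` is also an assumption.

```python
from collections import namedtuple

import torch

# Hypothetical stand-in for dqn.Experience; field order follows the push call
Experience = namedtuple("Experience", ("state", "action", "next_state", "reward"))


def extract_tensors(experiences):
    # Convert a batch of Experiences into an Experience of batches
    batch = Experience(*zip(*experiences))
    # Each state is assumed to be a 1-D tensor [x, y]; stack adds the batch dim
    state_batch = torch.stack(batch.state)            # [batch_size, 2]
    next_state_batch = torch.stack(batch.next_state)  # [batch_size, 2]
    # Actions and rewards are assumed to be tensors of shape [1]
    action_batch = torch.cat(batch.action)            # [batch_size]
    reward_batch = torch.cat(batch.reward)            # [batch_size]
    return state_batch, action_batch, reward_batch, next_state_batch


# Made-up experiences, only to check the resulting shapes
experiences = [
    Experience(
        torch.tensor([float(i), float(i)]),
        torch.tensor([i % 4]),
        torch.tensor([float(i + 1), float(i)]),
        torch.tensor([1.0]),
    )
    for i in range(8)
]
states, actions, rewards, next_states = extract_tensors(experiences)
print(states.shape, actions.shape, rewards.shape, next_states.shape)
```

Accessing fields by name (`batch.state`) instead of by position avoids depending on the order in which the fields were declared.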

The tutorial I followed is the one for this project:

https://github.com/nevenp/dqn_flappy_bird/blob/master/dqn.py

Look between lines 148 and 169, especially line 169, which passes the state batch to the network.

Solved. It turns out I did not know how to create a 2D tensor correctly. A 2D tensor must look like this:

states = torch.tensor([[1, 1], [2, 2]], dtype=torch.float)
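As a quick check, a 2D batch like this can be passed through a network with the layer sizes from the question. In the sketch below the class name `Net`, the ReLU activations, the action indices, and the use of `gather` to pick the Q-value of each taken action are all assumptions, since the question shows neither `forward()` nor the body of `get_current`.

```python
import torch
import torch.nn as nn


class Net(nn.Module):  # hypothetical class name; layer sizes are from the question
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(in_features=2, out_features=12)
        self.fc2 = nn.Linear(in_features=12, out_features=16)
        self.out = nn.Linear(in_features=16, out_features=4)

    def forward(self, t):
        # ReLU is an assumption; the original forward() is not shown
        t = torch.relu(self.fc1(t))
        t = torch.relu(self.fc2(t))
        return self.out(t)


policy_net = Net()
states = torch.tensor([[1, 1], [2, 2]], dtype=torch.float)  # [batch=2, features=2]
actions = torch.tensor([0, 3])  # hypothetical action indices, one per state

q_values = policy_net(states)  # one row of 4 Q-values per state
# Select the Q-value of the action taken in each state
current_q = q_values.gather(dim=1, index=actions.unsqueeze(-1))

print(q_values.shape)   # torch.Size([2, 4])
print(current_q.shape)  # torch.Size([2, 1])
```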
