"IndexError: index 20 is out of bounds for axis 1 with size 20"什么



I am doing Q-learning in a maze environment. It works fine in the initial episodes, but later I get the following error on the line max_future_q = np.max(q_table[new_discrete_state]): IndexError: index 20 is out of bounds for axis 1 with size 20.

I don't understand what the problem is here. Below is the code:

import gym
import numpy as np
import gym_maze

env = gym.make("maze-v0")

LEARNING_RATE = 0.1
DISCOUNT = 0.95
EPISODES = 25000
SHOW_EVERY = 3000

DISCRETE_OS_SIZE = [20, 20]
discrete_os_win_size = (env.observation_space.high - env.observation_space.low) / DISCRETE_OS_SIZE

# Exploration settings
epsilon = 1  # not a constant, going to be decayed
START_EPSILON_DECAYING = 1
END_EPSILON_DECAYING = EPISODES // 2
epsilon_decay_value = epsilon / (END_EPSILON_DECAYING - START_EPSILON_DECAYING)

q_table = np.random.uniform(low=-2, high=0, size=(DISCRETE_OS_SIZE + [env.action_space.n]))

def get_discrete_state(state):
    discrete_state = (state - env.observation_space.low) / discrete_os_win_size
    # we use this tuple to look up the Q values for the available actions in the q-table
    # (np.int is deprecated in recent NumPy; the builtin int works the same here)
    return tuple(discrete_state.astype(int))

for episode in range(EPISODES):
    discrete_state = get_discrete_state(env.reset())
    done = False

    if episode % SHOW_EVERY == 0:
        render = True
        print(episode)
    else:
        render = False

    while not done:
        if np.random.random() > epsilon:
            # Get action from Q table
            action = np.argmax(q_table[discrete_state])
        else:
            # Get random action
            action = np.random.randint(0, env.action_space.n)

        new_state, reward, done, _ = env.step(action)
        new_discrete_state = get_discrete_state(new_state)

        if episode % SHOW_EVERY == 0:
            env.render()

        # If simulation did not end yet after last step - update Q table
        if not done:
            # Maximum possible Q value in next step (for new state)
            max_future_q = np.max(q_table[new_discrete_state])
            # Current Q value (for current state and performed action)
            current_q = q_table[discrete_state + (action,)]
            # And here's our equation for a new Q value for current state and action
            new_q = (1 - LEARNING_RATE) * current_q + LEARNING_RATE * (reward + DISCOUNT * max_future_q)
            # Update Q table with new Q value
            q_table[discrete_state + (action,)] = new_q

        # Simulation ended (for any reason) - if goal position is achieved - update Q value with reward directly
        elif new_state[0] >= env.goal_position:
            #q_table[discrete_state + (action,)] = reward
            q_table[discrete_state + (action,)] = 0

        discrete_state = new_discrete_state

    # Decaying is done every episode if episode number is within decaying range
    if END_EPSILON_DECAYING >= episode >= START_EPSILON_DECAYING:
        epsilon -= epsilon_decay_value

env.close()

The error means you tried to use index 20 on axis 1 of an array whose axis 1 has size 20, e.g. indexing np.zeros((10, 20)) with [:, 20]. Check the shapes of your NumPy arrays against the indices you compute.
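A minimal reproduction of that error message, using the np.zeros((10, 20)) example above:

```python
import numpy as np

arr = np.zeros((10, 20))   # valid indices on axis 1 are 0..19
print(arr.shape)           # (10, 20)
print(arr[:, 19].shape)    # (10,) -- the last valid column

try:
    arr[:, 20]             # one past the end on axis 1
except IndexError as e:
    print(e)               # index 20 is out of bounds for axis 1 with size 20
```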

An index-out-of-bounds error means you are trying to access an item at an index that does not exist in the container. You cannot pick the sixth person out of a line of five.

Like most programming languages, Python is 0-indexed. This means the first item in a container has index 0, not 1. So the indices of the items in a container of size 5 are

0, 1, 2, 3, 4

As you can see, the index of the last item is one less than the size of the container. In Python, you can get the index of the last item in a container with

len(foo) - 1
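For example, with a list of five items:

```python
foo = ["a", "b", "c", "d", "e"]  # a container of size 5
print(len(foo) - 1)              # 4 -- index of the last item
print(foo[len(foo) - 1])         # e
# foo[5] would raise IndexError: list index out of range
```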
If you print

env.reset()

you get a tuple like (array([-0.4530919, 0. ], dtype=float32), {}). So we need to take index 0 of that tuple to get the state array, array([-0.4530919, 0. ]). This must happen before entering the for loop, on the line

discrete_state = get_discrete_state(env.reset())

which must be changed to:

discrete_state = get_discrete_state(env.reset()[0])

Then the subtraction on the state inside the get_discrete_state function will operate on the array rather than the tuple, and this error will no longer occur once the for loop is entered.
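A minimal sketch of the discretization with that fix in mind. The bounds below are hypothetical stand-ins for env.observation_space.low/high (the real values come from your environment), and the np.clip guard is an extra safeguard not in the original code: it keeps a state that lands exactly on the upper bound in bin 19 instead of bin 20, which is another way index 20 can appear.

```python
import numpy as np

DISCRETE_OS_SIZE = [20, 20]
# Hypothetical observation bounds standing in for env.observation_space.low/high:
obs_low = np.array([-1.2, -0.07])
obs_high = np.array([0.6, 0.07])
discrete_os_win_size = (obs_high - obs_low) / DISCRETE_OS_SIZE

def get_discrete_state(state):
    discrete_state = (state - obs_low) / discrete_os_win_size
    # Clip so a state exactly at the upper bound maps to bin 19, not 20:
    return tuple(np.clip(discrete_state.astype(int), 0, np.array(DISCRETE_OS_SIZE) - 1))

# With the newer gym API, env.reset() returns (observation, info), so you would call:
#   obs, info = env.reset()
#   discrete_state = get_discrete_state(obs)
print(get_discrete_state(obs_high))  # (19, 19) -- clipped, stays inside the q_table
print(get_discrete_state(obs_low))   # (0, 0)
```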
