Keras error: "Optimization loop failed: Cancelled: Operation was cancelled" when calling predict_on_batch



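I have some older, working code that uses Keras. I recently dusted it off and tried to run it with the current version of keras/tensorflow. When calling predict_on_batch, I get this warning/error:

W tensorflow/core/data/root_dataset.cc:167] Optimization loop failed: Cancelled: Operation was cancelled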

I tried googling the problem and, to my surprise, there does not seem to be a good explanation online of what causes it or how to fix it. Here is what I found:

https://github.com/tensorflow/tensorflow/issues/48689

https://discuss.tensorflow.org/t/optimization-loop-failed-cancelled-operation-was-cancelled/1524

One answer listed there is to make sure the batch size is not larger than the whole dataset. That is not the case here.
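A minimal sketch of that kind of guard, mirroring the early return at the top of Update in the code below. The helper name `sample_batch` and the toy `history` list are hypothetical and exist only for illustration.

```python
import random


def sample_batch(history, batch_size):
    """Return batch_size random transitions, or None if history is too short."""
    if batch_size > len(history):
        # Not enough stored transitions to fill one batch yet.
        return None
    return random.sample(history, batch_size)


# Toy replay history of (state, action, reward, next_state, done) tuples.
history = [(i, 0, 0.0, i + 1, False) for i in range(5)]

print(sample_batch(history, 10))      # batch larger than history -> None
print(len(sample_batch(history, 3)))  # batch fits -> 3
```

With a guard like this, every batch passed to the model has exactly `batch_size` rows, so the "batch larger than the dataset" explanation cannot apply.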

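The code is fairly long, so I can't easily show all of it. This is a deep reinforcement learning application, so the DL code is split across two main functions, which I'll show here: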

class DQN(QContract):
    def __init__(self, states, actions, lr, DDQN=False):
        self.history = []
        act_relu = activations.relu
        act_linear = activations.linear
        top_layer = 150
        middle_layer = 120
        # Create Network: Default Parameters from https://towardsdatascience.com/solving-lunar-lander-openaigym-reinforcement-learning-785675066197
        model = Sequential()
        layer = layers.Dense(top_layer, input_dim=states, activation=act_relu)
        model.add(layer)
        layer = layers.Dense(middle_layer, activation=act_relu)
        model.add(layer)
        layer = layers.Dense(actions, activation=act_linear)
        model.add(layer)
        opt = optimizers.Adam(learning_rate=lr)
        model.compile(loss='mse', optimizer=opt)
        # Create DDQN-like networks
        self.modelA = model
        #self.modelB = copy.deepcopy(model)
        self.batch_size = 100
        self.current = "A"
        self.count = 0

    def Update(self, state, action, reward, new_state, gamma, alpha=None):
        # Perform replay
        row_count = self.batch_size
        if len(self.history) < row_count: return
        # Column indices into each history tuple (note: these shadow the method arguments)
        state = 0
        action = 1
        reward = 2
        next_state = 3
        done = 4
        # Get samples in mini-batches
        samples = random.sample(self.history, row_count)
        # Separate into separate arrays
        states_array = [sample[state] for sample in samples]
        actions_array = [sample[action] for sample in samples]
        rewards_array = [sample[reward] for sample in samples]
        next_states_array = [sample[next_state] for sample in samples]
        done_array = [sample[done] for sample in samples]
        # Turn into arrays
        states_array = np.array(states_array)
        actions_array = np.array(actions_array)
        rewards_array = np.array(rewards_array)
        next_states_array = np.array(next_states_array)
        done_array = (1.0 - np.array(done_array))
        # train on states_array
        X = states_array
        # Create y (i.e. labels for supervised learning)
        if self.current == "A":
            model1 = self.modelA
            model2 = self.modelA
        else:
            model1 = self.modelA
            model2 = self.modelA
        predicted_values = self.modelA.predict_on_batch(states_array)
        next_predicted_values = self.modelA.predict_on_batch(next_states_array)
        actual_values = rewards_array + gamma * np.amax(next_predicted_values, axis=1) * done_array
        predicted_values[list(range(row_count)), actions_array] = actual_values
        y = predicted_values
        # Update network
        self.current = "A"
        if self.current == "A":
            print('Do fit'+str(self.count))
            self.count += 1
            self.modelA.fit(X, y, epochs=1, verbose=0)
            self.current = "B"
        else:
            self.modelA.fit(X, y, epochs=1, verbose=0)
            self.current = "A"

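At one point I was trying to build a Double DQN (DDQN), which I am no longer doing, so ignore the leftover attempts at having two models.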
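For readers following the target computation in Update, here is a minimal NumPy-only sketch of the same step: the network's predictions are copied, and only the entry for the action actually taken is overwritten with reward + gamma * max Q(next_state), zeroed out for terminal transitions. The function name `build_targets` and the toy arrays are assumptions for illustration, not part of the original code, and no Keras model is involved.

```python
import numpy as np


def build_targets(q_values, next_q_values, actions, rewards, dones, gamma):
    """Return a copy of q_values where each taken action's entry holds its TD target."""
    # 1.0 for non-terminal transitions, 0.0 for terminal ones.
    not_done = 1.0 - np.asarray(dones, dtype=np.float64)
    td_target = (np.asarray(rewards, dtype=np.float64)
                 + gamma * np.max(next_q_values, axis=1) * not_done)
    # Copy so the caller's predictions are left untouched.
    targets = np.array(q_values, dtype=np.float64, copy=True)
    targets[np.arange(len(actions)), actions] = td_target
    return targets


# Toy batch: 2 transitions, 2 actions.
q = np.array([[1.0, 2.0], [3.0, 4.0]])        # stand-in for Q(s, .)
next_q = np.array([[0.5, 1.5], [2.5, 0.5]])   # stand-in for Q(s', .)
y = build_targets(q, next_q,
                  actions=np.array([0, 1]),
                  rewards=np.array([1.0, -1.0]),
                  dones=np.array([0, 1]),
                  gamma=0.5)
print(y)
```

In the toy batch, the first row's target is 1.0 + 0.5 * 1.5 = 1.75 and the second row is terminal, so its target is just the reward, -1.0. The untaken actions keep their predicted values, which means they contribute zero error under the 'mse' loss.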

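This seems like a fairly simple problem, but I can't seem to solve it. I even tried stepping through the code, and found that the warning does not occur when stepping through in the debugger.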

This thread (mentioned by the OP) now has several replies suggesting that adding the following lines removes the error message:

import tensorflow as tf
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)

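I think the problem here is that the model is not trainable: if the model's weights cannot be updated, the optimization loop fails. I ran into the same problem, and all I had to do was set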

model.trainable = True
