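I have run a Keras model several times in Google Colab. Because of how TensorFlow works, a new model is created every time the code runs, and after a few runs this exhausts the available memory. I found that Keras's clear_session() is supposed to help with this, but it does not seem to work. Below is an MWE for Google Colab.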
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras import backend as K
X = np.zeros([10, 10000])
y = np.zeros([10, 10000])
########
m = Sequential([Dense(10000, input_shape=(10000,)), Dense(10000), Dense(10000), Dense(10000)])
m.compile(loss='mse')
m.summary()
m.fit(X,y)
K.clear_session()
After running the part below ######## three times, I get the following error:
---------------------------------------------------------------------------
ResourceExhaustedError Traceback (most recent call last)
<ipython-input-3-7ae5ab890fc2> in <module>
3 m.summary()
4
----> 5 m.fit(X,y)
6 K.clear_session()
1 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
53 ctx.ensure_initialized()
54 tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
---> 55 inputs, attrs, num_outputs)
56 except core._NotOkStatusException as e:
57 if name is not None:
ResourceExhaustedError: Graph execution error:
Detected at node 'RMSprop/RMSprop/update_2/mul_2' defined at (most recent call last):
File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py", line 16, in <module>
app.launch_new_instance()
File "/usr/local/lib/python3.7/dist-packages/traitlets/config/application.py", line 846, in launch_instance
app.start()
File "/usr/local/lib/python3.7/dist-packages/ipykernel/kernelapp.py", line 612, in start
self.io_loop.start()
File "/usr/local/lib/python3.7/dist-packages/tornado/platform/asyncio.py", line 132, in start
self.asyncio_loop.run_forever()
File "/usr/lib/python3.7/asyncio/base_events.py", line 541, in run_forever
self._run_once()
File "/usr/lib/python3.7/asyncio/base_events.py", line 1786, in _run_once
handle._run()
File "/usr/lib/python3.7/asyncio/events.py", line 88, in _run
self._context.run(self._callback, *self._args)
File "/usr/local/lib/python3.7/dist-packages/tornado/ioloop.py", line 758, in _run_callback
ret = callback()
File "/usr/local/lib/python3.7/dist-packages/tornado/stack_context.py", line 300, in null_wrapper
return fn(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 1233, in inner
self.run()
File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 1147, in run
yielded = self.gen.send(value)
File "/usr/local/lib/python3.7/dist-packages/ipykernel/kernelbase.py", line 381, in dispatch_queue
yield self.process_one()
File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 346, in wrapper
runner = Runner(result, future, yielded)
File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 1080, in __init__
self.run()
File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 1147, in run
yielded = self.gen.send(value)
File "/usr/local/lib/python3.7/dist-packages/ipykernel/kernelbase.py", line 365, in process_one
yield gen.maybe_future(dispatch(*args))
File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 326, in wrapper
yielded = next(result)
File "/usr/local/lib/python3.7/dist-packages/ipykernel/kernelbase.py", line 268, in dispatch_shell
yield gen.maybe_future(handler(stream, idents, msg))
File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 326, in wrapper
yielded = next(result)
File "/usr/local/lib/python3.7/dist-packages/ipykernel/kernelbase.py", line 545, in execute_request
user_expressions, allow_stdin,
File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 326, in wrapper
yielded = next(result)
File "/usr/local/lib/python3.7/dist-packages/ipykernel/ipkernel.py", line 306, in do_execute
res = shell.run_cell(code, store_history=store_history, silent=silent)
File "/usr/local/lib/python3.7/dist-packages/ipykernel/zmqshell.py", line 536, in run_cell
return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py", line 2855, in run_cell
raw_cell, store_history, silent, shell_futures)
File "/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py", line 2881, in _run_cell
return runner(coro)
File "/usr/local/lib/python3.7/dist-packages/IPython/core/async_helpers.py", line 68, in _pseudo_sync_runner
coro.send(None)
File "/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py", line 3058, in run_cell_async
interactivity=interactivity, compiler=compiler, result=result)
File "/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py", line 3249, in run_ast_nodes
if (await self.run_code(code, result, async_=asy)):
File "/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py", line 3326, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-3-7ae5ab890fc2>", line 5, in <module>
m.fit(X,y)
File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 64, in error_handler
return fn(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1409, in fit
tmp_logs = self.train_function(iterator)
File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1051, in train_function
return step_function(self, iterator)
File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1040, in step_function
outputs = model.distribute_strategy.run(run_step, args=(data,))
File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1030, in run_step
outputs = model.train_step(data)
File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 893, in train_step
self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
File "/usr/local/lib/python3.7/dist-packages/keras/optimizers/optimizer_v2/optimizer_v2.py", line 539, in minimize
return self.apply_gradients(grads_and_vars, name=name)
File "/usr/local/lib/python3.7/dist-packages/keras/optimizers/optimizer_v2/optimizer_v2.py", line 682, in apply_gradients
name=name)
File "/usr/local/lib/python3.7/dist-packages/keras/optimizers/optimizer_v2/optimizer_v2.py", line 724, in _distributed_apply
var, apply_grad_to_update_var, args=(grad,), group=False)
File "/usr/local/lib/python3.7/dist-packages/keras/optimizers/optimizer_v2/optimizer_v2.py", line 706, in apply_grad_to_update_var
update_op = self._resource_apply_dense(grad, var, **apply_kwargs)
File "/usr/local/lib/python3.7/dist-packages/keras/optimizers/optimizer_v2/rmsprop.py", line 216, in _resource_apply_dense
var_t = var - coefficients["lr_t"] * grad / (
Node: 'RMSprop/RMSprop/update_2/mul_2'
failed to allocate memory
[[{{node RMSprop/RMSprop/update_2/mul_2}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
[Op:__inference_train_function_2465]
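I want to train the same model on slightly different data, which is why I run similar code several times. After the error occurs I can simply restart the notebook, but loading the data takes a while. Is there a way to actually clear the old model? Thanks for your help.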
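One pattern that is sometimes tried for this kind of repeated training is sketched below. It is only a sketch under assumptions, not a confirmed fix for the error above: the helper name train_once and the small layer width are hypothetical, chosen so the example runs quickly, and whether the memory is actually returned depends on the TensorFlow version and on what else in the notebook still references the old model. The idea is to keep the model local to a function, drop the Python references explicitly, and only then call clear_session() and the garbage collector.

```python
import gc

import numpy as np
from tensorflow import keras
from tensorflow.keras import backend as K


def train_once(X, y, width):
    """Build, train, and discard one model; return only the final loss.

    Hypothetical helper: the model exists only inside this function, so no
    notebook-level variable keeps it alive after the call returns.
    """
    m = keras.Sequential([
        keras.Input(shape=(X.shape[1],)),
        keras.layers.Dense(width),
        keras.layers.Dense(y.shape[1]),
    ])
    m.compile(loss="mse")
    history = m.fit(X, y, verbose=0)
    final_loss = float(history.history["loss"][-1])

    # Drop the Python references first, then reset Keras global state,
    # then ask the garbage collector to reclaim what is left.
    del m, history
    K.clear_session()
    gc.collect()
    return final_loss


# Small stand-in data; the question uses arrays of shape (10, 10000).
X = np.zeros([10, 100])
y = np.zeros([10, 100])
losses = [train_once(X, y, width=100) for _ in range(3)]
print(losses)
```

Returning a plain float rather than the History object matters here, because the History object holds a reference back to the model.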
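Please restart the runtime and try again. When I tried to reproduce the code above, it ran fine.
You can check the output of the same code (here with epochs=2) below: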
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras import backend as K
X = np.zeros([10, 10000])
y = np.zeros([10, 10000])
########
m = Sequential([Dense(10000, input_shape=(10000,)), Dense(10000), Dense(10000), Dense(10000)])
m.compile(loss='mse')
m.summary()
m.fit(X,y, epochs=2)
K.clear_session()
Output:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 10000) 100010000
dense_1 (Dense) (None, 10000) 100010000
dense_2 (Dense) (None, 10000) 100010000
dense_3 (Dense) (None, 10000) 100010000
=================================================================
Total params: 400,040,000
Trainable params: 400,040,000
Non-trainable params: 0
_________________________________________________________________
Epoch 1/2
1/1 [==============================] - 10s 10s/step - loss: 0.0000e+00
Epoch 2/2
1/1 [==============================] - 6s 6s/step - loss: 0.0000e+00
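As a rough back-of-the-envelope check on why even one or two leftover copies of this model can exhaust memory, the parameter count from the summary above can be turned into bytes. This is an estimate under stated assumptions, not a measurement: it assumes float32 weights (4 bytes each), and it assumes that during a training step RMSprop holds roughly one accumulator per weight and that one gradient per weight is alive at the same time. Activations and framework overhead are ignored.

```python
def dense_params(n_in, n_out):
    """Parameters of one Dense layer: weight matrix plus bias vector."""
    return n_in * n_out + n_out


# Four Dense(10000) layers on a 10000-dimensional input, as in the model above.
layer_dims = [(10000, 10000)] * 4
total_params = sum(dense_params(n_in, n_out) for n_in, n_out in layer_dims)

BYTES_PER_FLOAT32 = 4  # assumption: default float32 weights
weights_bytes = total_params * BYTES_PER_FLOAT32

# Assumption: weights + one optimizer accumulator + one gradient per weight.
COPIES_DURING_TRAINING = 3
training_bytes = weights_bytes * COPIES_DURING_TRAINING

GIB = 2 ** 30
print(total_params)  # 400040000, matching "Total params" in the summary
print(f"weights only: {weights_bytes / GIB:.2f} GiB")
print(f"rough training footprint: {training_bytes / GIB:.2f} GiB")
```

Under these assumptions a single training run already needs several GiB, so a device with limited memory has little room for a previous model that was not fully released.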