Is there a way to release memory in TensorFlow?



I have a method in a class that prepares the data and trains on it in the same method. Every time the method is called, my memory usage grows by roughly 200 MB, which makes it impossible for the script to train for long; at best it trains 8-9 times before memory runs out. I tried commenting out the load_weights part, but that is not the source of the problem. I also tried calling model.fit without the callback, but that did not seem to solve the problem either. Basically, I tried commenting out every line in this method, yet memory usage kept growing. In another script that trains on random numbers in a while loop, memory does not fill up, so I am fairly sure something in this method keeps adding data to memory without releasing it. I have tried gc.collect(), but it did not help at all.

Why is this happening, and how can I fix it?

def make_data(self):
    if not os.path.exists("/py_stuff/BIN_API_v3/python-binance-master/" + str(self.coin)):
        os.makedirs("/py_stuff/BIN_API_v3/python-binance-master/" + str(self.coin))

    checkpoint_filepath = "/py_stuff/BIN_API_v3/python-binance-master/" + str(self.coin) + "/check_point"
    weights_checkpoint = "/py_stuff/BIN_API_v3/python-binance-master/" + str(self.coin)

    checkpoint_dir = os.path.dirname(checkpoint_filepath)

    model_checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
        filepath=checkpoint_filepath,
        save_weights_only=True,
        mode='max',
        save_best_only=True,
        verbose=1)

    dataset_train = self.df.tail(400)
    training_set = dataset_train.iloc[:, 1:2].values
    print(dataset_train.tail(5))
    sc = MinMaxScaler(feature_range=(0, 1))
    training_set_scaled = sc.fit_transform(training_set)
    X_train = []
    y_train = []
    for i in range(10, 400):
        X_train.append(training_set_scaled[i-10:i, 0])
        y_train.append(training_set_scaled[i, 0])
    X_train, y_train = np.array(X_train), np.array(y_train)
    X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))
    ST = time.time()
    model = Sequential()
    model.add(LSTM(units=128, return_sequences=True, input_shape=(X_train.shape[1], 1)))
    model.add(Dropout(0.2))
    model.add(LSTM(units=128, return_sequences=True))
    model.add(Dropout(0.2))
    model.add(LSTM(units=128, return_sequences=True))
    model.add(Dropout(0.2))
    model.add(LSTM(units=128))
    model.add(Dropout(0.2))
    model.add(Dense(units=1))
    model.compile(optimizer='adam', loss='mean_squared_error', metrics=[tf.keras.metrics.RootMeanSquaredError()])
    ## loading weights
    try:
        model.load_weights(checkpoint_filepath)
        print("Weights loaded successfully $$$$$$$ ")
    except:
        print("No Weights Found !!! ")

    model.fit(X_train, y_train, epochs=20, batch_size=50, callbacks=[model_checkpoint_callback])

    ### saving model conf and weights
    try:
        # model.save(checkpoint_filepath)
        model.save_weights(filepath=checkpoint_filepath)
        print("Saving weights and model done ")
    except OSError as no_model:
        print("Error saving weights and model !!!!!!!!!!!! ")
    print(time.time() - ST)
    self.model = model
    # tf.keras.backend.clear_session()
    return

The problem here is that the model is re-created every time the function is called. TensorFlow does not release a model from memory until the session is restarted (tf < 2.0) or the script itself is re-run (any tf version).
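If rebuilding the model inside the function really is unavoidable, the usual mitigation is to drop the Python references and clear the Keras backend session after each training run, so the stale graph and layer objects can actually be garbage-collected. A minimal sketch of that pattern (build_model is a hypothetical helper standing in for the Sequential construction above; this is an illustration, not your original code):

import gc
import tensorflow as tf

def train_once(X_train, y_train, build_model):
    model = build_model()                    # fresh model built on each call
    model.fit(X_train, y_train, epochs=20, batch_size=50)
    model.save_weights("check_point")
    del model                                # drop the Python reference
    tf.keras.backend.clear_session()         # dispose of the old graph/layer state
    gc.collect()                             # reclaim the now-unreferenced objects

Note that gc.collect() alone (as you tried) cannot help while the session still holds the old graphs; clear_session() is what lets them go.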

You should create the model outside of the function (ideally in the __init__ method) and use it inside the function for training:

def __init__(self):
    ....
    model = Sequential()
    # The sliding window in make_data is fixed at 10 steps, so the
    # input shape is known here without needing X_train.
    model.add(LSTM(units=128, return_sequences=True, input_shape=(10, 1)))
    model.add(Dropout(0.2))
    model.add(LSTM(units=128, return_sequences=True))
    model.add(Dropout(0.2))
    model.add(LSTM(units=128, return_sequences=True))
    model.add(Dropout(0.2))
    model.add(LSTM(units=128))
    model.add(Dropout(0.2))
    model.add(Dense(units=1))
    model.compile(optimizer='adam', loss='mean_squared_error', metrics=[tf.keras.metrics.RootMeanSquaredError()])
    self.model = model

def make_data(self):
    ....
    ST = time.time()
    model = self.model
    ## loading weights
    try:
        model.load_weights(checkpoint_filepath)
        print("Weights loaded successfully $$$$$$$ ")
    except:
        print("No Weights Found !!! ")
    ....
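With this refactor, repeated calls to make_data reuse the one compiled model, so no new graph is built per call and memory stays flat. A hypothetical driver loop, just to show how the class would be exercised (Trainer and its coin argument are assumptions, not from the original post):

trainer = Trainer(coin="BTCUSDT")  # builds and compiles the model once, in __init__
for _ in range(100):
    trainer.make_data()            # trains the same self.model on each iteration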
