Deep learning - Keras: batch training for multiple large datasets



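This question concerns the common problem of training in Keras on multiple large files that together are too large to fit into GPU memory. I am using Keras 1.0.5 and would like a solution that does not require 1.0.6. fchollet described one way to do this here and here: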

import pickle

# Generator that yields (features X, labels y) for one file at a time
def BatchGenerator(files):
    for file in files:
        # Open the path held in `file`, not the literal string "file"
        with open(file, "rb") as f:
            current_data = pickle.load(f)
        X_train = current_data[:, :-1]  # every column except the last
        y_train = current_data[:, -1]   # the last column
        yield (X_train, y_train)

# Train the model on each dataset in turn
for epoch in range(n_epochs):
    for (X_train, y_train) in BatchGenerator(files):
        model.fit(X_train, y_train, batch_size=32, nb_epoch=1)

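However, I fear that the state of the model is not saved, and that the model is instead reinitialized not only between epochs but also between datasets. Each "Epoch 1/1" below represents training on a different dataset: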

~~~~~ Epoch 0 ~~~~~~

Epoch 1/1
295806/295806 [==============================] - 13s - loss: 15.7517
Epoch 1/1
407890/407890 [==============================] - 19s - loss: 15.8036
Epoch 1/1
383188/383188 [==============================] - 19s - loss: 15.8130

~~~~~ Epoch 1 ~~~~~~

Epoch 1/1
295806/295806 [==============================] - 14s - loss: 15.7517
Epoch 1/1
407890/407890 [==============================] - 20s - loss: 15.8036
Epoch 1/1
383188/383188 [==============================] - 15s - loss: 15.8130

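I am aware that one can use model.fit_generator, but since the method above has been repeatedly suggested as a way of batch training, I would like to know what I am doing wrong.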

Thanks for your help,

Max

It has been a while since I faced this problem, but I remember that I used Keras's functionality for providing data through Python generators, i.e. model = Sequential(); model.fit_generator(...).

An example code snippet (should be self-explanatory):

import pickle
import numpy as np
from keras.models import Sequential

def generate_batches(files, batch_size):
    """Yield (X, y) batches forever, cycling through the files."""
    counter = 0
    while True:
        fname = files[counter]
        print(fname)
        counter = (counter + 1) % len(files)
        with open(fname, "rb") as f:
            data_bundle = pickle.load(f)
        X_train = data_bundle[0].astype(np.float32)
        y_train = data_bundle[1].astype(np.float32)
        y_train = y_train.flatten()
        for cbatch in range(0, X_train.shape[0], batch_size):
            yield (X_train[cbatch:(cbatch + batch_size), :, :], y_train[cbatch:(cbatch + batch_size)])

model = Sequential()
# ... add the layers here before compiling ...
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

# train_bundle_loc, nb_train_bundles, batch_size, samples_per_epoch, num_epoch
# and class_weights are assumed to be defined elsewhere
train_files = [train_bundle_loc + "bundle_" + str(cb) for cb in range(nb_train_bundles)]
gen = generate_batches(files=train_files, batch_size=batch_size)
history = model.fit_generator(gen, samples_per_epoch=samples_per_epoch, nb_epoch=num_epoch, verbose=1, class_weight=class_weights)
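
The snippet leaves the value of samples_per_epoch open. In Keras 1.x, fit_generator ends an epoch once the generator has yielded that many samples, so setting it to the total number of rows across all files should make one epoch correspond to one pass over every file. Below is a minimal sketch of that bookkeeping, not a definitive implementation. It assumes the row count of each file is known in advance, and it replaces the pickled files with small in-memory lists so that it runs without Keras or NumPy. The helper name epoch_plan, the (X, y) list layout and the sizes are all hypothetical.

```python
def generate_batches(bundles, batch_size):
    """Endlessly yield (X_batch, y_batch) slices, cycling through the bundles.

    `bundles` is a hypothetical in-memory stand-in for the pickled files:
    a list of (X, y) pairs, each holding two equally long sequences.
    """
    counter = 0
    while True:
        X, y = bundles[counter]
        counter = (counter + 1) % len(bundles)
        for start in range(0, len(X), batch_size):
            yield X[start:start + batch_size], y[start:start + batch_size]


def epoch_plan(samples_per_bundle, batch_size):
    """Return (samples_per_epoch, batches_per_epoch) for one pass over all bundles."""
    samples_per_epoch = sum(samples_per_bundle)
    # Every bundle is sliced on its own, so its last batch may be short.
    batches_per_epoch = sum(
        (n + batch_size - 1) // batch_size for n in samples_per_bundle
    )
    return samples_per_epoch, batches_per_epoch


# Hypothetical bundle sizes, chosen only for illustration.
sizes = [10, 7, 5]
bundles = [(list(range(n)), [0] * n) for n in sizes]
batch_size = 4

samples_per_epoch, batches_per_epoch = epoch_plan(sizes, batch_size)

# Drawing batches_per_epoch batches should consume exactly one pass.
gen = generate_batches(bundles, batch_size)
seen = sum(len(next(gen)[0]) for _ in range(batches_per_epoch))
print(samples_per_epoch, batches_per_epoch, seen)
```

Because the generator loops forever, the batch drawn right after a full pass comes from the first bundle again, which is what fit_generator relies on when it starts the next epoch.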
