How is the size of each k-fold defined



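I am currently training my regression network with cross-validation. I don't have any labels, only specific inputs that should map to specific outputs, and the network should learn that mapping. There seems to be a problem with the way I define the folds.

This is how I do the cross-validation: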

############################### Training setup ##################################
#Define 10 folds:
seed = 7
np.random.seed(seed)
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
print "Splits"
cvscores_loss = []
for train, test in kfold.split(train_set_data_vstacked_normalized,train_set_output_vstacked):
    print "Model definition!"
    model = Sequential()
    #act = PReLU(init='normal', weights=None)
    model.add(Dense(output_dim=400,input_dim=400, init="normal",activation=K.tanh))
    #act1 = PReLU(init='normal', weights=None)
    model.add(Dense(output_dim=400,input_dim=400, init="normal",activation=K.tanh))
    #act2 = PReLU(init='normal', weights=None)
    model.add(Dense(output_dim=400, input_dim=400, init="normal",activation=K.tanh))
    act4=ELU(10000)
    model.add(Dense(output_dim=13, input_dim=300, init="normal",activation=act4))
    print "Compiling"
    model.compile(loss='mean_squared_error', optimizer='RMSprop',  metrics=["accuracy"])
    print "Compile done! "
    print '\n'
    print "Train start"
    model.fit(train_set_data_vstacked_normalized[train],train_set_output_vstacked[train], nb_epoch=10, verbose=1)
    loss, accuracy = model.evaluate(x=train_set_data_vstacked_normalized[test],y=train_set_output_vstacked[test],verbose=1)
    print
    print('loss: ', loss)
    print('accuracy: ', accuracy)
    print()
    print model.summary()
    print "New Model:"
    cvscores_loss.append(loss)

print("%.2f%% (+/- %.2f%%)" % (np.mean(cvscores_loss), np.std(cvscores_loss)))

The problem with this code is that I never enter the for loop. After "Splits" is printed, I get a warning message...

Splits
/home/k/.local/lib/python2.7/site-packages/sklearn/model_selection/_split.py:579: Warning: The least populated class in y has only 1 members, which is too few. The minimum number of groups for any class cannot be less than n_splits=10.

This makes me wonder: does kfold know what the input and output dimensions of my neural network are?...

Should I define that somewhere? And if so, how?..

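The message tells you the problem. One of your target classes has only 1 member. When the data is split into 10 folds, every class needs at least 10 members so that 1 can be placed in each fold.

You need to check the counts of your target classes to find the offending class and remove it.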
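
One way to find the offending classes is to count how often each target value occurs and list those with fewer than n_splits members. The sketch below is a minimal illustration using only the standard library; the helper name `rare_classes` and the toy labels are assumptions for illustration, not part of the question's data.

```python
from collections import Counter


def rare_classes(labels, n_splits):
    """Return {label: count} for labels seen fewer than n_splits times."""
    counts = Counter(labels)
    return {label: n for label, n in counts.items() if n < n_splits}


# Toy labels (hypothetical): class "c" appears only once.
labels = ["a"] * 12 + ["b"] * 10 + ["c"]
too_small = rare_classes(labels, n_splits=10)
print(too_small)

# Indices of the samples whose class is large enough to stratify.
keep = [i for i, label in enumerate(labels) if label not in too_small]
```

The `keep` indices could then be used to filter both the inputs and the targets before calling `kfold.split`.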

I think you are overcomplicating this. If you need cross-validation on a Keras model, you can use the Keras scikit-learn API. For this you need to:

Import a few things:

from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import cross_val_score

Create a function that defines the model:

def model_creation():
    model = Sequential()
    model.add(...)
    ...
    model.compile(...)
    return model

And use the wrapper:

model = KerasClassifier(build_fn=model_creation, nb_epoch=100, batch_size=100, verbose=0)
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
results = cross_val_score(model, X, y, cv=kfold)
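
Because the question describes a regression network, stratification may not be what is wanted: StratifiedKFold treats the distinct values in y as classes, and continuous targets rarely repeat. A possible alternative is plain KFold, which splits by sample index only and ignores y. The sketch below uses random placeholder arrays (the shapes 25x400 and 25x13 are assumptions loosely modeled on the question's layer sizes) and records the size of each test fold. This also bears on the title question: scikit-learn documents that the first n_samples % n_splits folds get n_samples // n_splits + 1 samples and the remaining folds get n_samples // n_splits.

```python
import numpy as np
from sklearn.model_selection import KFold

rng = np.random.RandomState(7)
X = rng.rand(25, 400)  # placeholder inputs (assumed shape)
y = rng.rand(25, 13)   # placeholder continuous targets (assumed shape)

kfold = KFold(n_splits=10, shuffle=True, random_state=7)

fold_sizes = []
for train_idx, test_idx in kfold.split(X):
    # Building the model, fitting on X[train_idx] / y[train_idx] and
    # evaluating on X[test_idx] / y[test_idx] would go here.
    fold_sizes.append(len(test_idx))

print(fold_sizes)
```

The same `kfold` object can also be passed as `cv=kfold` to `cross_val_score`, in which case the `KerasRegressor` wrapper from the same module is probably a better fit for regression targets than `KerasClassifier`.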
