I want to save the best model based on AUC. I have this code:
def MyMetric(yTrue, yPred):
    auc = tf.metrics.auc(yTrue, yPred)
    return auc
best_model = [ModelCheckpoint(filepath='best_model.h5', monitor='MyMetric', save_best_only=True)]
train_history = model.fit([train_x], [train_y],
                          batch_size=batch_size, epochs=epochs,
                          validation_split=0.05,
                          callbacks=best_model, verbose=2)
But when my model runs, I get this warning:
RuntimeWarning: Can save best model only with MyMetric available, skipping.
'skipping.' % (self.monitor), RuntimeWarning)
Could someone tell me whether this is the right approach, and if not, what I should do?
You have to pass the metric you want to monitor to model.compile:
https://keras.io/metrics/#custom-metrics
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=[MyMetric])
Also, tf.metrics.auc returns a tuple containing a tensor and an update_op. Keras expects a custom metric function to return only a tensor:
def MyMetric(yTrue, yPred):
    import tensorflow as tf
    auc = tf.metrics.auc(yTrue, yPred)
    return auc[0]
After this step you will get an error about uninitialized values. Please have a look at this thread:
https://github.com/keras-team/keras/issues/3230
How do you compute Receiver Operating Characteristic (ROC) and AUC in Keras?
You can define a custom metric that calls TensorFlow to compute AUROC in the following way:
def as_keras_metric(method):
    import functools
    from keras import backend as K
    import tensorflow as tf

    @functools.wraps(method)
    def wrapper(*args, **kwargs):
        """Wrapper for turning TensorFlow metrics into Keras metrics."""
        value, update_op = method(*args, **kwargs)
        K.get_session().run(tf.local_variables_initializer())
        with tf.control_dependencies([update_op]):
            value = tf.identity(value)
        return value
    return wrapper
@as_keras_metric
def AUROC(y_true, y_pred, curve='ROC'):
    return tf.metrics.auc(y_true, y_pred, curve=curve)
Then you need to compile the model with this metric:
model.compile(loss=train_loss, optimizer='adam', metrics=['accuracy', AUROC])
Finally, checkpoint the model in the following way:
model_checkpoint = keras.callbacks.ModelCheckpoint(path_to_save_model,
                                                   monitor='val_AUROC',
                                                   verbose=0, save_best_only=True,
                                                   save_weights_only=False,
                                                   mode='auto', period=1)
Be careful, though: I believe the validation AUROC is calculated batch-wise and then averaged, so the checkpointing may be somewhat off. A good idea might be to verify, after training finishes, that the AUROC of the trained model's predictions (calculated with sklearn.metrics) matches what TensorFlow reported while training and checkpointing.
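To see why this caveat matters, here is a minimal sketch in plain Python (no TensorFlow or sklearn; the data values are made up) showing that averaging per-batch AUCs generally differs from the AUC computed over the pooled predictions:

```python
def auc_score(y_true, y_pred):
    """AUC via pairwise comparison (Mann-Whitney U statistic):
    the fraction of (positive, negative) pairs ranked correctly."""
    pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    neg = [p for t, p in zip(y_true, y_pred) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 0, 1, 0, 1, 0]
y_pred = [0.9, 0.8, 0.7, 0.2, 0.6, 0.1]

# AUC over the full set of predictions:
pooled = auc_score(y_true, y_pred)

# Mean of per-batch AUCs (two batches of three examples each):
batch1 = auc_score(y_true[:3], y_pred[:3])
batch2 = auc_score(y_true[3:], y_pred[3:])
averaged = (batch1 + batch2) / 2

print(pooled, averaged)  # the two values differ
```

Since the two quantities diverge, a checkpoint chosen on the batch-averaged value may not be the one with the best AUROC over the whole validation split.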
Assuming you use TensorBoard, you then have a historical record (in the form of tfevents files) of all metric calculations for all epochs; in that case, tf.keras.callbacks.Callback is what you want.
I use tf.keras.callbacks.ModelCheckpoint with save_freq='epoch' to save the weights every epoch, as an h5 file or a tf file.
To avoid filling up your hard drive with model files, write a new Callback - or extend the ModelCheckpoint class - implementing on_epoch_end:
def on_epoch_end(self, epoch, logs=None):
    super(DropWorseModels, self).on_epoch_end(epoch, logs)
    if epoch < self._keep_best:
        return
    model_files = frozenset(
        filter(lambda filename: path.splitext(filename)[1] == SAVE_FORMAT_WITH_SEP,
               listdir(self._model_dir)))
    if len(model_files) < self._keep_best:
        return
    tf_events_logs = tuple(islice(log_parser(tfevents=path.join(self._log_dir,
                                                                self._split),
                                             tag=self.monitor),
                                  0,
                                  self._keep_best))
    keep_models = frozenset(map(self._filename.format,
                                map(itemgetter(0), tf_events_logs)))
    if len(keep_models) < self._keep_best:
        return
    it_consumes(map(lambda filename: remove(path.join(self._model_dir, filename)),
                    model_files - keep_models))
Appendix (imports and utility-function implementations):
from itertools import islice
from operator import itemgetter
from os import path, listdir, remove
from collections import deque

import tensorflow as tf
from tensorflow.core.util import event_pb2


def log_parser(tfevents, tag):
    values = []
    for record in tf.data.TFRecordDataset(tfevents):
        event = event_pb2.Event.FromString(tf.get_static_value(record))
        if event.HasField('summary'):
            value = event.summary.value.pop(0)
            if value.tag == tag:
                values.append(value.simple_value)
    return tuple(sorted(enumerate(values), key=itemgetter(1), reverse=True))


it_consumes = lambda it, n=None: deque(it, maxlen=0) if n is None else next(islice(it, n, n), None)

SAVE_FORMAT = 'h5'
SAVE_FORMAT_WITH_SEP = '{}{}'.format(path.extsep, SAVE_FORMAT)
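The selection logic that log_parser and on_epoch_end combine can be illustrated in plain Python (the metric values below are hypothetical): rank epochs by the monitored value, keep the top keep_best, and compute which files would be removed.

```python
from itertools import islice
from operator import itemgetter

epoch_values = [0.71, 0.84, 0.79, 0.88]  # e.g. per-epoch val AUC (made up)
keep_best = 2
filename = 'model-{:04d}.h5'

# Same shape as log_parser's return value: (epoch_index, value), best first.
ranked = tuple(sorted(enumerate(epoch_values), key=itemgetter(1), reverse=True))

# Filenames to keep: the top `keep_best` epochs by monitored value.
keep_models = frozenset(map(filename.format,
                            map(itemgetter(0), islice(ranked, 0, keep_best))))
model_files = frozenset(filename.format(i) for i in range(len(epoch_values)))

print(sorted(keep_models))                # ['model-0001.h5', 'model-0003.h5']
print(sorted(model_files - keep_models))  # files that would be deleted
```

The set difference at the end is exactly what on_epoch_end passes to remove via it_consumes.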
For completeness, the rest of the class:
class DropWorseModels(tf.keras.callbacks.Callback):
    """
    Designed around making `save_best_only` work for arbitrary metrics
    and thresholds between metrics
    """

    def __init__(self, model_dir, monitor, log_dir, keep_best=2, split='validation'):
        """
        Args:
            model_dir: directory to save weights. Files will have format
                '{model_dir}/{epoch:04d}.h5'.
            split: dataset split to analyse, e.g., one of 'train', 'test', 'validation'
            monitor: quantity to monitor.
            log_dir: the path of the directory where to save the log files to be
                parsed by TensorBoard.
            keep_best: number of models to keep, sorted by monitor value
        """
        super(DropWorseModels, self).__init__()
        self._model_dir = model_dir
        self._split = split
        self._filename = 'model-{:04d}' + SAVE_FORMAT_WITH_SEP
        self._log_dir = log_dir
        self._keep_best = keep_best
        self.monitor = monitor
This has the added advantage of being able to save and delete multiple model files in a single Callback. It can also easily be extended to support different thresholds, e.g., keeping all model files whose AUC (or TP, FP, TN, FN) clears a threshold.
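As a sketch of that threshold-based extension (plain Python, hypothetical per-epoch values): instead of keeping a fixed number of models, keep every model whose monitored value clears a cutoff.

```python
epoch_values = [0.71, 0.84, 0.79, 0.88]  # made-up per-epoch AUCs
threshold = 0.80
filename = 'model-{:04d}.h5'

# Keep every epoch's model whose monitored value is at or above the threshold.
keep_models = frozenset(filename.format(epoch)
                        for epoch, value in enumerate(epoch_values)
                        if value >= threshold)
print(sorted(keep_models))  # ['model-0001.h5', 'model-0003.h5']
```

Swapping this selection into on_epoch_end (in place of the islice over the ranked tf_events_logs) would keep a variable number of checkpoints per run.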