I am trying to work out whether MLflow is a suitable place to store metrics for model tracking. According to the docs, log_metric takes either a single key-value pair or a set of them. I would like to know how to log something like the report below to MLflow so that it can be visualized meaningfully.
              precision    recall  f1-score   support

      class1       0.89      0.98      0.93       174
      class2       0.96      0.90      0.93        30
      class3       0.96      0.90      0.93        30
      class4       1.00      1.00      1.00         7
      class5       0.93      1.00      0.96        13
      class6       1.00      0.73      0.85        15
      class7       0.95      0.97      0.96        39
      class8       0.80      0.67      0.73         6
      class9       0.97      0.86      0.91        37
     class10       0.95      0.81      0.88        26
     class11       0.50      1.00      0.67         5
     class12       0.93      0.89      0.91        28
     class13       0.73      0.84      0.78        19
     class14       1.00      1.00      1.00         6
     class15       0.45      0.83      0.59         6
     class16       0.97      0.98      0.97       245
     class17       0.93      0.86      0.89       206

    accuracy                           0.92       892
   macro avg       0.88      0.90      0.88       892
weighted avg       0.93      0.92      0.92       892
I searched for the same thing a few days ago. Since I still haven't found anything more workable, and since this post keeps coming up at the top of my search results, I thought I'd share a worked example of the approach @Martin Zivdar already mentioned in the comments, which I have now implemented.
Side notes
- For simplicity, I have left out preprocessing, rebalancing, etc.
- Multiple metrics (or params) can be logged at once from a flat dictionary (see the docs)
TL;DR
Logging all performance metrics can be done with a loop; here is an example for classification_report()
# Logging all metrics in classification_report
mlflow.log_metric("accuracy", cr.pop("accuracy"))
for class_or_avg, metrics_dict in cr.items():
    for metric, value in metrics_dict.items():
        mlflow.log_metric(class_or_avg + '_' + metric, value)
Create sample data / simulate training
import pandas as pd
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from imblearn.over_sampling import SMOTE
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, GridSearchCV, StratifiedKFold
from sklearn.metrics import classification_report
import mlflow
# Create example data
N = 5000
n_features = 20
X, y = make_classification(n_samples=N,
                           n_features=n_features,
                           n_clusters_per_class=1,
                           weights=[0.8, 0.15, 0.05],
                           flip_y=0,
                           random_state=1, n_classes=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y)
# Start logging
mlflow.set_experiment("stackoverflow")
with mlflow.start_run():
    # Simulate model training
    grid_params = {
        "criterion": ["gini", "log_loss"],
        "min_samples_split": np.arange(2, 6),
        "min_samples_leaf": np.linspace(0.01, 0.5, num=3),
        "ccp_alpha": np.linspace(0, 3, 5),
    }
    cv = StratifiedKFold(shuffle=True)
    grid_search = GridSearchCV(DecisionTreeClassifier(), grid_params, cv=cv, n_jobs=3,
                               return_train_score=False, scoring='f1_macro', verbose=1)
    grid_search.fit(X_train, y_train)
    best_model = grid_search.best_estimator_
    best_params = grid_search.best_params_
    # it is possible to log multiple params (and metrics) in a flat dictionary
    mlflow.log_params(best_params)
    y_pred = best_model.predict(X_test)
    cr = classification_report(y_test, y_pred, output_dict=True)

cr
Output:
{'0': {'precision': 0.9461312438785504,
'recall': 0.966,
'f1-score': 0.9559623948540327,
'support': 1000},
'1': {'precision': 0.8083832335329342,
'recall': 0.7180851063829787,
'f1-score': 0.7605633802816901,
'support': 188},
'2': {'precision': 0.7903225806451613,
'recall': 0.7903225806451613,
'f1-score': 0.7903225806451614,
'support': 62},
'accuracy': 0.92,
'macro avg': {'precision': 0.8482790193522153,
'recall': 0.8248025623427133,
'f1-score': 0.835616118593628,
'support': 1250},
'weighted avg': {'precision': 0.9176858334261937,
'recall': 0.92,
'f1-score': 0.9183586482775923,
'support': 1250}}
Example of logging multiple metrics with MLflow
So far so good. Now, to log all metrics of the classification report, simply iterate over the nested dictionary. I first manually .pop accuracy, since it is the only non-nested entry in the dict
# Logging all metrics in classification_report
mlflow.log_metric("accuracy", cr.pop("accuracy"))
for class_or_avg, metrics_dict in cr.items():
    for metric, value in metrics_dict.items():
        mlflow.log_metric(class_or_avg + '_' + metric, value)
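Since metrics can also be logged in bulk from a flat dictionary (see the side notes above), the loop can be collapsed into a single mlflow.log_metrics() call by flattening the report first. A minimal sketch, assuming a helper of our own invention named flatten_report (not part of MLflow's API); the space-to-underscore replacement is a stylistic choice to keep key names uniform:

```python
def flatten_report(cr):
    """Flatten a classification_report(output_dict=True) dict into
    one flat {name: value} dict suitable for mlflow.log_metrics()."""
    cr = dict(cr)  # shallow copy so the caller's dict is not mutated
    flat = {"accuracy": cr.pop("accuracy")}
    for class_or_avg, metrics_dict in cr.items():
        for metric, value in metrics_dict.items():
            # replace spaces ("macro avg" -> "macro_avg") for uniform key names
            flat[f"{class_or_avg}_{metric}".replace(" ", "_")] = value
    return flat

# Usage, inside an active run:
# mlflow.log_metrics(flatten_report(cr))
```

This logs the same metrics as the loop, but in one API call per run rather than one per metric.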