How to choose a suitable decision threshold at model inference time

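I am working on a speaker recognition project. I have trained a model and its accuracy is 90%, but I ran into a problem at inference time. The model outputs two probabilities because it was trained on two speakers, but when it encounters a speaker who is not in the training set, I want it to report an "unknown speaker". How do I choose a decision threshold based on the two probabilities the model outputs?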

Here is the code snippet in question:

import pandas as pd

# list_ (the sample's features) and model (the trained classifier) are defined earlier
sample_df = pd.DataFrame(list_).transpose()
sample_pred = model.predict(sample_df)[0]  # name of the predicted speaker
sample_prob = model.predict_proba(sample_df)[0]  # two probabilities, one per speaker
print(sample_prob)  # Output example: [0.46 0.54]
# How can this threshold be set dynamically from sample_prob? 0.6 is just an example.
if max(sample_prob) <= 0.6:  # neither speaker exceeds the threshold
    sample_pred = "unknown speaker"

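I think you have a binary classification task: is it an "unknown speaker" or not? And if I understand correctly, you want to optimize the threshold; in other words, you don't want to use the default of 0.5. Since this is a classification task, I would choose the threshold that maximizes the F1 score on the validation set (not the test set, because tuning on the test set would be data leakage):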

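import numpy as np
from sklearn.metrics import f1_score
# y_val: validation labels; predictions: predicted positive-class probabilities on the validation set
thresholds = np.arange(0, 1, 0.01)
scores = [f1_score(y_val, predictions > t) for t in thresholds]
idx = np.argmax(scores)  # index of the best-scoring threshold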

thresholds[idx] is the threshold that gives the best score.
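To make the threshold sweep concrete, here is a minimal, self-contained sketch. It assumes you already have validation labels and positive-class probabilities. The synthetic data, the helper name best_f1_threshold, and the variable names y_val and val_probs are illustrative assumptions, not part of the original question.

```python
import numpy as np
from sklearn.metrics import f1_score


def best_f1_threshold(y_true, probs, thresholds=None):
    """Return (threshold, f1) for the grid threshold with the highest F1 score."""
    if thresholds is None:
        thresholds = np.arange(0, 1, 0.01)
    scores = [
        f1_score(y_true, (probs > t).astype(int), zero_division=0)
        for t in thresholds
    ]
    idx = int(np.argmax(scores))
    return float(thresholds[idx]), float(scores[idx])


# Synthetic validation data, for illustration only (assumption):
# positives tend to receive higher scores than negatives.
rng = np.random.default_rng(0)
y_val = rng.integers(0, 2, size=500)
val_probs = np.clip(0.35 * y_val + rng.normal(0.3, 0.2, size=500), 0, 1)

best_t, best_f1 = best_f1_threshold(y_val, val_probs)
print(f"best threshold: {best_t:.2f}, F1: {best_f1:.3f}")
```

In a real project you would replace the synthetic arrays with your held-out validation labels and the output of model.predict_proba on the validation features, then apply the chosen threshold unchanged to the test set.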

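I see that you output two probabilities, but you really only need one. Because the two sum to 1, you can easily infer the second from the first.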
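As a rough sketch of that point, the decision can be written in terms of a single probability. The function name decide_speaker and the speaker names "alice" and "bob" are hypothetical. The 0.6 threshold is only the example value from the question and should in practice be tuned on a validation set.

```python
def decide_speaker(p_first, names, threshold=0.6):
    """Pick a speaker from one probability, or report an unknown speaker.

    p_first is the probability of names[0]; the probability of names[1]
    is implied as 1 - p_first.
    """
    p_second = 1.0 - p_first
    # Reject when neither speaker exceeds the threshold.
    if max(p_first, p_second) <= threshold:
        return "unknown speaker"
    return names[0] if p_first > p_second else names[1]


speakers = ("alice", "bob")  # hypothetical speaker names
print(decide_speaker(0.46, speakers))
print(decide_speaker(0.90, speakers))
```

With a threshold t above 0.5, this rejects any sample whose first probability falls between 1 - t and t, which is the same check as comparing both probabilities against t.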
