Average of the values in a dictionary



I have a dictionary named model_scores_for_datasets that looks like this:

{'Unprocessed': {'Logistic Regression': '0.967', 'Support Vector Machine': '0.967', 'Decision Tree': '0.933', 'Random Forest': '0.933', 'LinearDiscriminant': '1.000', 'K-Nearest Neighbour': '1.000', 'Naive Bayes': '0.967', 'XGBoost': '0.933'}, 'Standardisation': {'Logistic Regression': '0.933', 'Support Vector Machine': '0.967', 'Decision Tree': '0.933', 'Random Forest': '0.967', 'LinearDiscriminant': '0.967', 'K-Nearest Neighbour': '0.967', 'Naive Bayes': '0.967', 'XGBoost': '0.933'}, 'Normalisation': {'Logistic Regression': '0.967', 'Support Vector Machine': '0.967', 'Decision Tree': '0.933', 'Random Forest': '0.967', 'LinearDiscriminant': '0.967', 'K-Nearest Neighbour': '0.967', 'Naive Bayes': '0.967', 'XGBoost': '0.933'}, 'Rescale': {'Logistic Regression': '0.967', 'Support Vector Machine': '0.967', 'Decision Tree': '0.933', 'Random Forest': '0.933', 'LinearDiscriminant': '0.967', 'K-Nearest Neighbour': '0.967', 'Naive Bayes': '0.967', 'XGBoost': '0.933'}}

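I want to get the average of each inner dictionary. There are 4 of them in total: "Unprocessed", "Standardisation", "Normalisation" and "Rescale". Each one holds 8 metrics, like this: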

{'Logistic Regression': '0.967', 'Support Vector Machine': '0.967', 'Decision Tree': '0.933', 'Random Forest': '0.933', 'LinearDiscriminant': '1.000', 'K-Nearest Neighbour': '1.000', 'Naive Bayes': '0.967', 'XGBoost': '0.933'}

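So each of the 4 scaling options has 8 different ML scores, and I want a single average for each one. For example, if "Standardisation" has the highest average score, it will be used in the machine learning process.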

Here is my code, but it gives me an error: TypeError: can't convert type 'str' to numerator/denominator


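from statistics import mean

avgDict = model_scores_for_datasets
for st, vals in avgDict.items():
    print(st, vals)
    # print(st)
for st, vals in avgDict.items():
    # raises the TypeError: vals is a dict, so mean() iterates over its string keys
    print("Average for {} is {}".format(st, mean(vals)))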
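
Convert the string values to float before averaging, for example with NumPy:

import numpy as np

results = model_scores_for_datasets
for mode in results.keys():
    # the scores are stored as strings, so convert each one to float first
    avg = np.mean([float(value) for value in results[mode].values()])
    print(f"{mode}: {avg}")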

Output:

Unprocessed: 0.9624999999999999
Standardisation: 0.9542499999999999
Normalisation: 0.9584999999999999
Rescale: 0.9542499999999999

For the Python-crazy, as a one-liner:

print({mode: np.mean([float(value) for value in results[mode].values()]) for mode in results.keys()})

First you have to convert the values to the correct type:

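# work on one inner dictionary; the output below corresponds to 'Unprocessed'
avgDict = model_scores_for_datasets['Unprocessed']
# conversion: the scores are stored as strings, so turn each value into a float
avgDict = dict(zip(avgDict.keys(), map(float, avgDict.values())))
for st, vals in avgDict.items():
    # each model has a single score here, so its average is the score itself
    print("Average for {} is {}".format(st, vals))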

Output:

Average for Logistic Regression is 0.967
Average for Support Vector Machine is 0.967
Average for Decision Tree is 0.933
Average for Random Forest is 0.933
Average for LinearDiscriminant is 1.0
Average for K-Nearest Neighbour is 1.0
Average for Naive Bayes is 0.967
Average for XGBoost is 0.933
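
For the nested dictionary from the question, the same conversion can be applied to every inner dictionary before averaging. The following is only a minimal sketch, not the original answer's code: the helper name to_float_scores and the shortened sample data are assumptions made for illustration.

```python
from statistics import mean


def to_float_scores(nested):
    """Return a copy of a two-level dict with every inner value converted to float."""
    return {
        outer: {inner: float(score) for inner, score in inner_dict.items()}
        for outer, inner_dict in nested.items()
    }


# Shortened, made-up sample in the same shape as model_scores_for_datasets
sample = {
    'A': {'m1': '0.50', 'm2': '1.00'},
    'B': {'m1': '0.25', 'm2': '0.25'},
}

scores = to_float_scores(sample)
# mean() now receives floats instead of strings, so no TypeError is raised
averages = {name: mean(inner.values()) for name, inner in scores.items()}
print(averages)
# {'A': 0.75, 'B': 0.25}
```

Converting once up front keeps the averaging step free of float() calls, which may be convenient if the scores are reused later.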

A simple solution would be:

data = {'Unprocessed': {'Logistic Regression': '0.967', 'Support Vector Machine': '0.967', 'Decision Tree': '0.933', 'Random Forest': '0.933', 'LinearDiscriminant': '1.000', 'K-Nearest Neighbour': '1.000', 'Naive Bayes': '0.967', 'XGBoost': '0.933'}, 'Standardisation': {'Logistic Regression': '0.933', 'Support Vector Machine': '0.967', 'Decision Tree': '0.933', 'Random Forest': '0.967', 'LinearDiscriminant': '0.967', 'K-Nearest Neighbour': '0.967', 'Naive Bayes': '0.967', 'XGBoost': '0.933'}, 'Normalisation': {'Logistic Regression': '0.967', 'Support Vector Machine': '0.967', 'Decision Tree': '0.933', 'Random Forest': '0.967', 'LinearDiscriminant': '0.967', 'K-Nearest Neighbour': '0.967', 'Naive Bayes': '0.967', 'XGBoost': '0.933'}, 'Rescale': {'Logistic Regression': '0.967', 'Support Vector Machine': '0.967', 'Decision Tree': '0.933', 'Random Forest': '0.933', 'LinearDiscriminant': '0.967', 'K-Nearest Neighbour': '0.967', 'Naive Bayes': '0.967', 'XGBoost': '0.933'}}
dicts = list(data.keys())
keys = list(data['Unprocessed'].keys())
r = {}
for k in keys:
    # average the score of model k across all four dictionaries
    r[k] = sum([float(data[d][k]) for d in dicts])/len(dicts)

print(r)
#{'Logistic Regression': 0.9585, 'Support Vector Machine': 0.967, 'Decision Tree': 0.933, 'Random Forest': 0.95, 'LinearDiscriminant': 0.9752500000000001, 'K-Nearest Neighbour': 0.9752500000000001, 'Naive Bayes': 0.967, 'XGBoost': 0.933}

Similarly, if you want the average per dictionary:

r2 = {}
for d in dicts:
    # average all model scores within dictionary d
    r2[d] = sum([float(data[d][k]) for k in keys])/len(keys)

print(r2)
#{'Unprocessed': 0.9624999999999999, 'Standardisation': 0.9542499999999999, 'Normalisation': 0.9584999999999999, 'Rescale': 0.9542499999999998}
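
Since the goal in the question is to use the option with the highest average, a per-dictionary result like r2 can be passed to max. This is a minimal sketch under assumptions: the values are rounded copies of the r2 output above, and best is a name introduced here for illustration.

```python
# Per-dictionary averages, rounded from the r2 output above
r2 = {
    'Unprocessed': 0.9625,
    'Standardisation': 0.95425,
    'Normalisation': 0.9585,
    'Rescale': 0.95425,
}

# max() with key=r2.get compares the averages and returns the matching key
best = max(r2, key=r2.get)
print(best, r2[best])
# Unprocessed 0.9625
```

If two options tie for the highest average, max returns the first one in the dictionary's insertion order.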
