R - Computing errors against a gold-standard data frame



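I have a data frame with four columns, shown below. Each row holds the classification or regression result for a particular dataset (run with particular parameter settings). I also have a second data frame with the gold-standard result for each dataset (Kappa and Accuracy for classification, R-squared and RMSE for regression). I want to produce a data frame that keeps the existing columns and adds two new columns giving the error for each of the two metrics.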

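That is, for each row of the first (sample) data frame, I want the difference between Metric_1 in the gold-standard data frame and Metric_1 in the sample data frame, and likewise for Metric_2. The new columns could be named Error 1 and Error 2. Each row of the sample data frame should be matched to the gold-standard row with the same Dataset.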

Sample data frame:

Dataset, Metric_1, Metric_2, ML_Type
ccp, 11.8076142844202, 0.628949889120101, regression
pageblocks, 0.968940316686967, 0.84426843805383, classification
onp, 0.65282098713529, 0.305364681866831, classification
pageblocks, 0.961023142509135, 0.795966628677049, classification
concrete, 10.4831489351907, 0.62767229736877, regression
onp, 0.650802993357437, 0.301621021444335, classification
concrete, 10.8875688078687, 0.599691053769861, regression
ccp, 4.60154386445267, 0.927419750011992, regression

Gold-standard data frame:

Dataset, Metric_1, Metric_2, ML_Type
ccp, 4.52997493965786, 0.929612792495658, regression
pageblocks, 0.971376370280146, 0.853898273639253, classification
onp, 0.66476078365425, 0.329343309931143, classification
concrete, 9.98998588557546, 0.598660395228019, regression
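Both tables above are plain comma-separated text, so one way to load them without extra packages is `read.csv(text = ...)`. The sketch below is base R only; the object names `sample_df`, `gold_df` and `idx` are illustrative, not from the question. It looks up each sample row's Dataset in the gold table with `match()` and adds the two requested error columns, computed as sample minus gold.

```r
# Read the pasted tables; strip.white = TRUE drops the space after each comma,
# so " ccp" is read as "ccp"
sample_df <- read.csv(text = "
Dataset, Metric_1, Metric_2, ML_Type
ccp, 11.8076142844202, 0.628949889120101, regression
pageblocks, 0.968940316686967, 0.84426843805383, classification
onp, 0.65282098713529, 0.305364681866831, classification
pageblocks, 0.961023142509135, 0.795966628677049, classification
concrete, 10.4831489351907, 0.62767229736877, regression
onp, 0.650802993357437, 0.301621021444335, classification
concrete, 10.8875688078687, 0.599691053769861, regression
ccp, 4.60154386445267, 0.927419750011992, regression
", strip.white = TRUE, stringsAsFactors = FALSE)

gold_df <- read.csv(text = "
Dataset, Metric_1, Metric_2, ML_Type
ccp, 4.52997493965786, 0.929612792495658, regression
pageblocks, 0.971376370280146, 0.853898273639253, classification
onp, 0.66476078365425, 0.329343309931143, classification
concrete, 9.98998588557546, 0.598660395228019, regression
", strip.white = TRUE, stringsAsFactors = FALSE)

# For every sample row, the row number of the gold row with the same Dataset
idx <- match(sample_df$Dataset, gold_df$Dataset)

# Error = sample metric minus gold-standard metric
sample_df$Error_1 <- sample_df$Metric_1 - gold_df$Metric_1[idx]
sample_df$Error_2 <- sample_df$Metric_2 - gold_df$Metric_2[idx]

sample_df
```

If a sample Dataset has no row in the gold table, `match()` returns `NA` for it and both of its error values come out as `NA`, which makes missing gold entries easy to spot.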

If you just want the error for each model result, the following will work:

library(dplyr)
# Sample results: one row per model run
df <- tribble(
  ~Dataset, ~Metric_1, ~Metric_2, ~ML_Type,
  "ccp", 11.8076142844202, 0.628949889120101, "regression",
  "pageblocks", 0.968940316686967, 0.84426843805383, "classification",
  "onp", 0.65282098713529, 0.305364681866831, "classification",
  "pageblocks", 0.961023142509135, 0.795966628677049, "classification",
  "concrete", 10.4831489351907, 0.62767229736877, "regression",
  "onp", 0.650802993357437, 0.301621021444335, "classification",
  "concrete", 10.8875688078687, 0.599691053769861, "regression",
  "ccp", 4.60154386445267, 0.927419750011992, "regression"
)
# Gold-standard results: one row per dataset
gold <- tribble(
  ~Dataset, ~Metric_1, ~Metric_2, ~ML_Type,
  "ccp", 4.52997493965786, 0.929612792495658, "regression",
  "pageblocks", 0.971376370280146, 0.853898273639253, "classification",
  "onp", 0.66476078365425, 0.329343309931143, "classification",
  "concrete", 9.98998588557546, 0.598660395228019, "regression"
)
err <- gold %>%
  # Add a "_gold" suffix to every gold column except the join key
  rename_with(~paste0(., "_gold"), .cols = -Dataset) %>%
  # Attach the matching gold values to every sample row
  right_join(df, by = "Dataset") %>%
  # Error = sample metric minus gold-standard metric
  mutate(
    Metric_1_err = Metric_1 - Metric_1_gold,
    Metric_2_err = Metric_2 - Metric_2_gold
  )
# Drop the helper "_gold" columns before printing
select(err, -ends_with("gold"))
# A tibble: 8 x 6
Dataset    Metric_1 Metric_2 ML_Type        Metric_1_err Metric_2_err
<chr>         <dbl>    <dbl> <chr>                 <dbl>        <dbl>
1 ccp          11.8      0.629 regression          7.28        -0.301  
2 ccp           4.60     0.927 regression          0.0716      -0.00219
3 pageblocks    0.969    0.844 classification     -0.00244     -0.00963
4 pageblocks    0.961    0.796 classification     -0.0104      -0.0579 
5 onp           0.653    0.305 classification     -0.0119      -0.0240 
6 onp           0.651    0.302 classification     -0.0140      -0.0277 
7 concrete     10.5      0.628 regression          0.493        0.0290 
8 concrete     10.9      0.600 regression          0.898        0.00103
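A possible variant, sketched here under the assumption that a Dataset/ML_Type pair identifies exactly one gold row: join on both keys with `left_join()`, which keeps the sample rows in their original order, and use `suffix = c("", "_gold")` so only the gold columns get renamed. It uses the Error_1/Error_2 names suggested in the question, and `anti_join()` lists any sample rows without a gold-standard counterpart. The names `err2` and `unmatched` are illustrative.

```r
library(dplyr)

df <- tribble(
  ~Dataset, ~Metric_1, ~Metric_2, ~ML_Type,
  "ccp", 11.8076142844202, 0.628949889120101, "regression",
  "pageblocks", 0.968940316686967, 0.84426843805383, "classification",
  "onp", 0.65282098713529, 0.305364681866831, "classification",
  "pageblocks", 0.961023142509135, 0.795966628677049, "classification",
  "concrete", 10.4831489351907, 0.62767229736877, "regression",
  "onp", 0.650802993357437, 0.301621021444335, "classification",
  "concrete", 10.8875688078687, 0.599691053769861, "regression",
  "ccp", 4.60154386445267, 0.927419750011992, "regression"
)
gold <- tribble(
  ~Dataset, ~Metric_1, ~Metric_2, ~ML_Type,
  "ccp", 4.52997493965786, 0.929612792495658, "regression",
  "pageblocks", 0.971376370280146, 0.853898273639253, "classification",
  "onp", 0.66476078365425, 0.329343309931143, "classification",
  "concrete", 9.98998588557546, 0.598660395228019, "regression"
)

err2 <- df %>%
  # Match on both keys; sample columns keep their names, gold ones get "_gold"
  left_join(gold, by = c("Dataset", "ML_Type"), suffix = c("", "_gold")) %>%
  # Error = sample metric minus gold-standard metric
  mutate(
    Error_1 = Metric_1 - Metric_1_gold,
    Error_2 = Metric_2 - Metric_2_gold
  ) %>%
  select(-ends_with("_gold"))

# Sample rows that have no gold-standard row (their errors would be NA)
unmatched <- anti_join(df, gold, by = c("Dataset", "ML_Type"))

err2
unmatched
```

Joining on ML_Type as well means a sample row labelled with the wrong model type gets `NA` errors and shows up in `unmatched` instead of being silently compared against the wrong metrics.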
