R ranger confusion.matrix is larger than expected when using expand.grid and purrr::pmap



Sorry for all the purrr-related questions today; I am still trying to figure out how to use it effectively.

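With some help from SO, I managed to get a ranger random forest model to run on input values taken from a data.frame. This is done with purrr::pmap. However, I don't understand how the return values of the called function are being generated. Consider this example: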

library(ranger)
data(iris)
Input_list <- list(iris1 = iris, iris2 = iris)  # let's assume these are different input tables
# the data.frame with the values for the function
hyper_grid <- expand.grid(
  Input_table = names(Input_list),
  mtry = c(1, 2),
  Classification = TRUE,
  Target = "Species")
> hyper_grid
  Input_table mtry Classification  Target
1       iris1    1           TRUE Species
2       iris2    1           TRUE Species
3       iris1    2           TRUE Species
4       iris2    2           TRUE Species
# the function to be called for each row of the `hyper_grid`df
fit_and_extract_metrics <- function(Target, Input_table, Classification, mtry, ...) {
  RF_train <- ranger(
    dependent.variable.name = Target,
    mtry = mtry,
    data = Input_list[[Input_table]],  # referring to the named object in the list
    classification = Classification)  # otherwise regression is performed
  RF_train$confusion.matrix
}
# the pmap call using a row of hyper_grid and the function in parallel
purrr::pmap(hyper_grid, fit_and_extract_metrics)

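It should return four 3×3 confusion matrices, because iris$Species has 3 levels, but instead it returns huge confusion matrices. Can someone explain what is going on?

The first lines of the output: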

> purrr::pmap(hyper_grid, fit_and_extract_metrics)
[[1]]
      predicted
true  4.4 4.7 4.8 4.9 5 5.1 5.2 5.3 5.4 5.5 5.6 5.7 5.8 5.9 6 6.1 6.2 6.3 6.4
  4.3   1   0   0   0 0   0   0   0   0   0   0   0   0   0 0   0   0   0   0
  4.4   1   1   1   0 0   0   0   0   0   0   0   0   0   0 0   0   0   0   0
  4.5   1   0   0   0 0   0   0   0   0   0   0   0   0   0 0   0   0   0   0
  4.6   0   1   1   1 1   0   0   0   0   0   0   0   0   0 0   0   0   0   0
  4.7   1   0   1   0 0   0   0   0   0   0   0   0   0   0 0   0   0   0   0
  4.8   0   0   1   3 1   0   0   0   0   0   0   0   0   0 0   0   0   0   0
  4.9   0   0   1   2 2   0   0   0   0   0   0   0   0   0 1   0   0   0   0
  5     0   0   0   1 9   0   0   0   0   0   0   0   0   0 0   0   0   0   0
  5.1   0   0   0   0 0   8   0   0   0   1   0   0   0   0 0   0   0   0   0

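The problem is that expand.grid converts character vectors to factors by default (stringsAsFactors = TRUE), so Target and Input_table reach your function as factors rather than character strings. That trips up ranger: the row and column labels in your output (4.3, 4.4, ...) are Sepal.Length values, which is consistent with the factor Target being used as its integer code 1 (the first column of iris) instead of as the name "Species". To fix it, you only need to set stringsAsFactors = FALSE in expand.grid: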

hyper_grid <- expand.grid(
  Input_table = names(Input_list),
  mtry = c(1, 2),
  Classification = TRUE,
  Target = "Species", stringsAsFactors = FALSE)
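
To see the difference for yourself, here is a minimal base-R sketch that compares the column classes of both grids and shows what happens when a factor is used as an index. The names grid_default, grid_fixed and target_factor are made up for this illustration, and the last step only demonstrates R's general indexing rule; it is an assumption that ranger hits the same behaviour internally.

```r
# Default behaviour: character inputs become factor columns
grid_default <- expand.grid(
  Input_table = c("iris1", "iris2"),
  Target = "Species"
)

# With stringsAsFactors = FALSE they stay character
grid_fixed <- expand.grid(
  Input_table = c("iris1", "iris2"),
  Target = "Species",
  stringsAsFactors = FALSE
)

sapply(grid_default, class)  # both columns are "factor"
sapply(grid_fixed, class)    # both columns are "character"

# Indexing by a factor uses its integer code, not its label.
# "Species" is the only level of Target, so its code is 1.
target_factor <- grid_default$Target[1]
as.integer(target_factor)
names(iris)[target_factor]   # first column name of iris, not "Species"
```

With the character-valued grid in place, the original purrr::pmap(hyper_grid, fit_and_extract_metrics) call can be rerun unchanged.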

With stringsAsFactors = FALSE in place, the pmap call returns:

[[1]]
            predicted
true         setosa versicolor virginica
  setosa         50          0         0
  versicolor      0         46         4
  virginica       0          4        46
[[2]]
            predicted
true         setosa versicolor virginica
  setosa         50          0         0
  versicolor      0         46         4
  virginica       0          5        45
[[3]]
            predicted
true         setosa versicolor virginica
  setosa         50          0         0
  versicolor      0         47         3
  virginica       0          3        47
[[4]]
            predicted
true         setosa versicolor virginica
  setosa         50          0         0
  versicolor      0         47         3
  virginica       0          3        47
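
If you would rather not depend on how the grid was built, another option is to coerce the arguments inside the function. This is only a sketch of that idea, not something from the original code: it assumes ranger and purrr are installed, deliberately builds the grid with the default factor columns, and stores the output in results, a name chosen here for illustration. The exact counts in each matrix will vary from run to run because no seed is set.

```r
library(ranger)

Input_list <- list(iris1 = iris, iris2 = iris)

# Grid built with the default stringsAsFactors = TRUE on purpose
hyper_grid <- expand.grid(
  Input_table = names(Input_list),
  mtry = c(1, 2),
  Classification = TRUE,
  Target = "Species"
)

fit_and_extract_metrics <- function(Target, Input_table, Classification, mtry, ...) {
  # Turn possible factors back into plain strings before using them as names
  Target <- as.character(Target)
  Input_table <- as.character(Input_table)
  RF_train <- ranger(
    dependent.variable.name = Target,
    mtry = mtry,
    data = Input_list[[Input_table]],
    classification = Classification
  )
  RF_train$confusion.matrix
}

results <- purrr::pmap(hyper_grid, fit_and_extract_metrics)
sapply(results, dim)  # one column of dimensions per grid row
```

The as.character calls are harmless when the grid already holds character columns, so the function behaves the same with either version of hyper_grid.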
