r - Is there a way to limit the number of features shown in the iml package's Shapley value plot?

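I created an AutoML model with the H2O package. H2O currently computes Shapley values only for tree-based models, so I used the iml package to compute them for the AutoML model. However, because I have a large number of features, the plot is too cluttered to read. I am looking for a way to select/display only the top X features. I could not find anything in the iml CRAN PDF or in the other documentation I found through Google.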

# load the packages used below
library(h2o)
library(iml)
library(dplyr)  # provides %>% and select()

# initiate h2o
h2o.init()
h2o.no_progress()

# create automl model (data cleaning and train/test split not shown)
set.seed(1911)
num_models <- 10
aml <- h2o.automl(y = label, x = features,
                  training_frame = train.hex,
                  nfolds = 5,
                  balance_classes = TRUE,
                  leaderboard_frame = test.hex,
                  sort_metric = 'AUCPR',
                  max_models = num_models,
                  verbosity = 'info',
                  exclude_algos = "DeepLearning", # exclude for reproducibility
                  seed = 27)

# 1. create a data frame with just the features
features_eval <- as.data.frame(test) %>% dplyr::select(-target)

# 2. create a vector with the actual responses
response <- as.numeric(as.vector(test$target))

# 3. create a custom predict function that returns the predicted values as a
#    vector (probability of purchasing in our example)
pred <- function(model, newdata) {
  results <- as.data.frame(h2o.predict(model, as.h2o(newdata)))
  # third column of the prediction frame holds the positive-class probability
  return(results[[3L]])
}

# example of prediction output
pred(aml, features_eval) %>% head()

# create the predictor needed by iml
predictor.aml <- Predictor$new(
  model = aml,
  data = features_eval,
  y = response,
  predict.fun = pred,
  class = "classification"
)

# pick the observation with the highest predicted probability
high <- predict(aml, test.hex) %>% .[,3] %>% as.vector() %>% which.max()
high_prob_ob <- features_eval[high, ]

# compute and plot Shapley values for that observation
shapley <- Shapley$new(predictor.aml, x.interest = high_prob_ob, sample.size = 200)
plot(shapley, sort = TRUE)

Any suggestions or help would be appreciated.

Thanks, Brian

I can offer a hacky solution that exploits the fact that iml uses ggplot2 for plotting.

N <- 10 # number of features to show
# Capture the ggplot2 object
p <- plot(shapley, sort = TRUE)
# Modify it so it shows only top N features
print(p + scale_x_discrete(limits=rev(p$data$feature.value[order(-p$data$phi)][1:N])))
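Note that the snippet above orders by signed phi, so features with large negative contributions drop out of the top N. If ranking by magnitude is preferred, a minimal sketch might look like the following. It assumes only that the plot data has the `feature.value` and `phi` columns used above; the helper name `top_n_by_abs_phi` is hypothetical, and the toy data frame contains made-up values standing in for real Shapley results.

```r
# Hypothetical helper: keep the n rows with the largest |phi|.
# `res` is assumed to have columns `feature.value` and `phi`,
# like the `p$data` used in the answer above.
top_n_by_abs_phi <- function(res, n = 10) {
  n <- min(n, nrow(res))
  ord <- order(abs(res$phi), decreasing = TRUE)
  res[ord[seq_len(n)], , drop = FALSE]
}

# Toy stand-in for real Shapley results (made-up values).
toy <- data.frame(
  feature.value = c("age=42", "income=55000", "visits=3", "region=west"),
  phi = c(0.02, -0.30, 0.15, -0.01),
  stringsAsFactors = FALSE
)

# Keep the two features with the largest absolute contribution.
top2 <- top_n_by_abs_phi(toy, n = 2)
print(top2$feature.value)
```

With the real objects, passing `rev(top_n_by_abs_phi(p$data, N)$feature.value)` as the `limits` argument of `scale_x_discrete()` should have the same effect as the line above, though this has not been tested against iml here, and ggplot2 may warn about the rows it drops.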
