Bootstrapping with tidymodels from a list of data frames in R



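I am using tidymodels to run a model in which I split the data by group and fit a regression on each resulting data frame. This works well. However, I now also need to bootstrap my results, and I am not sure how to build that into my existing code.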

My original code looks like this:

library(dplyr)
year <- rep(2014:2018, length.out = 10000)
group <- sample(c(0, 1, 2, 3, 4, 5, 6), replace = TRUE, size = 10000)
value <- sample(10000, replace = TRUE)
female <- sample(c(0, 1), replace = TRUE, size = 10000)
smoker <- sample(c(0, 1), replace = TRUE, size = 10000)
dta <- data.frame(year = year, group = group, value = value, female = female, smoker = smoker)
# cut the dataset into a list of data frames, one per year/group combination
table_list <- dta %>%
  group_by(year, group) %>%
  group_split()
# fit a probit model per subgroup
model_list <- lapply(table_list, function(x) {
  glm(smoker ~ female, data = x, family = binomial(link = "probit"))
})
# predict on the probability scale
pred_list <- lapply(model_list, function(x) predict(x, type = "response"))

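I would like to bootstrap with replacement to obtain bootstrapped predicted values. My intuition is that when I create table_list, I should split the dataset further by drawing random samples. But how do I do that?

Thanks for your help.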

This is fairly complicated, with both grouping and bootstrapping, so I would probably do it like this, using map() two levels deep:

library(tidyverse)
library(tidymodels)
#> Registered S3 method overwritten by 'tune':
#>   method                   from   
#>   required_pkgs.model_spec parsnip
year <- rep(2014:2018, length.out=10000)
group <- sample(c(0,1,2,3,4,5,6), replace=TRUE, size=10000)
value <- sample(10000, replace=TRUE)
female <- sample(c(0,1), replace=TRUE, size=10000)
smoker <- sample(c(0,1), replace=TRUE, size=10000)
dta <- tibble(year=year, group=group, value=value, female=female, smoker=smoker)

glm_boot_mods <- 
  dta %>%
  nest(data = c(-year, -group)) %>%
  mutate(boots = map(
    data,
    ~ bootstraps(., times = 20) %>%
      mutate(
        model = map(.$splits, ~ glm(smoker ~ female, data = analysis(.x),
                                    family = binomial(link = "probit"))),
        preds = map2(model, .$splits, ~ predict(.x, newdata = assessment(.y)))
      )
  ))

glm_boot_mods
#> # A tibble: 35 × 4
#>     year group data               boots                
#>    <int> <dbl> <list>             <list>               
#>  1  2014     1 <tibble [288 × 3]> <bootstraps [20 × 4]>
#>  2  2015     4 <tibble [273 × 3]> <bootstraps [20 × 4]>
#>  3  2016     3 <tibble [301 × 3]> <bootstraps [20 × 4]>
#>  4  2017     2 <tibble [282 × 3]> <bootstraps [20 × 4]>
#>  5  2018     0 <tibble [276 × 3]> <bootstraps [20 × 4]>
#>  6  2014     3 <tibble [279 × 3]> <bootstraps [20 × 4]>
#>  7  2016     2 <tibble [314 × 3]> <bootstraps [20 × 4]>
#>  8  2018     1 <tibble [296 × 3]> <bootstraps [20 × 4]>
#>  9  2014     0 <tibble [304 × 3]> <bootstraps [20 × 4]>
#> 10  2015     6 <tibble [288 × 3]> <bootstraps [20 × 4]>
#> # … with 25 more rows

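The first map() creates bootstrap resamples for each group. Then we go one level deeper: for each resample we fit a model and predict the held-out observations of that resample. You can see what this looks like inside the first group: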

glm_boot_mods %>%
  head(1) %>% 
  pull(boots)
#> [[1]]
#> # Bootstrap sampling 
#> # A tibble: 20 × 4
#>    splits            id          model  preds      
#>    <list>            <chr>       <list> <list>     
#>  1 <split [288/111]> Bootstrap01 <glm>  <dbl [111]>
#>  2 <split [288/93]>  Bootstrap02 <glm>  <dbl [93]> 
#>  3 <split [288/103]> Bootstrap03 <glm>  <dbl [103]>
#>  4 <split [288/106]> Bootstrap04 <glm>  <dbl [106]>
#>  5 <split [288/109]> Bootstrap05 <glm>  <dbl [109]>
#>  6 <split [288/109]> Bootstrap06 <glm>  <dbl [109]>
#>  7 <split [288/92]>  Bootstrap07 <glm>  <dbl [92]> 
#>  8 <split [288/111]> Bootstrap08 <glm>  <dbl [111]>
#>  9 <split [288/99]>  Bootstrap09 <glm>  <dbl [99]> 
#> 10 <split [288/111]> Bootstrap10 <glm>  <dbl [111]>
#> 11 <split [288/102]> Bootstrap11 <glm>  <dbl [102]>
#> 12 <split [288/104]> Bootstrap12 <glm>  <dbl [104]>
#> 13 <split [288/115]> Bootstrap13 <glm>  <dbl [115]>
#> 14 <split [288/111]> Bootstrap14 <glm>  <dbl [111]>
#> 15 <split [288/108]> Bootstrap15 <glm>  <dbl [108]>
#> 16 <split [288/110]> Bootstrap16 <glm>  <dbl [110]>
#> 17 <split [288/110]> Bootstrap17 <glm>  <dbl [110]>
#> 18 <split [288/111]> Bootstrap18 <glm>  <dbl [111]>
#> 19 <split [288/103]> Bootstrap19 <glm>  <dbl [103]>
#> 20 <split [288/109]> Bootstrap20 <glm>  <dbl [109]>
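
To make the `<split [288/111]>` notation concrete, here is a minimal sketch on a small toy data frame (the `toy` data and its `id` column are made up for illustration and are not part of the example above). For a bootstrap split, analysis() returns the rows drawn with replacement, which is always as many rows as the original data, and assessment() returns the rows that were never drawn, so that count varies from resample to resample.

```r
library(rsample)

set.seed(123)
# Hypothetical toy data, only used to illustrate one bootstrap split
toy <- data.frame(id = 1:50, x = rnorm(50))

boots <- bootstraps(toy, times = 3)
first_split <- boots$splits[[1]]

in_bag  <- analysis(first_split)    # rows sampled with replacement
out_bag <- assessment(first_split)  # rows that were never sampled (held out)

nrow(in_bag)                    # same number of rows as toy
nrow(out_bag)                   # differs between resamples
any(out_bag$id %in% in_bag$id)  # held-out rows never appear in the analysis set
```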

Created on 2021-11-02 by the reprex package (v2.0.1)

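Note that each resample has predictions for its held-out observations. Depending on what you want to do next, you can use unnest() on whichever columns of glm_boot_mods you need to work with.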
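
As a rough sketch of that unnest() step, the code below rebuilds a smaller version of the nested result and flattens it to one row per held-out prediction. Several things here are assumptions rather than part of the answer: the helper name `fit_one_group`, the reduced data size and number of resamples (to keep it quick), and `type = "response"`, which puts predictions on the probability scale as in the question's original code (the default for a glm is the link scale). The final summary is only one example of what could be done with the flattened predictions.

```r
library(tidyverse)
library(tidymodels)

set.seed(2021)
# Simulated data in the same shape as the example, but smaller
dta <- tibble(
  year   = rep(2014:2018, length.out = 2000),
  group  = sample(0:1, size = 2000, replace = TRUE),
  female = sample(c(0, 1), size = 2000, replace = TRUE),
  smoker = sample(c(0, 1), size = 2000, replace = TRUE)
)

# Hypothetical helper: bootstrap one group's data, fit a probit model per
# resample, and predict that resample's held-out rows
fit_one_group <- function(df) {
  bootstraps(df, times = 5) %>%
    mutate(
      model = map(splits, ~ glm(smoker ~ female, data = analysis(.x),
                                family = binomial(link = "probit"))),
      preds = map2(model, splits,
                   ~ predict(.x, newdata = assessment(.y), type = "response"))
    )
}

glm_boot_mods <- dta %>%
  nest(data = c(-year, -group)) %>%
  mutate(boots = map(data, fit_one_group))

# Flatten: one row per year, group, resample id, and held-out prediction
boot_preds <- glm_boot_mods %>%
  select(year, group, boots) %>%
  unnest(boots) %>%
  select(year, group, id, preds) %>%
  unnest(preds)

# Example summary: average held-out prediction per year and group
boot_summary <- boot_preds %>%
  group_by(year, group) %>%
  summarise(mean_pred = mean(preds), n_preds = n(), .groups = "drop")
```

Dropping the `splits` and `model` columns before the second unnest() keeps the flattened tibble small, since only the identifiers and the predictions are needed afterwards.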
